OpenCV MatchTemplate in C# is too slow compared to Python

I wrote a solution in Python that worked great, but it required installing several libraries and a lot of bureaucratic setup. I decided to rebuild it with a GUI in C# using Visual Studio Community 2017, but the first function I got working turned out to be much slower than the Python version. IMO it should actually be faster.

The code essentially does a needle-in-a-haystack image search: it reads all the images in a folder and tests each needle (60 images in total) against the haystack. In Python I return the matching strings, but in C# I'm only printing them.

My code in Python is the following:

    def getImages(tela):
        retorno = []
        folder = 'Images'
        img_rgb = cv2.imread(tela)
        for filename in os.listdir(folder):
            template = cv2.imread(os.path.join(folder,filename))
            w, h = template.shape[:-1]
            res = cv2.matchTemplate(img_rgb, template, cv2.TM_CCOEFF_NORMED)
            threshold = .96
            loc = np.where(res >= threshold)
            if loc[0]>0:
                retorno.append(filename[0]+filename[1].lower())
                if len(retorno)> 1:
                    return retorno

and in C#:

    Debug.WriteLine(ofd.FileName);
    Image<Bgr, byte> source = new Image<Bgr, byte>(ofd.FileName);
    string filepath = Directory.GetCurrentDirectory().ToString()+"\\Images";
    DirectoryInfo d = new DirectoryInfo(filepath);
    var files = d.GetFiles();
    foreach (var fname in files){
        Image<Bgr, byte> template = new Image<Bgr, byte>(fname.FullName);
        Image<Gray, float> result = source.MatchTemplate(template, Emgu.CV.CvEnum.TemplateMatchingType.CcoeffNormed);
        double[] minValues, maxValues;
        Point[] minLocations, maxLocations;
        result.MinMax(out minValues, out maxValues, out minLocations, out maxLocations);
        if (maxValues[0] > 0.96) {
            Debug.WriteLine(fname);
        }
    }

I didn't time either version precisely, but the C# version takes about 3 seconds and the Python version about 100 ms.

There is room for optimization, if anyone would like to suggest any improvements, they are welcome.

3 Answers

Answers 1

The issue is that the Python code stops iterating as soon as more than one match has been added to retorno:

    if len(retorno)> 1:
        return retorno

In the C# sample, you keep iterating until every file has been processed.
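To make the difference concrete, here is a minimal sketch of the early exit in isolation. The `find_matches` helper, the `score` callback, and the `fake_scores` values are all hypothetical stand-ins for the best template-matching score per file; no OpenCV call is made.

```python
from typing import Callable, Iterable, List


def find_matches(
    names: Iterable[str],
    score: Callable[[str], float],
    threshold: float = 0.96,
    stop_after: int = 2,
) -> List[str]:
    """Return matching names, stopping once `stop_after` matches are found."""
    matches: List[str] = []
    for name in names:
        if score(name) >= threshold:
            matches.append(name)
            if len(matches) >= stop_after:
                break  # early exit: the remaining templates are never scored
    return matches


# Hypothetical scores standing in for the best match value of each template.
fake_scores = {"a.png": 0.99, "b.png": 0.10, "c.png": 0.97, "d.png": 0.98}
evaluated: List[str] = []


def score(name: str) -> float:
    evaluated.append(name)  # record which templates were actually scored
    return fake_scores[name]


found = find_matches(fake_scores, score)
print(found, "scored:", len(evaluated), "of", len(fake_scores))
```

With 60 templates, the saving depends on how early the matches appear in the directory listing, so this alone may not account for the whole gap.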

Answers 2

This (denfromufa's answer) does explain your issue, but to piggyback on it, here are a few suggestions/optimizations as well:

1.) Your GetFiles call can be replaced with a parallel file enumerator that also recurses into child directories. I have shamelessly written a few on GitHub.

2.) You can parallelize the foreach loop with Parallel.ForEach(files, fname => { Code(); }); Again, my FileSearchBenchmark repository on GitHub has plenty of examples of file code executing in parallel.
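For readers following the Python side of the question, the same idea can be sketched with the standard library's thread pool. This is only an illustration: `score_all` and the `score` callback are hypothetical names, and threads help only when the scoring call releases the GIL, as native library calls often do.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def score_all(
    names: Iterable[str],
    score: Callable[[str], float],
    max_workers: int = 4,
) -> Dict[str, float]:
    """Score every name on a thread pool and return {name: score}."""
    names = list(names)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        return dict(zip(names, pool.map(score, names)))


if __name__ == "__main__":
    # Hypothetical scoring function; a real one would load and match a template.
    print(score_all(["a.png", "bb.png"], lambda name: len(name) / 10))
```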

Answers 3

I've combined the solutions proposed by denfromufa and HouseCat in the source code below and did some overall cleanup, so you can see how your code could look. You will also notice minor readability improvements, since I wrote the refactored code using C# 7.0 / .NET 4.7.

Real Algorithm Optimization

Although denfromufa correctly pointed out that implementation issue, and HouseCat mentioned using more CPU resources, the true gain comes from reducing the number of operations executed during your image search algorithm.

TURBO STAGE 1 - Suppose the MinMax() function goes through all the pixels of your image to collect all those statistics, but you are only interested in maxValues[0]. An extreme fine-tuning would be to write a specific function that stops iterating through the pixels as soon as it finds a value above your minimum threshold. Apparently, that's all you need in your function. Remember: never burn all your processors computing lots of unused image statistics.
TURBO STAGE 2 - It looks like you are trying to recognize whether any image in your set matches your input screenshot (tela). If there are not too many images to match, and if you are constantly checking your screen for new matches, it is highly recommended to pre-load all those images and reuse them across function calls. Constant disk I/O and instantiating bitmap classes for every single screenshot lead to a significant performance hit.
TURBO STAGE 3 - If you are taking several screenshots per second, try to reuse the screenshot's buffer. Constantly reallocating the whole buffer when its dimensions have not changed also causes performance loss.
TURBO STAGE 4 - This is hard to get right, and depends on how much you want to invest in it. Think of your image recognition system as a big pipeline, with the bitmaps as containers of data flowing among your stages (image matching stage, OCR stage, mouse position painting stage, video recording stage, etc.). The idea is to create a fixed number of containers and reuse them, avoiding their creation and destruction. The number of containers is like the "buffer size" of your pipeline. When the stages of your pipeline have finished using these containers, they are returned to the start of the pipeline, to a kind of container pool.
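The pre-loading described in TURBO STAGE 2 can be sketched as a small cache that decodes every template once. `TemplateCache` and its `loader` parameter are hypothetical names introduced for illustration; the loader is assumed to be whatever function decodes one image file.

```python
from pathlib import Path
from typing import Callable, Dict, Generic, ItemsView, TypeVar

T = TypeVar("T")


class TemplateCache(Generic[T]):
    """Load every template in a folder once and reuse the decoded objects."""

    def __init__(self, folder: str, loader: Callable[[str], T]) -> None:
        # Disk access and decoding happen only here, not on every search.
        self._templates: Dict[str, T] = {
            path.name: loader(str(path))
            for path in sorted(Path(folder).iterdir())
            if path.is_file()
        }

    def items(self) -> ItemsView[str, T]:
        """Return (filename, decoded template) pairs without touching the disk."""
        return self._templates.items()
```

In the question's Python code this could be built once, for example as `TemplateCache("Images", cv2.imread)`, and then iterated for every new screenshot instead of calling `os.listdir` and `cv2.imread` inside the search function.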

This last optimization is really hard to achieve with these external libraries, because in most cases their APIs require some internal bitmap instantiation, and the fine-tuning would also create tight coupling between your library and the external one. So you will have to dig into these nice libraries to understand how they actually work, and build your own custom framework. I can say it's a nice learning experience.
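Setting the library coupling aside, the container pool from TURBO STAGE 4 can be sketched on its own. `BufferPool` is a hypothetical class built on the standard `queue` module, and the `bytearray` factory below merely stands in for a real bitmap buffer.

```python
import queue
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class BufferPool(Generic[T]):
    """Fixed-size pool: buffers are created once and recycled between stages."""

    def __init__(self, factory: Callable[[], T], size: int) -> None:
        self._free: "queue.Queue[T]" = queue.Queue(maxsize=size)
        for _ in range(size):
            self._free.put(factory())  # all allocation happens up front

    def acquire(self, timeout: Optional[float] = None) -> T:
        # Blocks while every buffer is in use, which throttles the pipeline.
        return self._free.get(timeout=timeout)

    def release(self, buf: T) -> None:
        # Hand the buffer back to the start of the pipeline for reuse.
        self._free.put(buf)


if __name__ == "__main__":
    pool = BufferPool(lambda: bytearray(1024), size=2)
    frame = pool.acquire()
    # ... fill `frame` with a screenshot and run the stages on it ...
    pool.release(frame)
```

The pool size plays the role of the pipeline's "buffer size": a producer that calls `acquire` faster than the last stage calls `release` simply waits.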

Those libraries are really cool for many purposes; they provide a generic API for better reuse of functionality. This also means they do much more than you actually need in a single API call. When it comes to high-performance algorithms, you should always rethink what essential functionality you need from those libraries to achieve your goal, and if they are your bottleneck, implement it yourself.
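As an example of keeping only the essential functionality, the check from TURBO STAGE 1 needs a yes/no answer rather than full min/max statistics. The sketch below uses a hypothetical `any_at_least` helper over plain rows of floats; it shows the control flow only, and a pure-Python loop like this would not outperform a vectorized library call.

```python
from typing import Iterable, Sequence


def any_at_least(rows: Iterable[Sequence[float]], threshold: float) -> bool:
    """Return True as soon as one score reaches the threshold."""
    for row in rows:
        for value in row:
            if value >= threshold:
                return True  # stop scanning: the remaining scores are irrelevant
    return False


if __name__ == "__main__":
    # Hypothetical score map standing in for a template-matching result.
    scores = [[0.10, 0.20], [0.50, 0.97], [0.99, 0.30]]
    print(any_at_least(scores, 0.96))
```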

I can say that a well-tuned image recognition algorithm doesn't take more than a few milliseconds to do what you want. I have used image recognition applications that do it almost instantaneously for larger screenshots (e.g. Eggplant Functional).

Now back to your code...

Your refactored code should look like this. I did not include the fine-tuned algorithms mentioned above; it would be better to ask separate questions about those on SO.

    Image<Bgr, byte> source = new Image<Bgr, byte>(ofd.FileName);

    // Preferably use Path.Combine here:
    string dir = Path.Combine(Directory.GetCurrentDirectory(), "Images");

    // Check whether directory exists:
    if (!Directory.Exists(dir))
        throw new Exception($"Directory was not found: '{dir}'");

    // It looks like you just need filenames here...
    // Simple parallel foreach suggested by HouseCat (in 2.):
    Parallel.ForEach(Directory.GetFiles(dir), (fname, state) =>
    {
        // Directory.GetFiles returns full paths as strings, so fname is used directly:
        Image<Gray, float> result = source.MatchTemplate(
            new Image<Bgr, byte>(fname),
            Emgu.CV.CvEnum.TemplateMatchingType.CcoeffNormed);

        // By using C# 7.0, we can do inline out declarations here:
        result.MinMax(
            out double[] minValues,
            out double[] maxValues,
            out Point[] minLocations,
            out Point[] maxLocations);

        if (maxValues[0] > 0.96)
        {
            // ...
            // A lambda passed to Parallel.ForEach cannot return a value,
            // so request an early exit through the loop state instead:
            state.Stop(); // <<< As suggested by: denfromufa
            return;
        }

        // ...
    });

Happy Tuning ;-)


Tuesday, May 15, 2018
