Friday, October 7, 2016

tesseract didn't get the little labels

By Hường Hana 3:30 AM ocr, tesseract

I've installed Tesseract in my Linux environment.

It works when I execute something like

# tesseract myPic.jpg /output

But my picture has some small labels, and Tesseract doesn't detect them.

Is there an option to set a pitch or something like that?

Example of text labels:

With this picture, Tesseract doesn't recognize any value...

But with this pic:

I have the following output:

J8  J7A-J7B P7 \  2 40 50 0 180 190  200  P1 P2 7  110 110 \ l

For example, in this case, the 90 (at the top left) is not detected by Tesseract...

I think it's just an option to set or something like that, no?

Thx

1 Answer

To get accurate results from Tesseract (or any other OCR engine), you need to follow some guidelines, which I describe in my answer to this post: Junk results when using Tesseract OCR and tess-two

Here is the gist of it:

Use a high-resolution image (resample if needed); 300 DPI is the minimum

Make sure there are no shadows or bends in the image

If there is any skew, fix the image in code prior to OCR

Use a dictionary to help get good results

Adjust the text size (12 pt font is ideal)

Binarize the image and use image processing algorithms to remove noise
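As a rough illustration of the binarization guideline, here is a minimal sketch in plain Python. It uses Otsu's method to pick a global threshold, which is an assumption on my part: the answer does not say which algorithm to use, and the function names `otsu_threshold` and `binarize` are hypothetical helpers, not part of Tesseract or any SDK. It works on a flat list of 8-bit gray values so that it needs no imaging library.

```python
def otsu_threshold(pixels):
    """Return a global threshold (0-255) chosen by Otsu's method."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    if total == 0:
        raise ValueError("no pixels given")
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels at or below the candidate threshold
    sum_bg = 0    # sum of their gray values
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance; Otsu picks the threshold that maximizes it.
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, threshold):
    """Map gray values to pure black (0) or pure white (255)."""
    return [0 if p <= threshold else 255 for p in pixels]


# Dark text pixels (around 10) on a light background (around 200).
gray = [10, 12, 11, 200, 210, 205]
print(binarize(gray, otsu_threshold(gray)))  # -> [0, 0, 0, 255, 255, 255]
```

A single global threshold like this tends to struggle with uneven lighting, which is one reason the list also asks for images without shadows.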

It is also recommended to spend some time training the OCR engine to get better results, as described in this link: Training Tesseract

I took the two images that you shared and ran some image processing on them using the LEADTOOLS SDK (disclaimer: I am an employee of this company). With the processed images I was able to get better results than you were getting, but since the original images aren't the greatest, the output still was not 100% accurate. Here is the code I used to try to fix the images:

//initialize the codecs class
using (RasterCodecs codecs = new RasterCodecs())
{
   //load the file
   using (RasterImage img = codecs.Load(filename))
   {
      //Run the image processing sequence starting by resizing the image
      double newWidth = (img.Width / (double)img.XResolution) * 300;
      double newHeight = (img.Height / (double)img.YResolution) * 300;
      SizeCommand sizeCommand = new SizeCommand((int)newWidth, (int)newHeight, RasterSizeFlags.Resample);
      sizeCommand.Run(img);

      //binarize the image
      AutoBinarizeCommand autoBinarize = new AutoBinarizeCommand();
      autoBinarize.Run(img);

      //change it to 1BPP
      ColorResolutionCommand colorResolution = new ColorResolutionCommand();
      colorResolution.BitsPerPixel = 1;
      colorResolution.Run(img);

      //save the image as PNG
      codecs.Save(img, outputFile, RasterImageFormat.Png, 0);
   }
}

Here are the output images from this process:
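The resizing step in the C# code keeps the physical size of the image and changes only its pixel density: the width in pixels divided by the horizontal resolution gives the width in inches, and multiplying by 300 gives the pixel width at 300 DPI. Here is a small sketch of that arithmetic in Python. The name `size_for_dpi` is hypothetical, the guard against a non-positive resolution is my own addition, and the sketch rounds to the nearest pixel whereas the C# code truncates with an `(int)` cast.

```python
def size_for_dpi(width_px, height_px, x_dpi, y_dpi, target_dpi=300):
    """Return the (width, height) in pixels that the image needs at target_dpi."""
    if x_dpi <= 0 or y_dpi <= 0:
        raise ValueError("image resolution must be positive")
    # pixels / DPI = inches; inches * target DPI = pixels at the target density
    new_width = round(width_px / x_dpi * target_dpi)
    new_height = round(height_px / y_dpi * target_dpi)
    return new_width, new_height


# A 720x360 image stored at 72 DPI is 10 x 5 inches.
print(size_for_dpi(720, 360, 72, 72))  # -> (3000, 1500)
```

Resampling this way adds pixels but no new detail, so very small labels in a low-resolution original may still be hard to recognize.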
