Tuesday, March 8, 2016

Why am I getting a “tiff page 1 not found” Leptonica warning in Tesseract?


I just started using Tesseract.

I am following the instructions described here.

I have created a test image like this:

training/text2image --text=test.txt --outputbase=eng.Arial.exp0 --font='Arial' --fonts_dir=/usr/share/fonts 

Now I want to train Tesseract as follows:

tesseract eng.Arial.exp0.tif eng.Arial.exp0 box.train 

Here is the output that I have:

Tesseract Open Source OCR Engine v3.04.00 with Leptonica
Page 1
APPLY_BOXES:
   Boxes read from boxfile:     112
   Found 112 good blobs.
Generated training data for 21 words
Warning in pixReadMemTiff: tiff page 1 not found

This prevents the creation of the fontfile.tr file. I tried continuing and ignoring the warning, but when I create the character sets I get awful content:

unicharset_extractor lang.fontname.exp0.box

58
NULL 0 NULL 0
Joined 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0	# Joined [4a 6f 69 6e 65 64 ]
|Broken|0|1 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0	# Broken
T 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0	# T [54 ]
h 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0	# h [68 ]
e 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0	# e [65 ]
( 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0	# ( [28 ]
q 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0	# q [71 ]
u 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0	# u [75 ]
...
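To make output like the above easier to inspect, here is a minimal Python sketch that splits one entry into its leading character, the whitespace-separated fields after it, and the trailing `#` comment. The function name `parse_unicharset_line` is hypothetical, and the layout is assumed only from the lines shown here, not from the Tesseract documentation, so the fields are left unnamed.

```python
def parse_unicharset_line(line):
    """Split one unicharset entry into (unichar, fields, comment).

    Assumption: the layout is inferred from the sample output above,
    i.e. "<unichar> <field> <field> ... # <comment>".
    """
    text = line.strip()
    if not text:
        return None
    # The first token is the character itself; splitting it off first
    # keeps an entry for the "#" character from being read as a comment.
    unichar, _, rest = text.partition(" ")
    body, _, comment = rest.partition("#")
    return unichar, body.split(), comment.strip()


sample = "T 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # T [54 ]"
unichar, fields, comment = parse_unicharset_line(sample)
print(unichar, len(fields), comment)
```

Running this over every line and comparing the number of fields per entry is one quick way to see whether the file is structurally consistent, even if the values themselves look unfamiliar.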

Here is the version I am using:

tesseract 3.04.00
 leptonica-1.72
  libjpeg 8d (libjpeg-turbo 1.3.0) : libpng 1.2.50 : libtiff 4.0.3 : zlib 1.2.8

Any idea why this happens?

Thanks!
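One way to narrow down the warning is to check how many pages the generated TIFF actually contains. The sketch below is a stdlib-only Python helper (the name `count_tiff_pages` is hypothetical) that walks the chain of image file directories in a classic TIFF and counts them. It rests on an assumption: if Leptonica numbers pages from 0, then "tiff page 1 not found" on a file that reports a single page would simply mean there is no second page to read. BigTIFF files are not handled.

```python
import struct


def count_tiff_pages(data):
    """Count the image file directories (pages) in a classic TIFF byte string."""
    if len(data) < 8:
        raise ValueError("too short to be a TIFF")
    # The first two bytes give the byte order: "II" little, "MM" big endian.
    if data[:2] == b"II":
        endian = "<"
    elif data[:2] == b"MM":
        endian = ">"
    else:
        raise ValueError("not a TIFF file")
    magic, offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not supported here)")

    pages = 0
    seen = set()
    while offset != 0:
        if offset in seen:
            raise ValueError("loop in the directory chain")
        seen.add(offset)
        if offset + 2 > len(data):
            raise ValueError("directory offset past end of data")
        # Each directory: 2-byte entry count, 12 bytes per entry,
        # then a 4-byte offset to the next directory (0 means last page).
        (n_entries,) = struct.unpack_from(endian + "H", data, offset)
        next_pos = offset + 2 + 12 * n_entries
        if next_pos + 4 > len(data):
            raise ValueError("truncated directory")
        (offset,) = struct.unpack_from(endian + "I", data, next_pos)
        pages += 1
    return pages
```

For the file from the question, this could be called as `count_tiff_pages(open("eng.Arial.exp0.tif", "rb").read())`.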

0 Answers
