I'm curious about how I may be able to more reliably recognise the value and the suit of playing card images. Here are two examples:
There may be some noise in the images, but I have a large dataset of images that I could use for training (roughly 10k pngs, including all values & suits).
I can reliably recognise images that I've manually classified when a hashing method finds a known exact match. But because I'm hashing images based on their content, the slightest noise changes the hash and the image is treated as unknown. This is what I'm looking to address reliably with further automation.
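To make the failure mode concrete, here is a minimal sketch of exact-match lookup by content hash. The choice of SHA-256, the 8x8 grayscale crop, and the `known` dictionary are all assumptions for illustration, not the setup actually used in the question.

```python
import hashlib


def content_hash(pixels: bytes) -> str:
    """Exact-match key: any change in the bytes yields a different digest."""
    return hashlib.sha256(pixels).hexdigest()


# Hypothetical 8x8 grayscale card crop stored as raw bytes.
clean = bytes([200] * 64)

# Same crop with one pixel off by a single grey level (simulated noise).
noisy_buffer = bytearray(clean)
noisy_buffer[10] = 201
noisy = bytes(noisy_buffer)

# Lookup table built from manually classified images.
known = {content_hash(clean): "4c"}

print(known.get(content_hash(clean), "unknown"))  # exact match is found
print(known.get(content_hash(noisy), "unknown"))  # noisy copy is not
```

A single changed pixel is enough to miss the lookup, which is why an approach that tolerates small differences is needed.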
I've been reviewing the 3.05 documentation on training tesseract: https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract#automated-method
Can tesseract only be trained with images found in fonts? Or could I use it to recognise the suits for these cards?
I was hoping that I could say that all images in this folder correspond to 4c (e.g. the example images above), and that tesseract would see the similarity in any future instances of that image (regardless of noise) and also read that as 4c. Is this possible? Does anyone here have experience with this?
1 Answer
This has been my non-tesseract solution to this, until someone proves there's a better way. I've set up:
- Caffe: http://caffe.berkeleyvision.org/install_osx.html
- Digits: https://github.com/NVIDIA/DIGITS/blob/master/docs/BuildDigits.md
Getting these running was the hardest part. Next, I used my dataset to train a new Caffe network. I arranged my dataset in a folder structure one level deep, with one folder per class:
./card
./card/2c ./card/2d ./card/2h ./card/2s
./card/3c ./card/3d ./card/3h ./card/3s
./card/4c ./card/4d ./card/4h ./card/4s
./card/5c ./card/5d ./card/5h ./card/5s
./card/6c ./card/6d ./card/6h ./card/6s
./card/7c ./card/7d ./card/7h ./card/7s
./card/8c ./card/8d ./card/8h ./card/8s
./card/9c ./card/9d ./card/9h ./card/9s
./card/_noise ./card/_table
./card/Ac ./card/Ad ./card/Ah ./card/As
./card/Jc ./card/Jd ./card/Jh ./card/Js
./card/Kc ./card/Kd ./card/Kh ./card/Ks
./card/Qc ./card/Qd ./card/Qh ./card/Qs
./card/Tc ./card/Td ./card/Th ./card/Ts
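In this layout each folder name is the class label. As a rough sketch, the expected label set can be generated and a label read back from a file path as follows. The helper names and the file name `img001.png` are hypothetical.

```python
from pathlib import PurePosixPath

RANKS = "23456789TJQKA"            # T stands for ten
SUITS = "cdhs"                     # clubs, diamonds, hearts, spades
EXTRA = ["_noise", "_table"]       # the two non-card classes


def class_labels():
    """All folder names expected under ./card: 52 cards plus 2 extras."""
    return [rank + suit for rank in RANKS for suit in SUITS] + EXTRA


def label_from_path(path: str) -> str:
    """The label of an image is the name of the folder that contains it."""
    return PurePosixPath(path).parent.name


labels = class_labels()
print(len(labels))                               # number of classes
print(label_from_path("./card/4c/img001.png"))   # folder name as label
```

Comparing `class_labels()` with the folders actually on disk is a cheap way to catch a misspelled or missing class before starting a long training run.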
Within Digits, I chose:
- Datasets tab
- New Dataset Images
- Classification
- I pointed it to my card folder, e.g: /path/to/card
- I set the validation % to 13.0%, based on the discussion here: http://stackoverflow.com/a/13612921/880837
- After creating the dataset, I opened the models tab
- Chose my new dataset.
- Chose GoogLeNet under Standard Networks, and left it to train.
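Digits performs the validation split itself, so nothing below is needed to follow the steps above. Purely to illustrate what a 13% hold-out means, here is a sketch with an assumed helper `split_holdout` and hypothetical file names. It is not how Digits implements the split.

```python
import random


def split_holdout(files, val_fraction=0.13, seed=0):
    """Shuffle deterministically, then hold out a fraction for validation."""
    shuffled = sorted(files)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]   # (train, validation)


# Hypothetical file names for a single class folder.
files = [f"4c/img{i:03d}.png" for i in range(100)]
train, val = split_holdout(files)
print(len(train), len(val))
```

The validation images are never used for training, so accuracy measured on them gives a fairer picture of how the network handles unseen images.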
I did this several times, each time after adding new images to the dataset. Each training session took 6-10 hours, but at this stage I can use my caffemodel to programmatically estimate what each image is, using this logic: https://github.com/BVLC/caffe/blob/master/examples/cpp_classification/classification.cpp
The results are either a card (2c, 7h, etc.), noise, or table. Any estimate with a confidence above 90% is most likely correct. The latest run correctly recognised 300 out of 400 images, with only 3 mistakes. I'm adding new images to the dataset and retraining the existing model to further improve its accuracy. Hope this is valuable to others!
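The 90% cut-off can be applied as a simple filter on the classifier output. This is a minimal sketch that assumes the predictions arrive as (label, score) pairs with scores between 0 and 1. The function name `accept` and the example scores are made up for illustration.

```python
def accept(predictions, threshold=0.90):
    """Return the top label if its score clears the threshold, else None."""
    label, score = max(predictions, key=lambda pair: pair[1])
    return label if score > threshold else None


# Hypothetical classifier outputs for two images.
confident = [("4c", 0.97), ("4s", 0.02), ("_noise", 0.01)]
uncertain = [("4c", 0.55), ("4s", 0.40), ("_table", 0.05)]

print(accept(confident))   # top label is kept
print(accept(uncertain))   # rejected, left for manual classification
```

Images that fall below the threshold can be classified by hand and added to the dataset for the next training round.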
While I wanted to give the high-level steps here, this was all done with large thanks to David Humphrey and his GitHub post. I really recommend reading it and trying it out if you're interested in learning more: https://github.com/humphd/have-fun-with-machine-learning