Saturday, April 9, 2016

Building a simple image search using TensorFlow


I need to implement a simple image search in my app using TensorFlow. The requirements are these:

  1. The dataset contains around a million images, all of the same size, each containing one unique object and only that object.
  2. The search parameter is an image taken with a phone camera of some object that is potentially in the dataset.

I've managed to extract the relevant region from the camera picture and straighten it into rectangular form; as a result, a reverse image search engine like TinEye was able to find a match.
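For context, the straightening step is roughly the following (a minimal OpenCV sketch, not my exact code; the corner coordinates and file names are placeholders, and the corner detection itself is omitted):

```python
# Hypothetical sketch of the "straighten to rectangular form" step using OpenCV.
# The four corners are assumed to come from a separate detection step.
import cv2
import numpy as np

def straighten(image, corners, out_w=640, out_h=480):
    """Warp the quadrilateral given by corners (tl, tr, br, bl) into a rectangle."""
    src = np.array(corners, dtype=np.float32)
    dst = np.array([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]], dtype=np.float32)
    matrix = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, matrix, (out_w, out_h))

photo = cv2.imread("camera_picture.jpg")                   # placeholder path
corners = [(120, 80), (530, 95), (545, 400), (110, 390)]   # example corner coordinates
cv2.imwrite("rectified.jpg", straighten(photo, corners))
```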

Now I want to reproduce that indexer using TensorFlow: build a model from my dataset, with each image's file name serving as its unique index.

Could anyone point me to tutorials/code that would explain how to achieve this without diving too deep into computer vision terminology?

Much appreciated!

1 Answer

Answer 1

The Wikipedia article on TinEye says that Perceptual Hashing will yield results similar to TinEye's, and it references this detailed description of the algorithm. TinEye itself, however, declines to comment on what it actually uses.


The biggest issue with the Perceptual Hashing approach is that while it's efficient for identifying the same image (subject to skews, contrast changes, etc.), it's not great at identifying a completely different image of the same object (e.g. the front of a car vs. the side of a car).
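For a concrete sense of what a perceptual hash does, here is a minimal average-hash sketch (a simple variant in the spirit of the linked description, using Pillow and NumPy; the file paths and the match threshold are placeholder assumptions):

```python
# Average hash: shrink to 8x8 grayscale, threshold against the mean, compare bit strings.
from PIL import Image
import numpy as np

def average_hash(path, hash_size=8):
    """Downscale to hash_size x hash_size grayscale and threshold against the mean."""
    img = Image.open(path).convert("L").resize((hash_size, hash_size), Image.LANCZOS)
    pixels = np.asarray(img, dtype=np.float32)
    return (pixels > pixels.mean()).flatten()

def hamming_distance(bits_a, bits_b):
    """Number of differing bits; a small distance means 'probably the same image'."""
    return int(np.count_nonzero(bits_a != bits_b))

# Index every image once, then compare a query against the stored hashes.
query = average_hash("query.jpg")                   # placeholder path
candidate = average_hash("dataset/img_000001.jpg")  # placeholder path
print(hamming_distance(query, candidate))           # e.g. <= 5 out of 64 is a likely match
```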

TensorFlow has great support for deep neural nets, which might give you better results. Here's a high-level description of how you might use a deep neural net in TensorFlow to solve this problem:

  1. Start with a pre-trained NN (such as GoogLeNet), or train one yourself on a dataset like ImageNet.
  2. Given a new picture you're trying to identify, feed it into the NN.
  3. Look at the activations of a fairly deep layer; this vector of activations is like a 'fingerprint' for the image.
  4. Find the picture in your database with the closest fingerprint. If it's sufficiently close, it's probably the same object.
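A minimal sketch of that pipeline using tf.keras is below (InceptionV3 stands in for GoogLeNet; the file paths and the brute-force nearest-neighbour search are placeholder assumptions, and for a million images you would want a proper approximate nearest-neighbour index):

```python
# Use a pre-trained network's pooled activations as image "fingerprints".
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input

# Classifier head removed; global average pooling yields a 2048-dim embedding.
model = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

def fingerprint(path):
    """Load an image, preprocess it, and return its embedding vector."""
    img = tf.keras.utils.load_img(path, target_size=(299, 299))
    x = preprocess_input(np.expand_dims(tf.keras.utils.img_to_array(img), 0))
    return model.predict(x, verbose=0)[0]

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Index the dataset once: file name -> fingerprint (toy list of placeholder paths).
database = {p: fingerprint(p) for p in ["dataset/a.jpg", "dataset/b.jpg"]}

def search(query_path):
    """Return the database image whose fingerprint is closest to the query's."""
    q = fingerprint(query_path)
    return max(database, key=lambda name: cosine(q, database[name]))

print(search("query_from_phone.jpg"))  # placeholder query image
```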

The intuition behind this approach is that, unlike Perceptual Hashing, the NN builds up a high-level representation of the image, identifying edges, shapes, and important colors. For example, the fingerprint of an apple might include information about its circular shape, red color, and even its small stem.


You could also try something like this 2012 paper on image retrieval, which uses a slew of hand-picked features such as SIFT, regional color moments, and object contour fragments. This is probably a lot more work, and it's not what TensorFlow is best at.
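If you want to experiment with that family of techniques, a minimal SIFT matching sketch with OpenCV looks roughly like this (it covers only one of the paper's features; the file paths and the ratio-test threshold are placeholder assumptions):

```python
# Match SIFT keypoints between a query image and one candidate image.
import cv2

query = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)          # placeholder path
candidate = cv2.imread("dataset/a.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder path

sift = cv2.SIFT_create()
_, desc_q = sift.detectAndCompute(query, None)
_, desc_c = sift.detectAndCompute(candidate, None)

# Lowe's ratio test: keep matches whose best distance clearly beats the second best.
matches = cv2.BFMatcher().knnMatch(desc_q, desc_c, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(len(good), "good SIFT matches")  # more matches -> more likely the same object
```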
