How good is Google Drive’s image recognition engine?

As announced via Twitter, I took the time to test Google Drive’s image recognition feature. Google Drive was announced two weeks ago in a blog post, which contained the bold claim:

Search everything. Search by keyword and filter by file type, owner and more. … We also use image recognition so that if you drag and drop photos from your Grand Canyon trip into Drive, you can later search for [grand canyon] and photos of its gorges should pop up. This technology is still in its early stages, and we expect it to get better over time.

This sparked my curiosity, so I evaluated Google Drive’s performance the way I would evaluate the image recognition frameworks I do my research on. First I uploaded an image dataset containing known objects, and then I counted how many of the pictures Google Drive’s search would find when I searched for these objects.

As the dataset I used the popular Caltech 101 dataset, which contains pictures of objects belonging to 101 different categories. There are about 40 to 800 images per category and roughly 4500 images in total. While far from perfect, it is a well-known benchmark.

These are my first findings:

  • Google Drive only finds a fraction of the images, but the images it does find are mostly categorized correctly.

  • In numbers: averaged over all categories, precision is 83% (std = 36%) and recall is 8% (std = 11%).
  • It achieves the best results for the two ‘comic’ categories ‘Snoopy’ and ‘Garfield’ and for iconic symbols like the dollar bill and the stop sign.
  • Because the Caltech 101 dataset was created using Google’s image search, the high precision is at least partly the result of a ‘simple’ duplicate detection against the Google index rather than of a successful similarity search.
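
For readers who want to compute such numbers themselves, here is a minimal sketch of per-category precision and recall, averaged over categories. The category names, image ids, and the choice to skip categories without any search results when averaging precision are assumptions for illustration; this is not the script used for the evaluation above.

```python
from statistics import mean, pstdev


def precision_recall(returned, relevant):
    """Precision and recall for a single category query.

    returned: set of image ids the search returned for the query
    relevant: set of image ids that truly belong to the category
    """
    hits = len(returned & relevant)
    # Precision is undefined when the search returns nothing (assumption: report None).
    precision = hits / len(returned) if returned else None
    recall = hits / len(relevant) if relevant else None
    return precision, recall


def summarize(results, ground_truth):
    """Average precision and recall over all categories (mean and population std)."""
    precisions, recalls = [], []
    for category, relevant in ground_truth.items():
        p, r = precision_recall(results.get(category, set()), relevant)
        if p is not None:
            precisions.append(p)
        if r is not None:
            recalls.append(r)
    return {
        "precision_mean": mean(precisions),
        "precision_std": pstdev(precisions),
        "recall_mean": mean(recalls),
        "recall_std": pstdev(recalls),
    }


# Hypothetical toy data: two categories with made-up image ids.
ground_truth = {
    "snoopy": {"s1", "s2", "s3", "s4"},
    "stop_sign": {"t1", "t2", "t3", "t4", "t5"},
}
# Hypothetical search results: what a search for each category name returned.
results = {
    "snoopy": {"s1", "s2"},     # both hits are correct, but two images are missed
    "stop_sign": {"t1", "s3"},  # one correct hit, one wrong image
}

summary = summarize(results, ground_truth)
print(summary)
```

A pattern of high precision and low recall, as in the findings above, means the returned sets are small but mostly correct.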

Verdict:

Like all vision systems working in such an unconstrained environment, it is far from being actually usable. One cannot rely on it, but once or twice it will surprise you by adding an image to the result list that you hadn’t thought of.

Further resources:

[update]

Link to Matlab code which achieves 65% precision with 100% recall.*

* The numbers are not directly comparable, as the two use different evaluation approaches. The Matlab script assigns each image of the dataset its most likely class, while Google Drive tries to find a concept or object in the image.

PhotoSketch

In case you missed it, PhotoSketch is the application that has everyone excited.
PhotoSketch, developed by Tao Chen*, Ming-Ming Cheng*, Ping Tan+, Ariel Shamir° and Shi-Min Hu’, is a system

which composes a realistic picture from a simple freehand sketch annotated with text labels. The composed picture is generated by seamlessly stitching several photographs in agreement with the sketch and text labels.

I must say, I am very impressed. Both the selection of the images and their combination are done at a level I haven’t seen before. The application won’t render Photoshop useless, as it doesn’t consider lighting and perspective, but I personally see it as a showcase of what their algorithms are capable of. So far we only have a video and no demo, but they presented their results at SIGGRAPH, so it is not totally fake. In any case, watch the video to form your own opinion.

Example Results from PhotoSketch

On their website you can find their paper and more material.

Here is the complete abstract:

We present a system that composes a realistic picture from a simple freehand sketch annotated with text labels. The composed picture is generated by seamlessly stitching several photographs in agreement with the sketch and text labels; these are found by searching the Internet. Although online image search generates many inappropriate results, our system is able to automatically select suitable photographs to generate a high quality composition, using a filtering scheme to exclude undesirable images. We also provide a novel image blending algorithm to allow seamless image composition. Each blending result is given a numeric score, allowing us to find an optimal combination of discovered images. Experimental results show the method is very successful; we also evaluate our system using the results from two user studies.

Pipeline of PhotoSketch

* Tsinghua University

+ National University of Singapore

° The Interdisciplinary Center

‘ Tsinghua University

Detexify LaTeX handwritten symbol recognition

Anyone who works with LaTeX knows how time-consuming it can be to find a symbol in symbols-a4.pdf that you just can’t memorize. Detexify is an attempt to simplify this search.

Detexify

It does as promised. It is a nice example of how shape recognition can help in your daily life, at least for those whose daily life consists of writing formulae in LaTeX.