How good is Google Drive’s image recognition engine?

As announced via Twitter, I took the time to test Google Drive’s image recognition feature. Google Drive was announced two weeks ago in a blog post, which contained this bold claim:

Search everything. Search by keyword and filter by file type, owner and more. … We also use image recognition so that if you drag and drop photos from your Grand Canyon trip into Drive, you can later search for [grand canyon] and photos of its gorges should pop up. This technology is still in its early stages, and we expect it to get better over time.

This sparked my curiosity, so I evaluated Google Drive’s performance the way I would evaluate the image recognition frameworks I do my research on. First I uploaded an image dataset containing known objects, then I searched for each of these objects and counted how many of the matching pictures Google Drive’s search returned.

As the dataset I used the popular Caltech 101 dataset, which contains pictures of objects belonging to 101 different categories. There are about 40 to 800 images per category and roughly 4500 images in total. While far from perfect, it is a well-known benchmark.

These are my first findings:

  • Google Drive finds only a fraction of the images, but those it does find are mostly categorized correctly.

  • In numbers: precision is 83% (std=36%) and recall is 8% (std=11%), both averaged over all categories.
  • It achieves its best results for the two ‘comic’ categories, ‘Snoopy’ and ‘Garfield’, and for iconic symbols like the dollar bill and the stop sign.
  • As the Caltech 101 dataset was created using Google’s image search, the high precision is at least partly the result of a ‘simple’ duplicate detection against the Google index rather than of a successful similarity search.
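
For readers who want to run a similar test, here is a minimal Python sketch of how per-category precision and recall can be computed and then averaged. It is not the script used for the numbers above. The function names, the toy file names, the use of the population standard deviation, and the choice to skip categories with an empty result list when averaging precision are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def precision_recall(returned, relevant):
    """Precision and recall for a single category query.

    returned: images the search listed for the category name
    relevant: images that truly belong to the category
    Precision is None when nothing was returned (it is undefined then).
    """
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else None
    recall = hits / len(relevant) if relevant else None
    return precision, recall


def summarize(results, ground_truth):
    """Average precision and recall over all categories.

    Assumption: categories with an undefined precision are left out of
    the precision average, and pstdev (population std) is used.
    """
    precisions, recalls = [], []
    for category, relevant in ground_truth.items():
        p, r = precision_recall(results.get(category, ()), relevant)
        if p is not None:
            precisions.append(p)
        if r is not None:
            recalls.append(r)
    return {
        "precision": (mean(precisions), pstdev(precisions)),
        "recall": (mean(recalls), pstdev(recalls)),
    }


# Hypothetical toy data: two categories with made-up file names.
ground_truth = {
    "snoopy": {"s1", "s2", "s3", "s4"},
    "stop_sign": {"t1", "t2", "t3", "t4", "t5"},
}
# What a search for each category name returned (also made up).
results = {
    "snoopy": {"s1", "s2"},
    "stop_sign": {"t1", "s3"},
}

summary = summarize(results, ground_truth)
print(summary)
```

A search engine that returns few but mostly correct images, as observed here, shows up in such a summary as high mean precision combined with low mean recall.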

Verdict:

Like all vision systems working in such an unconstrained environment, it is far from being actually usable. One cannot rely on it, but once or twice it will surprise you by adding an image to the result list that you hadn’t thought of.

Further resources:

[update]

Link to Matlab code which achieves 65% precision with 100% recall.*

* The numbers are not comparable one-to-one, as the two use different evaluation approaches. The Matlab script assigns each image of the dataset its most likely class, while Google Drive tries to find a concept or object in the image.
