How good is Google Drive’s image recognition engine?
Search everything. Search by keyword and filter by file type, owner and more. … We also use image recognition so that if you drag and drop photos from your Grand Canyon trip into Drive, you can later search for [grand canyon] and photos of its gorges should pop up. This technology is still in its early stages, and we expect it to get better over time.
This sparked my curiosity, so I evaluated Google Drive’s performance the way I would evaluate the image recognition frameworks I do my research on. First, I uploaded an image dataset containing known objects; then I counted how many of the pictures Google Drive’s search would find when I searched for these objects.
As the dataset, I used the popular Caltech 101 dataset, which contains pictures of objects belonging to 101 different categories. There are about 40 to 800 images per category and roughly 4500 images in total. While far from perfect, it is a well-known benchmark.
These are my first findings:
Google Drive finds only a fraction of the images, but most of the images it does find are categorized correctly.
- In numbers: precision is 83% (std = 36%) and recall is 8% (std = 11%), averaged over all categories.
- It achieves the best results for the two ‘comic’ categories ‘Snoopy’ and ‘Garfield’ and for iconic symbols like the dollar bill and the stop sign.
- As the Caltech 101 dataset was created using Google’s image search, the high precision is at least partly the result of a ‘simple’ duplicate detection against the Google index rather than of a successful similarity search.
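For readers who want to reproduce this kind of measurement, here is a minimal sketch of how per-category precision and recall, and their mean and standard deviation over categories, could be computed. It is not the script behind the numbers above: the function names, the toy data, the choice of the population standard deviation, and the convention of scoring precision as 0 when a search returns nothing are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def per_category_scores(ground_truth, search_results):
    """Return {category: (precision, recall)}.

    ground_truth:   category -> set of image ids that truly show that object
    search_results: category -> set of image ids returned when searching for it
    """
    scores = {}
    for category, relevant in ground_truth.items():
        returned = search_results.get(category, set())
        hits = len(relevant & returned)
        # Assumption: an empty result list counts as precision 0.
        precision = hits / len(returned) if returned else 0.0
        recall = hits / len(relevant) if relevant else 0.0
        scores[category] = (precision, recall)
    return scores


def summarize(scores):
    """Average the per-category scores (macro average) and report the spread."""
    precisions = [p for p, _ in scores.values()]
    recalls = [r for _, r in scores.values()]
    return {
        "precision_mean": mean(precisions),
        "precision_std": pstdev(precisions),  # assumption: population std
        "recall_mean": mean(recalls),
        "recall_std": pstdev(recalls),
    }


# Hypothetical toy data: two categories with made-up image ids.
ground_truth = {
    "snoopy": {"s1", "s2", "s3", "s4"},
    "stop_sign": {"t1", "t2"},
}
search_results = {
    "snoopy": {"s1", "s2"},     # two correct hits, two images missed
    "stop_sign": {"t1", "s3"},  # one correct hit, one false positive
}

scores = per_category_scores(ground_truth, search_results)
summary = summarize(scores)
print(summary)
```

Averaging per category rather than over all images means that a category with 40 images weighs as much as one with 800, which is one reason the standard deviations can be large.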
Like all vision systems working in such an unconstrained environment, it is far from being truly usable. One cannot rely on it, but once or twice it will surprise you by adding an image to the result list that you hadn’t thought of.
- Google Drive uses Google Goggles image recognition technology, whose details are not public. External experts assume that it doesn’t differ much from other state-of-the-art approaches.
- Link to the result table
* The numbers are not comparable 1-to-1, as the two use different evaluation approaches. The Matlab script assigns each image of the dataset its most likely class, while Google Drive tries to find a concept or object in the image.