My plan for a content-based image search

I saw this job posting from EyeEm, a photo-sharing app and service, in which they express their plan to build a search engine that can ‘identify and understand beautiful photographs’. That got me thinking about how I would approach building such a system.

Here is how I would start:

1. Define what you are looking for

EyeEm already has a search engine based on tags and geo-location. So I assume they want to prevent low-quality pictures from appearing in the results and to add missing tags to pictures based on the image’s content. One could also group similar-looking pictures or rank those pictures lower which “don’t contain their tags”. For the Brandenburger Tor, for instance, there are a lot of similar-looking pictures and even some that don’t contain the gate at all.

But for which concepts should one train the algorithms? Modern image retrieval systems are trained for hundreds of concepts, but I don’t think it is wise to start with that many. Even the most sophisticated, fine-tuned systems have high error rates for most of the concepts, as can be seen in this year’s results of the Large Scale Visual Recognition Challenge.

For instance, the team from EUVision / University of Amsterdam, which placed 6th in the classification challenge, selected only 16 categories for their consumer app Impala. For a consumer application I think their tags are a good choice:

  • Architecture
  • Babies
  • Beaches
  • Cars
  • Cats (sorry, no dogs)
  • Children
  • Food
  • Friends
  • Indoor
  • Men
  • Mountains
  • Outdoor
  • Party life
  • Sunsets and sunrises
  • Text
  • Women

But of course EyeEm has the luxury of looking at their log files to find out what their users are actually searching for.

And on the comparable task of classifying pictures into 15 scene categories, a team from MIT under Antonio Torralba showed that even with established algorithms one can achieve nearly 90% accuracy [Xiao10]. So I think it’s a good idea to start with a limited number of standard and EyeEm-specific concepts, which allows for usable recognition accuracy even with less sophisticated approaches.

But what about identifying beautiful photographs? I think in image retrieval there is no concept that is more desirable and more challenging to master. What does beautiful actually mean? What features make a picture beautiful? How do you quantify these features? Is beauty even a sensible concept for image retrieval? Might it be more useful to try to predict which pictures will be ‘liked’ or ‘hearted’ a lot? These questions have to be answered before one can even start experimenting. I think for now it is wise to start with just filtering out low-quality pictures and with trying to predict what factors make a picture popular.

2. Gather datasets

Not only do the systems need to be trained with example photographs for which we know the depicted concepts, we also need data to evaluate the system, to be sure that the implementation really works as intended. But gathering useful datasets for learning and benchmarking is one of the hardest and most overlooked tasks. To draw meaningful conclusions, the dataset must consist of large quantities of realistic example pictures with high-quality and consistent metadata. In our case I would aggregate existing datasets that contain labeled images for the categories we want to learn.

For starters, the ImageNet, the Scene Understanding (SUN) and the Faces in the Wild databases seem usable. Additionally, one could manually add pictures from Flickr, Google image search and EyeEm’s own users.
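To make the aggregation concrete, here is a minimal sketch of how I would unify such sources into one training index. The folder layout, the label mapping and the synset id are purely illustrative assumptions, not the actual structure of those datasets:

```python
import csv
import glob
import os

# Purely illustrative mapping from (dataset, source label) to our target concepts;
# the real synset ids and scene names have to be looked up in each dataset.
LABEL_MAP = {
    ("imagenet", "n02121808"): "Cats",
    ("sun", "beach"): "Beaches",
    ("sun", "mountain"): "Mountains",
}

def build_index(root, out_csv="training_index.csv"):
    """Walk <root>/<dataset>/<label>/*.jpg and write a unified (path, concept) index."""
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "concept"])
        for path in glob.glob(os.path.join(root, "*", "*", "*.jpg")):
            dataset, label = path.split(os.sep)[-3:-1]
            concept = LABEL_MAP.get((dataset, label))
            if concept is not None:
                writer.writerow([path, concept])

build_index("datasets")
```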

Apart from a rather limited dataset of paintings and pictures of nature from the Computational Aesthetics Group at the University of Jena, Germany, I don’t know of any good dataset for evaluating how well a system detects beautiful images. Researchers either harvest photo communities that offer peer-rated ‘beautifulness’ scores, such as photo.net [Datta06] or dpchallenge.com [Poga12], or they collect photos themselves and rate the pictures themselves for visual appeal [Poga12, Tang13].

The problem with datasets harvested from photo communities is that they suffer from self-selection bias, because users only upload their best shots. As a result there are few low-quality shots to train the system on.

Nevertheless I would advise collecting the data in-house. If labeling an image as low quality takes one second, one person can label 30,000 images in less than 10 hours (30,000 seconds is roughly 8.3 hours). And even if we accept that each picture has to be labeled by multiple persons to minimize unwanted subjectivity, this approach would ensure that the system has the same notion of beauty as favored by EyeEm.

3. Algorithms to try

I would start with established techniques like the bag of visual words (BoW) approach. As the previously mentioned MIT paper describes, over 80% accuracy can already be achieved with this method on the comparable task of classifying 15 indoor and outdoor scenes [Xiao10]. While this approach originally relies on the patented SIFT feature detector and descriptor, one can choose from a whole list of newer, free alternatives which deliver comparable performance while being much faster and having a lower memory footprint [Miksik2012]. In the MIT paper they also combined BoW with other established methods to increase the accuracy to nearly 90%.
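To make the pipeline concrete, here is a minimal sketch of such a BoW classifier. The folder layout, the choice of ORB as the free descriptor and the vocabulary size of 500 are my own assumptions, not details from [Xiao10]:

```python
import glob
import os

import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import LinearSVC

orb = cv2.ORB_create(nfeatures=300)  # one of the free alternatives to the patented SIFT

def local_descriptors(path):
    """Detect keypoints in one image and return their ORB descriptors (or None)."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, desc = orb.detectAndCompute(img, None)
    return desc

def build_vocabulary(paths, k=500):
    """Cluster all local descriptors into k 'visual words'."""
    stacks = [local_descriptors(p) for p in paths]
    all_desc = np.vstack([d for d in stacks if d is not None]).astype(np.float32)
    return MiniBatchKMeans(n_clusters=k).fit(all_desc)

def bow_histogram(path, vocab):
    """Represent one image as a normalized histogram over the visual words."""
    hist = np.zeros(vocab.n_clusters)
    desc = local_descriptors(path)
    if desc is not None:
        for word in vocab.predict(desc.astype(np.float32)):
            hist[word] += 1
    return hist / max(hist.sum(), 1.0)

# Hypothetical layout: data/<concept>/<image>.jpg, one folder per concept.
paths, labels = [], []
for concept_dir in glob.glob("data/*"):
    for p in glob.glob(os.path.join(concept_dir, "*.jpg")):
        paths.append(p)
        labels.append(os.path.basename(concept_dir))

vocab = build_vocabulary(paths)
features = np.array([bow_histogram(p, vocab) for p in paths])
classifier = LinearSVC().fit(features, labels)  # evaluate on a held-out split in practice
```

Spatial pyramids, soft assignment and better kernels would be the obvious refinements on top of this before switching approaches entirely.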

The next step then would be to use Alex Krizhevsky’s implementation of a deep convolutional neural network, which he used to win last year’s Large Scale Visual Recognition Challenge. The code is freely available online. While much more powerful, this system is also much harder to train, with many parameters to tune and without good existing heuristics for choosing them.
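Krizhevsky’s code itself is CUDA/C++; just to illustrate the kind of model, here is a toy convolutional classifier sketched in PyTorch (my own library choice, and nowhere near the size or training recipe of the ILSVRC-winning network):

```python
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    """A toy convolutional classifier; the actual ILSVRC network is far deeper and larger."""
    def __init__(self, num_classes=16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.features(x)                 # (N, 128, 1, 1)
        return self.classifier(x.flatten(1))

model = SmallConvNet(num_classes=16)          # e.g. the 16 Impala-style concepts
scores = model(torch.randn(2, 3, 224, 224))   # two dummy RGB images -> class scores
```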

But these two approaches won’t really help with assessing the beauty of pictures or with identifying the low-quality ones. If one agrees with Microsoft Research’s view of photo quality, defined by simplicity, (lack of) realism and quality of craftsmanship, one could start with the algorithms they designed to separate high-quality professional photos from low-quality snapshots [Ke06].
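As a starting point, one could compute cheap stand-ins for these three properties. The following sketch uses my own proxies (Laplacian variance for sharpness, luminance spread for contrast, hue concentration for simplicity), not the exact features from [Ke06]:

```python
import cv2
import numpy as np

def quality_proxies(path):
    """Rough proxies for craftsmanship: sharpness, contrast and color simplicity.
    These are my own stand-ins, not the features proposed in [Ke06]."""
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Sharpness: variance of the Laplacian; blurry snapshots score low.
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()

    # Contrast: spread of the luminance values (5th to 95th percentile).
    lo, hi = np.percentile(gray, [5, 95])
    contrast = (hi - lo) / 255.0

    # Simplicity: how concentrated the hue histogram is; professional shots
    # often have few dominant colors, cluttered snapshots have many.
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    hue_hist = cv2.calcHist([hsv], [0], None, [20], [0, 180]).ravel()
    hue_hist /= hue_hist.sum()
    simplicity = 1.0 - (hue_hist > 0.05).sum() / 20.0

    return sharpness, contrast, simplicity
```

These numbers would then feed a simple classifier trained on the in-house labeled data from step 2.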

Caveats

Specifically for the case at hand, I predict that the filters will cause problems. They change the colors, and some of them add high- and low-frequency elements. This will decrease the accuracy of the algorithms. To prevent this, the analysis has to be performed on the phone, or the unaltered image has to be uploaded as well.

[Image: Low quality or not?]

If I remember correctly, I once read that EyeEm applies the filters in full resolution to pictures on their servers and downloads the result to the user’s phone afterwards. If this is still the case, both approaches are feasible. But as phones get more and more powerful, a system that works on the phone is to be preferred, as it is inherently more scalable.

Another challenge would be to distinguish between low-quality pictures and pictures that break the rules of photography on purpose. The picture on the right, for example, has a blue undertone, low contrast and is quite blurry. But while these features make this image special, they would also trigger the low-quality detector. It will be interesting to see whether machine learning algorithms can learn to distinguish between the two cases.

So to recap:

1. Make sure the use case is sound.
2. Collect loads of data to train and evaluate.
3. Start with simple, proven algorithms and increase the complexity step by step.


How to archive my tweets?

[UPDATE]

So, it looks like IFTTT deleted my recipe to avoid drawing Twitter’s wrath upon themselves. I will mail customer support and ask what happened…

[Original article]

tldr: IFTTT recipe: Archive my tweets to dropbox!

Even though I am a happy Twitter user, it makes me feel very uncomfortable that I cannot easily access all my old tweets. For that reason I have used IFTTT and Dropbox for the last several months to archive all new tweets, so that I at least have access to those. But after Twitter forbade IFTTT to access the tweets directly, this solution no longer worked. But as clever users found out, Twitter still provides an RSS feed for each user containing all publicly sent tweets.

As a result it was very easy to get the whole pipeline working again. If you want to try it yourself, just use my public IFTTT recipe: Archive my tweets to dropbox! You only have to replace my Twitter handle with yours…
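If you would rather not depend on IFTTT at all, the same pipeline fits into a small script. This is only a sketch: the feed URL is a placeholder you have to replace with the feed Twitter exposes for your account, and the output file lives in a locally synced Dropbox folder:

```python
import feedparser  # pip install feedparser

# Placeholder: substitute the RSS feed URL Twitter provides for your account.
FEED_URL = "https://example.com/your_twitter_rss_feed"
ARCHIVE = "/home/me/Dropbox/tweets.txt"  # any folder the Dropbox client syncs

def archive_new_tweets():
    """Append tweets that are not yet in the archive file."""
    try:
        with open(ARCHIVE) as f:
            seen = {line.split("\t", 1)[0] for line in f}
    except IOError:
        seen = set()

    feed = feedparser.parse(FEED_URL)
    with open(ARCHIVE, "a") as f:
        for entry in feed.entries:
            if entry.id not in seen:
                f.write("%s\t%s\t%s\n" % (entry.id, entry.published, entry.title))

if __name__ == "__main__":
    archive_new_tweets()  # run via cron, e.g. once an hour
```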

Btw: Please share if you have other useful recipes!

Book: Programming Computer Vision with Python


In case anyone missed it: you can download a very mature draft of “Programming Computer Vision with Python” at programmingcomputervision.com. This book takes a fresh approach to introducing people to the computer vision field. It is aimed at beginners who have some programming experience (not necessarily Python) and a basic understanding of linear algebra (matrices and vectors) and analysis.

The covered topics are (as taken from the TOC):

  • Basic Image Handling and Processing
  • Local Image Descriptors
  • Image to Image Mappings
  • Camera Models and Augmented Reality
  • Clustering Images
  • Searching Images
  • Classifying Image Content
  • Image Segmentation
  • OpenCV

What I like most are the mini projects, like programming your own little augmented reality app or building a complete web app for content-based image search. It is always great to have little working demos to show to your friends. I will definitely recommend it to anyone new to and interested in the computer vision field.

The author, Jan Erik Solem, is an Associate Professor at Lunds Universitet and co-founder of PolarRose, a facial recognition software company which was bought by Apple in September 2010.

From July on, you can buy the book at Amazon.com or, as said, download the draft from the book’s website.

Follow-up on Mendeley’s Binary Battle

Last week Mendeley announced the winners of their Binary Battle. First place goes to openSNP:

With openSNP, you can share your personal genome from 23andMe or deCODEme to find the latest relevant research and let scientists discover new genetic associations. Werner Vogels, Amazon CTO and one of our star judges said this, “OpenSNP is cool. I have uploaded my genotype, and it is interesting to see it at work.”

The runner-up is PaperCritic:

PaperCritic allows for post-publication peer review in an open environment. Rate papers, write critical reviews or read those from others.

Third place goes to rOpenSci,

which provides R-based tools to facilitate Open Science, including R packages for both Mendeley and PLoS.

I am more than content that my entry, Collabgraph, made it into the top 10. And I am thankful to everybody who voted for me over the last month. Taking part in the Binary Battle was a valuable experience. Thank you again, Paddy, for hosting Collabgraph!

 

Computer Vision News

I created Computer Vision News (CVN), an aggregator for all the events and academic vacancies within the fields of Computer Vision, Image Analysis, and Medical Image Analysis. You can also follow it on Twitter: @compvisionnews!

At the moment I use the following sources:

Please write me if you have sources I should add; I am happy to extend CVN.
I prefer to have just the headlines in my Twitter timeline, where they don’t clutter my mail client or my feed reader. But use it as you like! Yay for more choices!

Who is collaborating?

[Image: collaboration graph of my master thesis, created with Collabgraph]

In my scarce spare time, I have written Collabgraph to visualize connections between authors of scientific publications.

This Python script reads a (your) BibTeX file and draws a graph in which all the nodes are authors and the edges represent that two authors have collaborated (or at least wrote a paper together).

On the right is the graph created from the references used in my diploma thesis. You can immediately see what a central role Eakins, Meier and Flickner played.
Collabgraph requires only the pygraphviz library, which can be installed with “easy_install pygraphviz”.
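For the curious, the core idea fits into a few lines. This is a simplified sketch of my own, not the actual Collabgraph code, and its crude parsing only handles simple single-line BibTeX author fields:

```python
import itertools
import re

import pygraphviz as pgv

def author_lists(bibtex_path):
    """Crudely extract the author field of each BibTeX entry."""
    text = open(bibtex_path).read()
    for field in re.findall(r'author\s*=\s*[{"](.+?)["}]', text, re.IGNORECASE):
        yield [name.strip() for name in field.split(" and ")]

def collaboration_graph(bibtex_path, out_png="collabgraph.png"):
    """One node per author, one edge per pair that wrote at least one paper together."""
    graph = pgv.AGraph(strict=True, directed=False)
    for authors in author_lists(bibtex_path):
        for a, b in itertools.combinations(authors, 2):
            graph.add_edge(a, b)
    graph.layout(prog="neato")
    graph.draw(out_png)

collaboration_graph("thesis.bib")
```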

You can find the source code and the example at bitbucket.org.

I am looking forward to your feedback!!!

Jobs in computer vision

As I am currently looking for a new job for the time after my internship, I came across several websites dedicated to jobs in the field of computer vision and image retrieval. And being the nice guy I am, I want to share what I found.

The links are sorted by descending frequency of new offers:

Furthermore, some conferences, like CVPR, have a jobs section:

A lot of positions are also offered via the Imageworld mailing list:

And now: good luck! 🙂