My plan for a content based image search

I saw this job posting from EyeEm, a photo sharing app / service, in which they express their wish/plan to build a search engine that can ‘identify and understand beautiful photographs’. That got me thinking about how I would approach building a system like that.

Here is how I would start:

1. Define what you are looking for

EyeEm already has a search engine based on tags and geo-location. So I assume they want to prevent low quality pictures from appearing in the results and to add missing tags to pictures based on the image’s content. One could also group similar looking pictures or rank those pictures lower which “don’t contain their tags”. For instance, for the Brandenburger Tor there are a lot of similar looking pictures and even some that don’t contain the gate at all.

But for which concepts should one train the algorithms? Modern image retrieval systems are trained for hundreds of concepts, but I don’t think it is wise to start with that many. Even the most sophisticated, fine-tuned systems have high error rates for most of the concepts, as can be seen in this year’s results of the Large Scale Visual Recognition Challenge.

For instance, the team from EUVision / University of Amsterdam, which placed 6th in the classification challenge, selected only 16 categories for their consumer app Impala. For a consumer application I think their tags are a good choice:

  • Architecture
  • Babies
  • Beaches
  • Cars
  • Cats (sorry, no dogs)
  • Children
  • Food
  • Friends
  • Indoor
  • Men
  • Mountains
  • Outdoor
  • Party life
  • Sunsets and sunrises
  • Text
  • Women

But of course EyeEm has the luxury of looking at their log files to find out what their users are actually searching for.

And on a comparable task of classifying pictures into 15 scene categories a team from MIT under Antonio Torralba showed that even with established algorithms one can achieve nearly 90% accuracy [Xiao10]. So I think it’s a good idea to start with a limited number of standard and EyeEm specific concepts, which allows for usable recognition accuracy even with less sophisticated approaches.

But what about identifying beautiful photographs? I think in image retrieval there is no other concept which is more desirable and challenging to master. What does beautiful actually mean? What features make a picture beautiful? How do you quantify these features? Is beautiful even a sensible concept for image retrieval? Might it be more useful to try to predict which pictures will be `liked` or `hearted` a lot? These questions have to be answered before one can even start experimenting. I think for now it is wise to start with just filtering out low quality pictures and to try to predict what factors make a picture popular.

2. Gather datasets

Not only do the systems need to be trained with example photographs for which we know the depicted concepts, we also need data to evaluate our system to be sure that the implemented system really works as intended. But gathering useful datasets for learning and benchmarking is one of the hardest and most overlooked tasks. To draw meaningful conclusions the dataset must consist of huge quantities of realistic example pictures with high-quality, consistent metadata. In our case here, I would aggregate existing datasets that contain labeled images for the categories we want to learn.

For starters, the ImageNet, the Scene Understanding and the Faces in the Wild databases seem usable. Additionally one could manually add pictures from Flickr, Google Image Search and EyeEm’s users.

Apart from a rather limited dataset of paintings and pictures of nature from the Computational Aesthetics Group of the University of Jena, Germany, I don’t know of any good dataset to evaluate how well a system detects beautiful images. Researchers either harvest photo communities that offer peer-rated ‘beautifulness’ scores such as photo.net [Datta06] or dpchallenge.com [Poga12], or they collect photos themselves and rate the pictures themselves for visual appeal [Poga12, Tang13].

The problem with datasets harvested from photo communities is that they suffer from self-selection bias, because users only upload their best shots. As a result there are few low quality shots to train the system with.

Nevertheless I would advise collecting the data in-house. If labeling an image as low quality takes one second, one person can label 30,000 images in less than 10 hours. And even if we accept that each picture has to be labeled by multiple persons to minimize unwanted subjectivity, this approach would ensure that the system has the same notion of beauty as favored by EyeEm.
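The back-of-the-envelope estimate above can be checked in a few lines of Python. This is only a sketch: the function names, the choice of three raters per image and the simple majority vote are my assumptions for illustration, not an existing EyeEm workflow.

```python
from collections import Counter


def labeling_hours(num_images, seconds_per_label=1.0, raters_per_image=1):
    """Person-hours needed if every image is labeled once by each rater."""
    return num_images * seconds_per_label * raters_per_image / 3600.0


def majority_label(votes):
    """Most common label among the raters' votes (ties go to the label seen first)."""
    return Counter(votes).most_common(1)[0][0]


hours_single = labeling_hours(30_000)                      # one rater per image
hours_triple = labeling_hours(30_000, raters_per_image=3)  # hypothetical: three raters

print(f"one rater per image:    {hours_single:.1f} h")   # 8.3 h
print(f"three raters per image: {hours_triple:.1f} h")   # 25.0 h
print(majority_label(["low", "ok", "low"]))              # low
```

Even with three raters per image the effort stays at roughly three working days, which is why I consider in-house labeling realistic.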

3. Algorithms to try

I would start with established techniques like the Bag of Visual Words approach (BoW). As the aforementioned MIT paper describes, over 80% accuracy can already be achieved with this method for a comparable task of classifying 15 indoor and outdoor scenes [Xiao10]. While this approach originally relies on the patented SIFT feature detector and descriptor, one can choose from a whole list of new free alternatives, which deliver comparable performance while being much faster and having a lower memory footprint [Miksik2012]. In the MIT paper they also combined BoW with other established methods to increase the accuracy to nearly 90%.
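To make the BoW idea concrete, here is a minimal sketch of its core step: every local descriptor of an image is assigned to its nearest "visual word" and the image is then represented by the histogram of word counts. The two-dimensional toy descriptors and the hand-written codebook are assumptions for illustration. In a real system the descriptors would come from a feature extractor (SIFT or one of the free alternatives), the codebook would typically be learned by clustering descriptors from training images, and the histograms would be fed to a classifier. None of that is shown here.

```python
import math


def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: math.dist(descriptor, codebook[i]))


def bow_histogram(descriptors, codebook):
    """L1-normalised histogram of visual-word counts for one image."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [count / total for count in counts]


# Toy data (hypothetical): three visual words, four descriptors from one image.
codebook = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
descriptors = [(1.0, 0.0), (0.0, 1.0), (9.0, 10.0), (1.0, 9.0)]

histogram = bow_histogram(descriptors, codebook)
print(histogram)  # [0.5, 0.25, 0.25]
```

The histogram has the same length for every image regardless of how many descriptors were found, which is what makes it usable as input for standard classifiers.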

The next step then would be to use Alex Krizhevsky’s implementation of a deep convolutional neural network, which he used to win last year’s Large Scale Visual Recognition Challenge. The code is freely available online. While being much more powerful, this system is also much harder to train, with many parameters to tune without good existing heuristics.

But these two approaches won’t really help with assessing the beauty of pictures or identifying the low quality ones. If one agrees with Microsoft Research’s view of photo quality, defined by simplicity, (lack of) realism and quality of craftsmanship, one could start with the algorithms they designed to distinguish between high quality professional photos and low quality snapshots [Ke06].

Caveats

Specific to the case at hand, I predict that the filters will cause problems. They change the colors and some of them add high and low frequency elements. This will decrease the accuracy of the algorithms. To prevent this, the analysis has to be performed on the phone or the unaltered image has to be uploaded as well.

Low quality or not?

If I remember correctly, I once read that EyeEm applies the filters in full resolution to pictures on their servers and downloads the result to the users’ phones afterwards. If this is still the case, both approaches are feasible. But as phones get more and more powerful, a system which works on the phone is to be preferred as it is inherently more scalable.

Another challenge would be to distinguish between low quality pictures and pictures that break the rules of photography on purpose. The picture on the right for example has a blue undertone, low contrast and is quite blurry. But while these features make this image special, they would also trigger the low quality detector. It will be interesting to see if machine learning algorithms can learn to distinguish between the two cases.
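To illustrate why such a picture would trip a naive detector, here is a small sketch of two common hand-crafted cues: sharpness measured as the variance of a Laplacian filter response, and contrast measured as the standard deviation of the pixel intensities. These are generic heuristics I am using as an example, not the features from [Ke06] and not anything EyeEm is known to use. The images are plain nested lists of grayscale values and the test patterns are made up.

```python
from statistics import pstdev, pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over all interior pixels.

    Sharp images have strong local intensity changes and therefore a high
    variance; blurry images give values close to zero.
    """
    height, width = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return pvariance(responses)


def contrast(gray):
    """Standard deviation of all pixel intensities as a crude contrast measure."""
    return pstdev([value for row in gray for value in row])


# Hypothetical 6x6 test patterns: a hard checkerboard and a smooth ramp.
sharp = [[255 if (x + y) % 2 else 0 for x in range(6)] for y in range(6)]
smooth = [[10 * x for x in range(6)] for y in range(6)]

print(laplacian_variance(sharp) > laplacian_variance(smooth))  # True
print(contrast(sharp) > contrast(smooth))                      # True
```

A deliberately soft, low contrast photograph scores just as badly on both measures as an accidental blurry snapshot, so thresholds on such cues alone cannot separate the two cases.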

So to recap:

1. Make sure the use case is sound.
2. Collect loads of data to train and evaluate.
3. Start with simple, proven algorithms and increase the complexity step by step.

AI, the new secret weapon in the cloud photo-storage war.

Gigaom posted an article on “The Dropbox computer vision acquisition that slipped under the radar”. But I think the article should have been called:

AI, the new secret weapon in the cloud photo-storage war.

Okay, this title is probably hyperbole. But all the big internet companies offer a way to store and share your photos online. And to make their offer more compelling, Yahoo, Google, and Dropbox all recently bought computer vision start-ups that will provide image recognition for their users’ uploaded photos. While Yahoo bought LookFlow, Google bought DNNresearch.

Microsoft has been researching image recognition for a long time and I am sure they will soon integrate some of their algorithms into their cloud products. And Facebook just founded an internal AI group.

And to get a look into the future without having to upload all your photographs to the internet, try the iOS app Impala. The app will analyse and categorise all your photographs on your device. It was created by EUVision technologies, a spin off of the University of Amsterdam commercializing their research efforts. 

After the negative conclusion of my last post about the closure of Everpix, this is positive news for the machine learning market.

The end of Everpix, a sad week for photographers and machine learning researchers.

This week the photo storage service Everpix announced that they will close down. They did not have enough paying customers and could not find new investors.

That is sad. Not only because it was the world’s best photo startup according to the Verge, but also because it was the only company besides Google that used new machine learning techniques to help people manage their photo mess.

Everpix home screen

Their closure can be seen as an indicator that end users and investors are not ready yet to spend additional money on machine learning algorithms.

Flashback mail

Having read some articles and the associated comments[1, 2], it is clear to me that not their use of sophisticated machine learning algorithms but the daily ‘flashback’ email with pictures taken on the same day in previous years was the more popular feature. In fact, I did not even see one single comment about the algorithms that analysed the pictures.

But maybe their algorithms were just not good enough.

Unfortunately I could not try out their algorithms myself. My pictures finished processing just a few days before they announced the shutdown. But I found a comment by one of the founders on Hacker News, saying that they used a deep convolutional neural network with 3 layers for the semantic image analysis. This is the same technology Google now uses for their photo search.

But they were unhappy with the results of the algorithm so in January this year they changed their approach as their CTO, Kevin Quennesson, explains in ‘To Reclaim Your Photos, Kill the Algorithm’.  He writes: “If a user is a food enthusiast and takes a lot of food close-ups, are we going to tell him that this photo is not the photo of a dish because an algorithm only learned to model some other kind of dishes?” They found that the algorithm’s errors were not comprehensible for the end user.

So they planned to change their system. As I understand it, their old system learned and used concepts independent of the single user. But the new system also uses pictures of the same user to infer the content of a new picture. He calls this “feature based image backlinks”.

Explanation Feature-based Image Backlinks

The graph shows how a picture of a dish can be correctly identified because the content can be inferred by similar pictures of the user that the system identified correctly before. – from Quennesson’s blog post
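As I understand the idea, it can be sketched roughly like this: if the general classifier is not confident about a new picture, look for very similar earlier pictures by the same user and borrow their label. This is my own illustration and not Everpix’s implementation. The feature vectors, the cosine similarity, the thresholds and all names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equally long feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def label_with_backlinks(features, global_label, global_conf, user_history,
                         min_conf=0.6, min_sim=0.9):
    """Keep a confident global prediction, otherwise borrow the label of the
    most similar earlier picture by the same user (if it is similar enough)."""
    if global_conf >= min_conf:
        return global_label
    best_label, best_sim = None, min_sim
    for past_features, past_label in user_history:
        similarity = cosine_similarity(features, past_features)
        if similarity >= best_sim:
            best_label, best_sim = past_label, similarity
    return best_label if best_label is not None else global_label


# Hypothetical history of one user: (feature vector, confirmed label).
history = [((0.9, 0.1, 0.0), "food"), ((0.0, 0.2, 0.9), "beach")]

# The general model is unsure, but the picture resembles earlier food shots.
print(label_with_backlinks((0.8, 0.2, 0.0), "unknown", 0.3, history))  # food
```

The design choice here is that the user’s own pictures only override the general model when it is unsure, so a confident general prediction is never replaced.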

Regardless of the success of Everpix, I think making more use of the context of an image is a helpful and necessary approach to building systems that will reliably predict the content of an image in the future.

In any case I wish we would hear more about the underlying algorithms, what they tried, what worked and what not.

International Computer Vision Summer School (ICVSS) 2012, a review

This blog post is intended to be informative for students who plan to attend ICVSS for the first time and to give feedback to the organizers, because I had no time to fill out my feedback form back there 😉

It is based on my experiences of participating in this year’s ICVSS. In this post I will only talk about the organizational side. Of course the lectures are the most important part of the summer school, but it is easier to establish if the lectures are relevant to you than to find out if you would enjoy the summer school.

TLDR: I liked it very much. If you are a 2nd or 3rd year PhD student and have 1300€ to spend, definitely go!

What is ICVSS?

The International Computer Vision Summer School is a yearly one-week conference for students in the field of computer vision held in Sicily, Italy. In contrast to normal conferences it is less formal and the aim is to learn rather than to present. For that reason they invited renowned researchers and professors to talk about the “theoretical and practical aspects of real Computer Vision problems as well as examples of their successful commercialisation”.

The summer school’s program comprises

  • lectures,
  • workshops (which are like practical lectures),
  • poster presentations by the attendees,
  • a reading group,
  • an essay contest (Brady Prize),
  • a written exam and
  • social events.

It is organized by the University of Catania, Sicily, Italy and the University of Cambridge, UK.

The Program

All five days followed roughly the same pattern. They started with breakfast at 8h followed by two lectures before lunch. After lunch there was one more lecture followed by a coffee break. The after-coffee-break program varied throughout the week:

  • The first two days students were presenting their posters,
  • on the third day there was a guided tour to the ancient city of Ragusa Ibla,
  • on the fourth day the afternoon was filled with the Reading Group and
  • on the last day, Friday, there were the examination, the student presentations and an award ceremony.

Generally speaking everything was very well organized, even (if you allow me this cliché) from a German point of view. The issues I was told about or experienced myself were mostly due to circumstances outside of the ICVSS staff’s influence, like airlines or the hotel being sloppy.

In fact all lectures and events started so punctually that I had trouble always being on time 😉

They also managed to give the summer school a nice pace. There was no downtime to get bored and you never had to stress to see or do the things you wanted.

Lectures

My only issue with the program was that the lectures were too long. Some lasted 2h without a break. They changed that halfway through the summer school and I hope that sticks for the coming years. I would suggest cutting all lectures into blocks of 50 min followed by 10 min breaks.

Poster Sessions

In the poster sessions students could present their work to the other students to get feedback, which is always very valuable as the other participants are from a similar field without being too familiar with your work to ask the right questions.

There was also a competition for the best poster. The two winners received a cash prize (700€) and were asked to present their work in a short talk on Friday after the examinations. (Congratulations again to Christof Hoppe with Photogrammetric Camera Network Design for Micro Aerial Vehicles)

My only quarrels were that some people did not go to the second floor, because there were not enough signs, and that the rooms were too crowded, especially in the corners where posters were hanging on each side, although there was unused room upstairs in the gallery.

Reading Group and Essay Contest

The aim of the reading group is to teach and practice the skill of reading research papers. To take part you have to prepare a homework “studying (not just reading) one or more topics provided by the school committee, and tracing the ideas as far back as you can.” This year the topic was image features and last year it was shapes. The groups and individuals with the best and most interesting submissions are asked to present their work during the reading group at the summer school, followed by a discussion. The group or the individual with the best presentation is awarded a cash prize of $1000. If I remember correctly only 18 groups or individuals participated, and you can listen and take part in the discussion even if you haven’t sent in homework. Nevertheless, I would urge you to hand in something, as the organizer of the reading group, Stefano Soatto, gives extensive feedback.

The essay contest (also called Brady Prize) was about discussing the current and the future “real world” social impact of computer vision technology. There were two topics to choose from (Urban Landscapes and Computer Vision and Medicine) and the two winners were asked to read out their essays and received 600€ in prize money.

Exam

The exam consisted of 37 multiple choice questions covering the lectures and workshops. You had to answer 17 correctly to pass the exam and receive a separate certificate. Only a few of the lecturers gave useful example questions after their lectures so we were not really sure what to expect. In the end the questions were quite fair and sensible. I would say, you can pass studying in your room at night if you paid attention in all the lectures.

Social Events

I actually attended my first beach party during the summer school. The other activities were also very enjoyable. Look forward to them.

Venue, Accommodation and Food.

When I told my friends about the summer school I found myself using the following phrase a lot:

It was the nicest prison I have ever stayed in.

Maybe I am just not used to resort vacations, but I think this description fits. The hotel is remote and you won’t be able to leave the place and find anything within walking distance except for a small fishing village and a beach.

Nevertheless I liked the place, as it provides everything one needs. I even was able to buy some swimming pants and flip-flops I forgot back in Germany.

For each meal the hotel would provide a varied buffet and I can’t remember anyone complaining about the food. If you are vegetarian it is definitely doable without starving taste-buds.

Internet

The internet connection was bad! Very bad!! In theory there was WiFi in the lobby of the hotel, the lecture hall and in the foyer of the lecture hall, where the organizers have their temporary office. (Note: No WiFi in the rooms!)

But in practice the WiFi connections were so slow that sometimes not even emails would load, and there was a connection timeout of about 5 minutes after which one had to re-enter a personal code of nearly 20 characters, which made using the internet on the phone way more annoying than fun. As I wrote, the resort is very remote, so it is probably cost-prohibitive to get faster internet for the few internet addicts visiting once a year. This is a pity, as the organizers try to push the use of social media (Facebook/Twitter) during the conference, which is imho a fun idea. But my proposal would be to shut off the WiFi in the lecture hall. As a result, people would be less tempted (and less frustrated) by the internet during the talks and there would be more bandwidth left for the people sitting in the foyer (doing important stuff ™).

Money

If you are coming from Europe this summer school will cost you roughly 1650€ (600€ for the school, 750€ for a single room and 300€ for the flight). You can cut the costs to 1250€ by reserving a bed in a 4 person room (450€) and by booking your flight early with websites like skyscanner.com. This is still quite expensive, but I was quite satisfied in the end. There were no hidden costs, they didn’t seem to throw out money for totally unnecessary things and they didn’t try to sell things, which I cannot stress enough.

More warning than recommendation is this résumé by Roman Shapovalov, who chose a hotel in the nearby village for his stay at ICVSS 2010.

People

As the organizers told us in the opening presentation, the most important part of this conference is the people we meet. For some working on their own at their home universities this might be the first time to feel part of a community. And the process of becoming a community is deliberately amplified by the choice of such a remote venue. For every meal and for every activity we stayed together, so you got to know the other people very fast.

Also most of the lecturers stay for more than a day, so this was a great chance to interact with them in a very relaxed environment. Some even brought their families, which shows how much they enjoy this summer school themselves.

Summary

I liked it a lot and I think I will go again, if not next year then in 2014, even though I will have to pay for it from my student scholarship. I will probably book a bed in a four-person room, which makes it cheaper and more interactive.

My recommendation is to go as soon as you have finished your literature review and have some results to present. Make sure you have something to show and talk about. Then you can learn and profit from the connections you make and the tips you receive for a long time.

There is also the CVML Summer School organized by INRIA, France, which ended this year just before ICVSS. If you have enough money then go to both, otherwise choose with regard to the speakers.

Tips

  • Prepare an elevator speech
  • Leave your notebook at home. Remember there is no internet and you are there to meet people. You can bring your slides on a USB stick if you are planning on winning the competition.
  • Don’t go to bed too early, sleep after lunch. You don’t miss anything and it is too hot to do anything anyway.
  • Plan one or two days of extra stay in Sicily. It is easier to ignore the beach next to the hotel, when you know you have time afterwards to go to the beach – you pay for the return flight anyway! My recommendations would be Siracusa and Stromboli.
  • Don’t forget your swimming pants! You need them and they are expensive in the resort.

Further reading:

Did you attend ICVSS? What was your experience?
Thinking about going and having questions left?
Leave a comment!

[update] corrected breakfast time and modified intro

A list of lists of PhD resources

On your way to becoming a PhD, you not only have to learn how to do research, you also have to learn how to communicate your ideas comprehensibly in text and speech, how to build the tools you need and how to survive in the microcosm of supervisors, colleagues and undergrad students of your research lab. But you are not the first to go through all this and people have written extensive advice for every problem you might encounter. And as they are so popular right now, I present you here my:

List of lists of PhD resources for computer scientists.

List of lists of lists!?

The most condensed summary I have found is on the website of my work group IUPR. It is a good starter and gives one an overview of all the things one has to keep in mind and pay attention to.

From the most condensed to the most comprehensive. This collection links to nearly 100 articles on Ph.D. dissertation/research, presentations, writing,  reviewing/refereeing,  being a faculty member, job hunting, learning English and more. The list is overwhelming.

Links to documents on giving talks and writing papers and proposals.

from the UCSD VLSI CAD LABORATORY

If you did not actually study computer science (like me) or your courses mainly covered logic and reducing NP-complete problems, this site can probably help you a lot. Software carpentry is about learning the skills to write reliable software and using the existing tools efficiently. The website offers tutorials on basic programming, version control, testing, using the shell, relational databases,  matrix programming, program designing, spreadsheets, data management, and software life-cycles.

[UPDATE] How could I forget:

Like the well-known Stackoverflow.com, Academia is “a collaboratively edited question and answer site for academics and those enrolled in higher education.” It is still in its beta phase, but growing every day. I like that it will always be more current and extensive than all the pages maintained only by individuals or single work groups. And if you can’t find the information you need, you can always ask for help.

So what do you think? Do you find these resources useful? Some of them are already quite old. Do you think they are obsolete? What are your tips? Which collections did I forget?

Use all the cores!

Use all the cores with GPGPU and HPC

For the last few months I have been diving into GPGPU programming with (Py)CUDA. Every time you work with images you are faced with huge amounts of data, so more speed is always welcome. CUDA promises speedups of up to 300x, but this comes at the price of having to implement more or less everything yourself at a low level. At times this can be very challenging, so to cheer me up in the hard times I made this picture based on a popular internet meme. I also think the GPGPU and HPC community is in need of some catchy visuals. Read more about this and other memes at Know Your Meme, X all the Y.

Let me know what you think and spread it!

Can Stack Exchange save scientific peer review? [Update]

One of the few things everybody seems to agree on is that the scientific review process, especially for computer science, is broken. I won’t go into details here as there are many sources on the net.

But personally I found Yann LeCun’s pamphlet “A New Publishing Model in Computer Science” inspiring. He proposes an open, karma-based online repository which I will summarize as follows:

  • In this system authors post their papers as soon as they feel that they are finished. The publication is put under version control and is immediately citable.
  • “Reviewing Entities” (RE), individuals or groups like editorial boards, then choose papers they want to review or accept review requests from authors.
  • REs do not “own” papers exclusively, so any RE can choose to review any paper at any time. Papers can be reviewed by multiple REs.
  • The reviews are published with the paper and are themselves citable documents like regular publications.
  • Reviews are furthermore rated by readers. Good reviews will generate “karma” points for the RE, to show the usefulness of their review.
  • Additionally an RE’s “karma” will increase if they are the first to positively review a paper which is then later rated as high quality by other REs as well. As a result REs will have an incentive to be the first to review good papers.
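The karma rules from the last two points can be sketched as a small piece of bookkeeping. This is only my illustration of the summary above. LeCun does not specify point values, so the bonus size, the number of confirming reviews and all class and function names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Review:
    reviewer: str          # name of the Reviewing Entity (RE)
    positive: bool         # does the review rate the paper as high quality?
    reader_votes: int = 0  # net up-votes the review received from readers


@dataclass
class Paper:
    title: str
    reviews: list = field(default_factory=list)  # kept in order of posting


def karma(papers, first_bonus=10, min_confirmations=2):
    """Karma per RE: reader votes on their reviews, plus a bonus for being the
    first positive reviewer of a paper that other REs later rate positively."""
    scores = {}
    for paper in papers:
        for review in paper.reviews:
            scores[review.reviewer] = scores.get(review.reviewer, 0) + review.reader_votes
        positives = [review for review in paper.reviews if review.positive]
        # All positive reviews after the first one count as confirmations.
        if len(positives) - 1 >= min_confirmations:
            scores[positives[0].reviewer] += first_bonus
    return scores


# Hypothetical example: RE "A" spots a good paper first, "B" and "C" agree later.
good = Paper("good paper", [Review("A", True, 3), Review("B", True, 1), Review("C", True, 0)])
fresh = Paper("fresh paper", [Review("D", True, 2)])

print(karma([good, fresh]))  # {'A': 13, 'B': 1, 'C': 0, 'D': 2}
```

The bonus only goes to the earliest positive review, which is what creates the incentive to review good papers early.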

I will not repeat LeCun’s explanations on how it works in detail and why it would be superior to the existing system. Instead I want to point out how very similar this approach is to the Stack Exchange (SE) QA websites. Stack Exchange is a network of over 70 Q&A websites, with stackoverflow.com, a Q&A site on programming, being the first and largest one. On Stack Exchange websites everyone can ask questions which can be answered by all the members of the community. Both questions and answers will be rated by the community, so users are incentivized to write useful answers to questions which are relevant to many other users in order to gain reputation.

Especially if you have used an SE website, it is hard to ignore the similarities. Even though the SE framework was built to solve a different problem, I can see it being adapted to act as a repository for LeCun’s publishing model. Publications would be questions and reviews would be answers. I can only make out the following necessary changes:

  • There needs to be support for groups (RE),
  • high level users should not be permitted to change other people’s posts anymore and
  • the ‘answer’ functionality has to be removed.

Everyone who follows the two founders of Stack Exchange, Jeff Atwood and Joel Spolsky, knows how determined both are to prevent any deviation from their vision for Stack Exchange, so it wouldn’t be possible to officially be part of the SE community. But there is also OSQA, the open source copy of SE. Using it, implementing the necessary features seems possible.

So, what do you think? Can Stack Exchange save scientific peer review?

[UPDATE]

LeCun was generous enough to comment on my article via e-mail. He confirmed that his views on the peer review process and his model haven’t changed and agrees that creating the technical infrastructure shouldn’t be too hard. He has already received several offers from possible volunteers, but the project is still missing a highly competent developer (or team) to “own” the project.

Disclaimer: I am not the first one to bring Stack Exchange to the table, but I found the other approach far less concrete.