Assumptions about the end user

I am in the middle of a little literature review on using machine learning for photo organisation and came across a statement that struck me as misconceived. The paper’s topic is segmenting photo streams into events, and at the end of page 5 it states:

We believe that for end users, having a low miss rate is more valuable than having a low false alarm rate.

I believe this is a false assumption that will lead to frustrated end users. From my own experience, I am convinced that the opposite is true.

They continue: “To correct a false alarm is a one-step process of removing the incorrect segment boundary. But to correct a miss, the user must first realize that there is a miss, then figure out the position of the segment boundary.”

As with face detection, users will be happy about a correct detection but unhappy about an algorithm that creates wrong boundaries they have to correct manually.

And if we assume that a conservative algorithm still finds all the strong boundaries, the user might not even miss the undetected ones.
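To make the trade-off concrete, here is a minimal sketch in Python. It is not the paper's method: the time-gap segmenter, the timestamps and the way I normalise the two rates are my own assumptions, chosen only to show how a conservative and an aggressive setting shift errors from one kind to the other.

```python
def boundary_error_rates(true_boundaries, predicted_boundaries, num_gaps):
    """Return (miss_rate, false_alarm_rate) for one segmentation.

    A "gap" is the position between two consecutive photos. A boundary is a
    gap where one event ends and the next begins. Assumed definitions:
    miss rate = missed boundaries / true boundaries,
    false alarm rate = wrong boundaries / gaps that are not boundaries.
    """
    true_set = set(true_boundaries)
    pred_set = set(predicted_boundaries)
    misses = len(true_set - pred_set)
    false_alarms = len(pred_set - true_set)
    non_boundaries = num_gaps - len(true_set)
    miss_rate = misses / len(true_set) if true_set else 0.0
    false_alarm_rate = false_alarms / non_boundaries if non_boundaries else 0.0
    return miss_rate, false_alarm_rate


def segment_by_time_gap(timestamps, threshold):
    """Hypothetical segmenter: a boundary wherever the time gap exceeds threshold."""
    return [i for i in range(len(timestamps) - 1)
            if timestamps[i + 1] - timestamps[i] > threshold]


# Made-up capture times in minutes; gaps 2 and 5 are strong boundaries,
# gap 4 (a 30 minute pause) is a weak one.
timestamps = [0, 2, 5, 65, 70, 100, 400, 405]
true_boundaries = [2, 4, 5]
num_gaps = len(timestamps) - 1

# Conservative setting: finds only the strong boundaries, creates no wrong ones.
conservative = boundary_error_rates(
    true_boundaries, segment_by_time_gap(timestamps, 45), num_gaps)

# Aggressive setting: misses nothing, but half of the non-boundary gaps are cut.
aggressive = boundary_error_rates(
    true_boundaries, segment_by_time_gap(timestamps, 4), num_gaps)
```

With these made-up numbers the conservative setting misses one weak boundary and produces nothing to clean up, while the aggressive setting leaves the user with two wrong boundaries to delete.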

Algorithms should not create new work for the user, but remove (some of) it.

Use all the cores!

Use all the cores with GPGPU and HPC

Over the last few months I have been diving into GPGPU programming with (Py)CUDA. Every time you work with images you are faced with huge amounts of data, so more speed is always welcome. CUDA promises speedups of up to 300x, but this comes at the price of having to implement more or less everything yourself at a low level. At times this can be very challenging, so to cheer myself up I made this picture based on a popular internet meme. I also think the GPGPU and HPC community is in need of some catchy visuals. Read more about this and other memes at Know Your Meme, X all the Y.

Let me know what you think, and spread it!

Can Stack Exchange save scientific peer review? [Update]

One of the few things everybody seems to agree on is that the scientific review process, especially in computer science, is broken. I won’t go into details here, as there are many sources on the net.

But personally I found Yann LeCun’s pamphlet “A New Publishing Model in Computer Science” inspiring. He proposes an open, karma-based online repository, which I will summarize as follows:

  • In this system authors post their papers as soon as they feel that they are finished. The publication is put under version control and is immediately citable.
  • “Reviewing Entities” (RE), individuals or groups like editorial boards, then choose papers they want to review or accept review requests from authors.
  • REs do not “own” papers exclusively, so any RE can choose to review any paper at any time. Papers can be reviewed by multiple REs.
  • The reviews are published with the paper and are themselves citable documents like regular publications.
  • Reviews are furthermore rated by readers. Good reviews will generate “karma” points for the RE, to show the usefulness of their review.
  • Additionally, an RE’s “karma” will increase if it is the first to positively review a paper which is then later rated as high quality by other REs as well. As a result, REs will have an incentive to be the first to review good papers.
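As a rough illustration of the two karma rules in the last bullets, here is a minimal sketch in Python. The class names, the size of the bonus and the number of confirming reviews are purely my own assumptions for the sake of the example; LeCun's proposal does not specify any of them.

```python
from dataclasses import dataclass, field


@dataclass
class Review:
    reviewer: str          # name of the Reviewing Entity (RE)
    positive: bool         # did the RE rate the paper as good?
    reader_votes: int = 0  # usefulness votes given by readers


@dataclass
class Paper:
    title: str
    reviews: list = field(default_factory=list)  # in order of posting


FIRST_REVIEW_BONUS = 10  # hypothetical bonus size
MIN_CONFIRMATIONS = 2    # hypothetical: later positive reviews by other REs


def karma(papers):
    """Compute karma per RE under the assumed rules above."""
    scores = {}
    for paper in papers:
        # Rule 1: reader votes on a review count towards the RE's karma.
        for review in paper.reviews:
            scores[review.reviewer] = (
                scores.get(review.reviewer, 0) + review.reader_votes)

        # Rule 2: the first positive reviewer earns a bonus once enough
        # other REs have also reviewed the paper positively.
        positives = [r for r in paper.reviews if r.positive]
        if positives:
            first = positives[0]
            confirmers = {r.reviewer for r in positives[1:]
                          if r.reviewer != first.reviewer}
            if len(confirmers) >= MIN_CONFIRMATIONS:
                scores[first.reviewer] += FIRST_REVIEW_BONUS
    return scores


paper = Paper("Some paper", [
    Review("RE-A", positive=True, reader_votes=3),
    Review("RE-B", positive=True, reader_votes=1),
    Review("RE-C", positive=True, reader_votes=0),
    Review("RE-D", positive=False, reader_votes=5),
])
scores = karma([paper])
```

In this toy example RE-A collects the early-reviewer bonus on top of its reader votes, while RE-D still earns karma for a negative review that readers found useful.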

I will not repeat LeCun’s explanations of how it works in detail and why it would be superior to the existing system. Instead I want to point out how similar this approach is to the Stack Exchange (SE) Q&A websites. Stack Exchange is a network of over 70 Q&A websites, with Stack Overflow, a Q&A site on programming, being the first and largest one. On Stack Exchange websites everyone can ask questions, which can be answered by all members of the community. Both questions and answers are rated by the community, so users are incentivized to write useful answers to questions that are relevant to many other users in order to gain reputation.

Especially if you have used an SE website, it is hard to ignore the similarities. Even though the SE framework was built to solve a different problem, I can see it being adapted to act as a repository for LeCun’s publishing model. Publications would be questions and reviews would be answers. I can only make out the following necessary changes:

  • There needs to be support for groups (REs),
  • high level users should not be permitted to change other people’s posts anymore and
  • the ‘answer’ functionality has to be removed.

Everyone who follows the two founders of Stack Exchange, Jeff Atwood and Joel Spolsky, knows how determined both are to prevent any deviation from their vision for Stack Exchange, so it wouldn’t be possible for such a site to be an official part of the SE network. But there is also OSQA, an open source clone of SE. Building on it, implementing the necessary features seems possible.

So, what do you think? Can Stack Exchange save scientific peer review?


Update: LeCun was kind enough to comment on my article via e-mail. He confirmed that his views on the peer review process and his model haven’t changed, and he agrees that creating the technical infrastructure shouldn’t be too hard. He has already received several offers from potential volunteers, but the project is still missing a highly competent developer (or team) to “own” it.

Disclaimer: I am not the first one to bring Stack Exchange to the table, but I found the other approach far less concrete.

Paper: Rendering Synthetic Objects into Legacy Photographs

Inserting 3D objects into existing photographs


This fascinating video presents a new method for inserting 3D objects into existing photographs. It is based on the research of Kevin Karsch, Varsha Hedau, David Forsyth and Derek Hoiem (all University of Illinois at Urbana-Champaign). Their main contribution is the algorithm that generates the light model for the scene. The algorithm needs only one photograph and a few manual markings by a novice user, together with a ground truth data set, to create a near-lifelike insertion. The ground truth data set was generated with 200 images from 20 indoor scenes under varying lighting conditions.

The video is well done and I am surprised by what’s possible, but I would like to see how much user input is really necessary and how well the algorithm and the ground truth perform with other images. What do you think?

More details can be found at Kevin Karsch’s website.