In this blog post I want to share some interesting articles that deal with data sets in computer vision. For starters, Tomasz Malisiewicz draws attention in his blog post to a video lecture by Peter Norvig (Google), in which Norvig showed some interesting results
where algorithms that obtained the best performance on a small dataset no longer did the best when the size of the training set was increased by an order of magnitude. … Also, the mediocre algorithms in the small training size regime often outperformed their more complicated counterparts once more data was utilized.
This is indeed interesting, as it is always hard to say how much training and test data is necessary, and most scientists, myself included, are far more interested in working on their precious algorithms than in collecting a solid ground truth. Furthermore, as I pointed out in a comment on Tomasz’s blog post, using 10 times as many pictures would mean that I could evaluate only 3 feature combinations in the time in which I could otherwise have evaluated 30.
In answer to my question on how to handle that trade-off, he advocates nonparametric* approaches and
combining learning with data-driven approaches to reduce test time complexity.
I agree with him that we should definitely spend more time and effort creating larger ground truth sets instead of optimizing our algorithms for a ground truth that is too small to reveal anything.
For further reading I refer to Prof. Jain’s blog, where he claims in his post, Evaluating Multimedia Algorithms, that the existing data sets for photo retrieval are
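To make the trade-off from my comment concrete, here is a minimal sketch of the arithmetic. It assumes that evaluation time grows linearly with the number of images; the function name, the budget and the image counts are hypothetical and only chosen to reproduce the 30 versus 3 figures from above.

```python
def evaluable_combinations(time_budget, n_images, cost_per_image=1):
    """Number of feature combinations that fit into a fixed time budget.

    Assumption: evaluating one combination costs n_images * cost_per_image
    time units, i.e. the cost scales linearly with the size of the data set.
    """
    return time_budget // (n_images * cost_per_image)


# Hypothetical numbers: a budget of 30,000 time units, one unit per image.
budget = 30_000
small_set = evaluable_combinations(budget, n_images=1_000)
large_set = evaluable_combinations(budget, n_images=10_000)

print(small_set, large_set)
```

With ten times as many images the same budget covers a tenth of the combinations, which is exactly why the size of the ground truth and the number of experiments compete with each other.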
too small such as the Corel or Pascal datasets, too specific like the TRECVID dataset, or without ground truth, such as the several recent efforts by MIT and MSRA that gathered millions of Web images for testing,
and promotes his concept for gathering controlled data ground truths.
As a third read, the Scienceblog features a story about James DiCarlo, a neuroscientist at the McGovern Institute for Brain Research at MIT, graduate student Nicolas Pinto, and David Cox of the Rowland Institute at Harvard, who
argue that natural photographic image sets, like the widely used Caltech101 database, have design flaws that enable computers to succeed where they would fail with more authentically varied images. For example, photographers tend to center objects in a frame and to prefer certain views and contexts. The visual system, by contrast, encounters objects in a much broader range of conditions.
They go on:
“We suspected that the supposedly natural images in current computer vision tests do not really engage the central problem of variability, and that our intuitions about what makes objects hard or easy to recognize are incorrect.”
I think all three articles remind us to reconsider the data sets we use for evaluation with regard to their size, noisiness and ‘naturalness’.
* nonparametric as in using rank or order of the images