Semi-supervised Learning in Image Collections

Supervised learning is the de facto standard for many computer vision tasks such as object recognition or scene categorization. Powerful classifiers can obtain impressive results, but they require sufficient amounts of annotated training data. However, supervised methods have important limitations: annotation is expensive, prone to error, often biased, and does not scale to large datasets. Facing these limitations, we argue that the computer vision community should move beyond supervised methods and more seriously tap into the vast collections of images available today.

In [1], we explored ways of using the large amount of available image data (up to 30,000 images) to overcome inherent problems of supervised approaches. An important conclusion of our study is that the local neighborhood structure matters more than the particular semi-supervised learning (SSL) algorithm. We observed that the right set of parameters (image representation, distance measure, and the strategy used to build the neighborhood graph from them) largely determines the SSL accuracy. A few earlier works made this claim and remarked that there is only little work on the structure itself.
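
To illustrate why these structural choices matter, the following is a minimal sketch of graph-based label propagation on a symmetrized k-NN graph (in the spirit of Zhou et al.), not the exact pipeline of [1]; the function name, the use of scikit-learn's kneighbors_graph, and the parameter defaults are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def knn_label_propagation(X, y, k=10, alpha=0.99, n_iter=200):
    """Graph-based label propagation on a symmetrized k-NN graph.

    X : (n, d) feature matrix, y : (n,) integer labels, -1 marks unlabeled points.
    The representation behind X, the distance measure, and k are exactly the
    "structure" choices that largely determine the final accuracy.
    """
    n = X.shape[0]
    A = kneighbors_graph(X, n_neighbors=k, mode='connectivity', include_self=False)
    W = A.maximum(A.T).toarray()                     # symmetrize the k-NN graph

    # Normalized smoothing operator S = D^{-1/2} W D^{-1/2}
    d = W.sum(axis=1)
    d[d == 0] = 1.0
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ W @ D_inv_sqrt

    # One-hot label matrix; unlabeled rows stay all-zero
    classes = np.unique(y[y >= 0])
    Y = np.zeros((n, classes.size))
    for j, c in enumerate(classes):
        Y[y == c, j] = 1.0

    # Iterate F <- alpha * S F + (1 - alpha) * Y until approximate convergence
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1 - alpha) * Y
    return classes[F.argmax(axis=1)]
```
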
To improve the graph structure, in [2] we use metric learning, which learns a representation better suited to the task at hand. We show a consistent improvement on three different datasets and propose Interleaved Metric Learning Propagation (IMLP), which addresses the poor generalization of metric learning caused by the small number of available labels.
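
A rough sketch of the interleaving idea behind IMLP: alternate between (a) learning a metric from the currently labeled (and pseudo-labeled) points and (b) propagating labels on a k-NN graph built in the learned space. Scikit-learn's NeighborhoodComponentsAnalysis stands in for the metric learner of [2], the number of rounds is arbitrary, and knn_label_propagation refers to the sketch above.

```python
import numpy as np
from sklearn.neighbors import NeighborhoodComponentsAnalysis

def interleaved_metric_learning_propagation(X, y, n_rounds=3, k=10):
    """Alternate metric learning and label propagation (sketch, not the IMLP code).

    y uses -1 for unlabeled points; ground-truth labels are never overwritten.
    """
    y_current = y.copy()
    pred = None
    for _ in range(n_rounds):
        labeled = y_current >= 0
        # (a) learn a linear transformation from the currently labeled points
        nca = NeighborhoodComponentsAnalysis(random_state=0)
        nca.fit(X[labeled], y_current[labeled])
        Z = nca.transform(X)
        # (b) propagate the original labels on a graph built in the learned space
        pred = knn_label_propagation(Z, y, k=k)
        # adopt predictions as pseudo-labels for the next metric learning round
        y_current = np.where(y >= 0, y, pred)
    return pred
```
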

In [3], we propose a novel active learning framework that models the entire labeling procedure as a Markov decision process (MDP). This gives us the flexibility to combine more than two criteria while achieving an adaptive, time-varying trade-off between exploration and exploitation. Furthermore, we introduce a novel sampling criterion, graph density, that uses the graph structure to find highly connected and thus representative images.
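
The MDP formulation itself is beyond a short snippet, but the graph density criterion can be sketched as follows: score every image by its average edge weight to its k nearest neighbors and query the highest-scoring unlabeled image. This is a simplified reading of the criterion in [3]; the RBF bandwidth heuristic, the value of k, and the function names are assumptions.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def graph_density_scores(X, k=10):
    """Average edge weight to the k nearest neighbors; highly connected
    (representative) images receive high scores (simplified sketch of [3])."""
    D = kneighbors_graph(X, n_neighbors=k, mode='distance', include_self=False)
    D = D.maximum(D.T).tocsr()                      # symmetrized k-NN distances
    sigma = np.median(D.data) + 1e-12               # RBF bandwidth heuristic
    W = D.copy()
    W.data = np.exp(-(D.data ** 2) / (2 * sigma ** 2))
    degree = np.diff(W.indptr)                      # number of neighbors per node
    strength = np.asarray(W.sum(axis=1)).ravel()    # summed edge weights per node
    return strength / np.maximum(degree, 1)

def pick_next_query(X, labeled_mask, k=10):
    """Return the index of the unlabeled image with the highest graph density."""
    scores = graph_density_scores(X, k=k)
    scores[labeled_mask] = -np.inf                  # never re-query labeled images
    return int(np.argmax(scores))
```
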

RALF code

Animals with Attributes

Additional descriptors used in [1]
Gist (implementation by [Oliva2001])
HOG (our implementation with cells of 8x8 pixels; see the sketch below)
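
The HOG features in [1] come from our own implementation; as a rough stand-in, the snippet below computes a comparable descriptor with scikit-image's hog using the stated 8x8-pixel cells. The resize target, orientation count, and block settings are assumptions.

```python
from skimage import io
from skimage.feature import hog
from skimage.transform import resize

def hog_descriptor(path, size=(128, 128)):
    """HOG descriptor with 8x8-pixel cells (scikit-image stand-in, not the
    implementation used in [1]); the resize target is an assumption."""
    img = resize(io.imread(path, as_gray=True), size, anti_aliasing=True)
    return hog(img, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm='L2-Hys')
```
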

References

[1]  Extracting Structures in Image Collections for Object Recognition, S. Ebert, D. Larlus and B. Schiele, European Conference on Computer Vision (ECCV), September 2010

[2]  Pick your Neighborhood -- Improving Labels and Neighborhood Structure for Label Propagation, S. Ebert, M. Fritz and B. Schiele, Pattern Recognition (DAGM), September 2011

[3]  RALF: A Reinforced Active Learning Formulation for Object Class Recognition, S. Ebert, M. Fritz and B. Schiele, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2012