Scalable Multitask Representation Learning for Scene Classification

@inproceedings{lapin2014cvpr,
  title = {Scalable Multitask Representation Learning for Scene Classification},
  author = {Maksim Lapin and Bernt Schiele and Matthias Hein},
  booktitle = {CVPR},
  year = {2014}
}

MTL-SDCA achieves state of the art on SUN397

In [1], we propose a multitask learning approach (MTL-SDCA) that jointly trains a low-dimensional representation and the corresponding classifiers. The method scales to high-dimensional image descriptors, such as the Fisher Vector [2], and consistently outperforms the state of the art (as of November 2013) on the SUN397 scene classification benchmark [3] across varying amounts of training data.

**Figure 1.** Comparison with previous work (higher is better). Mean accuracy (%) and standard deviation are shown for various methods on the SUN397 benchmark. Numbers in parentheses report the mean accuracy at Ntrain=50.

Consistent improvement over single task learning (STL)

The proposed method is evaluated on the SUN397 scene classification benchmark. It consistently outperforms single task learning (STL), both with and without color information, across varying amounts of training data and across values of K when classification performance is measured via top-K accuracy.
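As a reminder of what top-K accuracy measures, here is a minimal sketch in plain Python. It is not the evaluation code used for the paper; the function name `top_k_accuracy` and the toy scores are hypothetical and chosen only for illustration. A prediction counts as correct when the true class is among the K highest-scoring classes.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: list of per-sample lists, one score per class (hypothetical input layout)
    labels: list of true class indices, one per sample
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and of equal length")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by decreasing score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Toy example with 3 samples and 3 classes (made-up numbers).
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
toy_labels = [1, 1, 0]
for k in (1, 2, 3):
    print(k, top_k_accuracy(toy_scores, toy_labels, k))
```

With K=1 this reduces to ordinary classification accuracy, and the value can only stay the same or grow as K increases.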

**Figure 2.** Mean top-K accuracy (%) with SIFT only.

**Figure 3.** Mean top-K accuracy (%) with SIFT and color (LCS) features.

More human-like confusions

We observed a slight tendency for MTL-SDCA to produce more human-like predictions. Figure 4 shows a hand-picked example in which the top-5 predictions of MTL-SDCA improve over those of single task learning (STL). See the supplementary material for details.

**Figure 4.** Multitask learning improves upon single task learning on the given test image. Top five predictions along with thumbnail images are shown for both approaches. **Top row:** single task learning results. **Middle row:** multitask learning results. **Bottom row:** estimated human predictions.

Technical details

The proposed multitask representation learning method learns a linear mapping into a lower-dimensional subspace jointly with the one-vs-all classifiers, one per class, that operate in that subspace. The resulting optimization problem is solved via an adaptation of the recently developed stochastic dual coordinate ascent (SDCA) algorithm [4]. For further details, please consult the paper and the supplementary material.
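The prediction-time structure described above (a shared linear projection followed by one linear classifier per class) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation, and it does not cover training with SDCA. The names `U`, `W`, `project`, `class_scores` and `predict` are hypothetical, bias terms are omitted, and the toy matrices are made up.

```python
def dot(a, b):
    """Inner product of two equal-length lists of numbers."""
    return sum(ai * bi for ai, bi in zip(a, b))


def project(U, x):
    """Map a D-dimensional descriptor x into a k-dimensional subspace.

    U is a list of k rows, each of length D (assumed layout).
    """
    return [dot(row, x) for row in U]


def class_scores(W, z):
    """One-vs-all scores: one linear classifier (a row of W) per class."""
    return [dot(w, z) for w in W]


def predict(U, W, x):
    """Index of the highest-scoring class for descriptor x."""
    scores = class_scores(W, project(U, x))
    return max(range(len(scores)), key=lambda c: scores[c])


# Toy example: D=4 input dimensions, k=2 subspace dimensions, 3 classes.
U = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0]]
W = [[1.0, 0.0],
     [0.0, 1.0],
     [-1.0, -1.0]]
x = [2.0, -1.0, 5.0, 3.0]
print(project(U, x), class_scores(W, project(U, x)), predict(U, W, x))
```

In this layout all classes share the projection `U`, so each class needs only k classifier weights instead of D, which is what makes the representation cheap when D is large.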

References

[1] M. Lapin, B. Schiele and M. Hein, Scalable Multitask Representation Learning for Scene Classification. CVPR 2014.

[2] J. Sánchez, F. Perronnin, T. Mensink and J. Verbeek, Image Classification with the Fisher Vector: Theory and Practice. International Journal of Computer Vision, 105(3):222-245, 2013.

[3] J. Xiao, J. Hays, K. A. Ehinger, A. Oliva and A. Torralba, SUN Database: Large-Scale Scene Recognition from Abbey to Zoo. CVPR 2010, 3485-3492.

[4] S. Shalev-Shwartz and T. Zhang, Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization. Journal of Machine Learning Research, 14:567-599, 2013.