Learning Using Privileged Information: SVM+ and Weighted SVM

@article{lapin2014learning,
  title = {Learning Using Privileged Information: {SVM+} and Weighted {SVM}},
  author = {Maksim Lapin and Matthias Hein and Bernt Schiele},
  journal = {Neural Networks},
  volume = {53},
  pages = {95--108},
  year = {2014}
}

SVM+ and Weighted SVM relationship

Figure 1. Weighted SVM can always replicate an SVM+ solution.
Figure 2. The converse is not true: an example of a Weighted SVM solution that cannot be found by SVM+.

Prior knowledge can be used to improve the predictive performance of learning algorithms or to reduce the amount of data required for training. The same goal is pursued within the learning using privileged information paradigm, recently introduced by Vapnik et al. [2], which aims to exploit additional information that is available only at training time – a framework implemented by SVM+.

In [1], we relate the privileged information to importance weighting and show that the prior knowledge expressible with privileged features can also be encoded by weights associated with every training example.

We show that Weighted SVM can always replicate an SVM+ solution (Figure 1), while the converse is not true and we construct a counterexample (Figure 2) highlighting the limitations of SVM+.

Our findings show that any SVM+ solution has an equivalent Weighted SVM setting that puts more weight on hard examples (the points with higher loss) than the standard SVM does:

∑ᵢ ωᵢ ξᵢ ≥ (1⁄n) ∑ᵢ ξᵢ

where ωᵢ = cᵢ ⁄ ∑ⱼ cⱼ is the normalized instance weight and ξᵢ is the hinge loss of the i-th data point.
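The inequality can be checked numerically. The sketch below uses made-up weights and hinge losses (not taken from the paper's experiments) in which higher-loss points receive higher weight, and compares the weighted loss against the uniform average:

```python
# Toy illustration (values are assumptions, not from the paper): when the
# normalized weights omega_i = c_i / sum_j c_j put more mass on high-loss
# points, the weighted hinge loss dominates the uniform average.

def normalized_weights(c):
    """Normalize raw instance weights c_i to omega_i = c_i / sum_j c_j."""
    total = sum(c)
    return [ci / total for ci in c]

# Hypothetical hinge losses xi_i for five training points (higher = harder).
xi = [0.0, 0.1, 0.4, 1.2, 2.3]

# Weights that emphasize the hard examples (chosen for illustration).
c = [1.0, 1.0, 2.0, 4.0, 8.0]
omega = normalized_weights(c)

weighted = sum(w * l for w, l in zip(omega, xi))  # sum_i omega_i * xi_i
uniform = sum(xi) / len(xi)                       # (1/n) * sum_i xi_i

print(weighted >= uniform)  # → True for this weighting
```

Note that the inequality is a property of weights derived from an SVM+ solution; it is not guaranteed for arbitrary weight vectors.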

Moreover, given an SVM+ solution represented in terms of dual variables α and β, one can define instance weights ci as

ci = αi + βi

to ensure that the Weighted SVM with the given weights ci finds an equivalent solution.
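As a rough sketch of this construction, the snippet below forms weights cᵢ = αᵢ + βᵢ from hypothetical dual variables (the data and duals are made up, and a simple subgradient solver stands in for an off-the-shelf Weighted SVM):

```python
# Hedged sketch: given dual variables alpha_i, beta_i from a (hypothetical)
# SVM+ solution, form instance weights c_i = alpha_i + beta_i and train a
# linear Weighted SVM by subgradient descent on the weighted hinge objective
#   (1/2)||w||^2 + C * sum_i c_i * max(0, 1 - y_i * (w.x_i + b)).

import random

def weighted_svm_sgd(X, y, c, C=1.0, lr=0.01, epochs=200, seed=0):
    """Train (w, b) by subgradient descent on the weighted hinge objective."""
    rng = random.Random(seed)
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        idx = list(range(n))
        rng.shuffle(idx)
        for i in idx:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # Regularization subgradient, applied per step and scaled by 1/n.
            w = [wj - lr * wj / n for wj in w]
            if margin < 1:  # hinge loss active: weighted data-term update
                w = [wj + lr * C * c[i] * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * C * c[i] * y[i]
    return w, b

# Toy 1-D data and hypothetical SVM+ duals (alpha, beta are assumptions).
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [-1, -1, 1, 1]
alpha = [0.5, 1.0, 1.0, 0.5]
beta = [0.1, 0.3, 0.3, 0.1]
c = [a + bt for a, bt in zip(alpha, beta)]  # c_i = alpha_i + beta_i

w, b = weighted_svm_sgd(X, y, c)
print(all((sum(wj * xj for wj, xj in zip(w, xi)) + b) * yi > 0
          for xi, yi in zip(X, y)))  # → True: toy data is separable
```

The weights only rescale the per-example hinge terms; the optimization itself is the standard (weighted) SVM problem.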

Learning weights for Weighted SVM

Finally, we touch on the problem of choosing weights for Weighted SVMs when privileged features are not available. We explore the setting of Vapnik et al. [2] and show that learning the weights helps in handwritten digit recognition (MNIST data).

We argue that if a vast amount of data is available for model selection (e.g. a validation set), it can be used to learn instance weights that allow the Weighted SVM to outperform SVM+ (Figure 3).

Moreover, weight learning can be combined with other sources of additional information, such as translation invariance, which leads to further performance improvement (Figure 4).
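One simple way to learn weights from a validation set is direct search: propose candidate weight vectors, train on the weighted data, and keep whatever minimizes validation error. The sketch below is an illustration of this idea only (random search and a weighted nearest-centroid rule stand in for the paper's actual procedure and for the Weighted SVM):

```python
# Hedged sketch (procedure and data are illustrative, not the paper's exact
# algorithm): search for instance weights that minimize validation error.
# A weighted nearest-centroid classifier stands in for the Weighted SVM.

import random

def weighted_centroids(X, y, c):
    """Weighted class means for labels -1 and +1."""
    d = len(X[0])
    mean = {}
    for label in (-1, 1):
        w_sum = sum(ci for ci, yi in zip(c, y) if yi == label)
        mean[label] = [sum(ci * xi[k] for xi, yi, ci in zip(X, y, c)
                           if yi == label) / w_sum for k in range(d)]
    return mean

def predict(mean, x):
    def dist2(m):
        return sum((xk - mk) ** 2 for xk, mk in zip(x, m))
    return -1 if dist2(mean[-1]) < dist2(mean[1]) else 1

def val_error(X_tr, y_tr, c, X_val, y_val):
    mean = weighted_centroids(X_tr, y_tr, c)
    return sum(predict(mean, x) != yv for x, yv in zip(X_val, y_val)) / len(y_val)

# Toy 1-D data; the validation set plays the model-selection role.
X_tr = [[0.0], [1.0], [3.0], [4.0]]
y_tr = [-1, -1, 1, 1]
X_val = [[0.5], [1.5], [2.5], [3.5]]
y_val = [-1, -1, 1, 1]

rng = random.Random(0)
uniform_err = val_error(X_tr, y_tr, [1.0] * 4, X_val, y_val)
best_c, best_err = [1.0] * 4, uniform_err
for _ in range(100):
    cand = [rng.uniform(0.1, 2.0) for _ in X_tr]
    err = val_error(X_tr, y_tr, cand, X_val, y_val)
    if err < best_err:
        best_c, best_err = cand, err

print(best_err <= uniform_err)  # → True: search never ends up worse than uniform
```

By construction the search can only match or improve on uniform weights on the validation set; with little validation data, however, such learned weights risk overfitting to it.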

Figure 3. Error rate comparison in the original setting of Vapnik et al. [2] (lower is better).
Figure 4. The extended setting where each digit is translated by 1 pixel in each of the 8 directions.

References

[1]  M. Lapin, M. Hein and B. Schiele. Learning Using Privileged Information: SVM+ and Weighted SVM. Neural Networks, 53:95–108, 2014.

[2]  V. Vapnik and A. Vashist. A new learning paradigm: Learning using privileged information. Neural Networks, 22(5–6):544–557, 2009.
