Yongqin Xian (Postdoc)

Dr. Yongqin Xian

Max-Planck-Institut für Informatik
Saarland Informatics Campus
Campus E1 4
66123 Saarbrücken
E1 4 - 618
+49 681 9325 2118
+49 681 9325 2099

Personal Information

I have master's and bachelor's thesis topics available. They are in the area of learning from limited supervision for various computer vision tasks, e.g. image/video classification, semantic segmentation, object detection and 3D reconstruction.



Attribute Prototype Network for Zero-Shot Learning
W. Xu, Y. Xian, J. Wang, B. Schiele and Z. Akata
Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020
Analyzing the Dependency of ConvNets on Spatial Information
Y. Fan, Y. Xian, M. M. Losch and B. Schiele
Technical Report, 2020
(arXiv: 2002.01827)
Intuitively, image classification should profit from using spatial information. Recent work, however, suggests that this might be overrated in standard CNNs. In this paper, we are pushing the envelope and aim to further investigate the reliance on spatial information. We propose spatial shuffling and GAP+FC to destroy spatial information during both training and testing phases. Interestingly, we observe that spatial information can be deleted from later layers with small performance drops, which indicates that spatial information at later layers is not necessary for good performance. For example, the test accuracy of VGG-16 drops by only 0.03% and 2.66% with spatial information completely removed from the last 30% and 53% of layers on CIFAR100, respectively. Evaluation on several object recognition datasets (CIFAR100, Small-ImageNet, ImageNet) with a wide range of CNN architectures (VGG16, ResNet50, ResNet152) shows an overall consistent pattern.
Learning from Limited Labeled Data - Zero-Shot and Few-Shot Learning
Y. Xian
PhD Thesis, Universität des Saarlandes, 2020
Generalized Many-Way Few-Shot Video Classification
Y. Xian, B. Korbar, M. Douze, B. Schiele, Z. Akata and L. Torresani
Technical Report, 2020
(arXiv: 2007.04755)
Few-shot learning methods operate in low data regimes. The aim is to learn with few training examples per class. Although significant progress has been made in few-shot image classification, few-shot video recognition is relatively unexplored and methods based on 2D CNNs are unable to learn temporal information. In this work we thus develop a simple 3D CNN baseline, surpassing existing methods by a large margin. To circumvent the need of labeled examples, we propose to leverage weakly-labeled videos from a large dataset using tag retrieval followed by selecting the best clips with visual similarities, yielding further improvement. Our results saturate current 5-way benchmarks for few-shot video classification and therefore we propose a new challenging benchmark involving more classes and a mixture of classes with varying supervision.
Semantic Projection Network for Zero- and Few-Label Semantic Segmentation
Y. Xian, S. Choudhury, Y. He, B. Schiele and Z. Akata
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
f-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning
Y. Xian, S. Sharma, B. Schiele and Z. Akata
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
When labeled training data is scarce, a promising data augmentation approach is to generate visual features of unknown classes using their attributes. To learn the class conditional distribution of CNN features, these models rely on pairs of image features and class attributes. Hence, they cannot make use of the abundance of unlabeled data samples. In this paper, we tackle any-shot learning problems, i.e. zero-shot and few-shot, in a unified feature generating framework that operates in both inductive and transductive learning settings. We develop a conditional generative model that combines the strengths of VAEs and GANs and, in addition, via an unconditional discriminator, learns the marginal feature distribution of unlabeled images. We empirically show that our model learns highly discriminative CNN features for five datasets, i.e. CUB, SUN, AWA and ImageNet, and establish a new state-of-the-art in any-shot learning, i.e. inductive and transductive (generalized) zero- and few-shot learning settings. We also demonstrate that our learned features are interpretable: we visualize them by inverting them back to the pixel space and we explain them by generating textual arguments of why they are associated with a certain label.
Zero-shot Learning - A Comprehensive Evaluation of the Good, the Bad and the Ugly
Y. Xian, C. H. Lampert, B. Schiele and Z. Akata
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 41, Number 9, 2019
Due to the importance of zero-shot learning, i.e. classifying images where there is a lack of labeled training data, the number of proposed approaches has recently increased steadily. We argue that it is time to take a step back and to analyze the status quo of the area. The purpose of this paper is three-fold. First, given that there is no agreed-upon zero-shot learning benchmark, we define a new benchmark by unifying both the evaluation protocols and data splits of publicly available datasets used for this task. This is an important contribution as published results are often not comparable and sometimes even flawed due to, e.g., pre-training on zero-shot test classes. Moreover, we propose a new zero-shot learning dataset, the Animals with Attributes 2 (AWA2) dataset, which we make publicly available both in terms of image features and the images themselves. Second, we compare and analyze a significant number of the state-of-the-art methods in depth, both in the classic zero-shot setting and in the more realistic generalized zero-shot setting. Finally, we discuss in detail the limitations of the current status of the area, which can be taken as a basis for advancing it.
Feature Generating Networks for Zero-Shot Learning
Y. Xian, T. Lorenz, B. Schiele and Z. Akata
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2018), 2018
Zero-shot learning - The Good, the Bad and the Ugly
Y. Xian, B. Schiele and Z. Akata
30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017
Latent Embeddings for Zero-shot Classification
Y. Xian, Z. Akata, G. Sharma, Q. Nguyen, M. Hein and B. Schiele
29th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), 2016