D2
Computer Vision and Machine Learning
2020
Hierarchical Online Instance Matching for Person Search
D. Chen, S. Zhang, W. Ouyang, J. Yang and B. Schiele
AAAI Technical Track: Vision, 2020
Manipulating Attributes of Natural Scenes via Hallucination
L. Karacan, Z. Akata, A. Erdem and E. Erdem
ACM Transactions on Graphics, Volume 39, Number 1, 2020
XNect: Real-time Multi-person 3D Human Pose Estimation with a Single RGB Camera
D. Mehta, O. Sotnychenko, F. Mueller, W. Xu, M. Elgharib, P. Fua, H.-P. Seidel, H. Rhodin, G. Pons-Moll and C. Theobalt
ACM Transactions on Graphics (Proc. ACM SIGGRAPH 2020), Volume 39, Number 4, 2020
LoopReg: Self-supervised Learning of Implicit Surface Correspondences, Pose and Shape for 3D Human Mesh Registration
B. L. Bhatnagar, C. Sminchisescu, C. Theobalt and G. Pons-Moll
Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020
GS-WGAN: A Gradient-Sanitized Approach for Learning Differentially Private Generators
D. Chen, T. Orekondy and M. Fritz
Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020
Neural Unsigned Distance Fields for Implicit Function Learning
J. Chibane, A. Mir and G. Pons-Moll
Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020
Deep Wiener Deconvolution: Wiener Meets Deep Learning for Image Deblurring
J. Dong, S. Roth and B. Schiele
Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020
Attribute Prototype Network for Zero-Shot Learning
W. Xu, Y. Xian, J. Wang, B. Schiele and Z. Akata
Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020
GAN-Leaks: A Taxonomy of Membership Inference Attacks against GANs
D. Chen, N. Yu, Y. Zhang and M. Fritz
CCS ’20, ACM SIGSAC Conference on Computer and Communications Security, 2020
Combining Implicit Function Learning and Parametric Models for 3D Human Reconstruction
B. L. Bhatnagar, C. Sminchisescu, C. Theobalt and G. Pons-Moll
Computer Vision -- ECCV 2020, 2020
Kinematic 3D Object Detection in Monocular Video
G. Brazil, G. Pons-Moll, X. Liu and B. Schiele
Computer Vision -- ECCV 2020, 2020
NASA: Neural Articulated Shape Approximation
B. Deng, J. P. Lewis, T. Jeruzalski, G. Pons-Moll, G. Hinton, M. Norouzi and A. Tagliasacchi
Computer Vision -- ECCV 2020, 2020
Segmentations-Leak: Membership Inference Attacks and Defenses in Semantic Image Segmentation
Y. He, S. Rahimian, B. Schiele and M. Fritz
Computer Vision -- ECCV 2020, 2020
An Ensemble of Epoch-wise Empirical Bayes for Few-shot Learning
Y. Liu, B. Schiele and Q. Sun
Computer Vision -- ECCV 2020, 2020
Towards Recognizing Unseen Categories in Unseen Domains
M. Mancini, Z. Akata, E. Ricci and B. Caputo
Computer Vision -- ECCV 2020, 2020
Deep Graph Matching via Blackbox Differentiation of Combinatorial Solvers
M. Rolínek, P. Swoboda, D. Zietlow, A. Paulus, V. Musil and G. Martius
Computer Vision -- ECCV 2020, 2020
Towards Automated Testing and Robustification by Semantic Adversarial Data Generation
R. Shetty, M. Fritz and B. Schiele
Computer Vision -- ECCV 2020, 2020
SIZER: A Dataset and Model for Parsing 3D Clothing and Learning Size Sensitive 3D Clothing
G. Tiwari, B. L. Bhatnagar, T. Tung and G. Pons-Moll
Computer Vision -- ECCV 2020, 2020
Inclusive GAN: Improving Data and Minority Coverage in Generative Models
N. Yu, K. Li, P. Zhou, J. Malik, L. Davis and M. Fritz
Computer Vision -- ECCV 2020, 2020
Unsupervised Shape and Pose Disentanglement for 3D Meshes
K. Zhou, B. L. Bhatnagar and G. Pons-Moll
Computer Vision -- ECCV 2020, 2020
Sparse Recovery with Integrality Constraints
J.-H. Lange, M. E. Pfetsch, B. M. Seib and A. M. Tillmann
Discrete Applied Mathematics, Volume 283, 2020
Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing
V. Agarwal, R. Shetty and M. Fritz
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020
Normalizing Flows With Multi-Scale Autoregressive Priors
A. Bhattacharyya, S. Mahajan, M. Fritz, B. Schiele and S. Roth
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020
Norm-Aware Embedding for Efficient Person Search
D. Chen, S. Zhang, J. Yang and B. Schiele
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020
Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion
J. Chibane, T. Alldieck and G. Pons-Moll
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020
Evaluating Weakly Supervised Object Localization Methods Right
J. Choe, S. J. Oh, S. Lee, S. Chun, Z. Akata and H. Shim
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020
DeepCap: Monocular Human Performance Capture Using Weak Supervision
M. Habermann, W. Xu, M. Zollhöfer, G. Pons-Moll and C. Theobalt
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020
Learning Interactions and Relationships between Movie Characters
A. Kukleva, M. Tapaswi and I. Laptev
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020
Mnemonics Training: Multi-Class Incremental Learning Without Forgetting
Y. Liu, Y. Su, A.-A. Liu, B. Schiele and Q. Sun
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020
Learning to Dress 3D People in Generative Clothing
Q. Ma, J. Yang, A. Ranjan, S. Pujades, G. Pons-Moll, S. Tang and M. J. Black
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020
Learning to Transfer Texture from Clothing Images to 3D Humans
A. Mir, T. Alldieck and G. Pons-Moll
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020
TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style
C. Patel, Z. Liao and G. Pons-Moll
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020
A U-Net Based Discriminator for Generative Adversarial Networks
E. Schönfeld, B. Schiele and A. Khoreva
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020
Motion Segmentation & Multiple Object Tracking by Correlation Co-Clustering
M. Keuper, S. Tang, B. Andres, T. Brox and B. Schiele
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 42, Number 1, 2020
Person Recognition in Personal Photo Collections
S. J. Oh, R. Benenson, M. Fritz and B. Schiele
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 42, Number 1, 2020
Meta-Transfer Learning through Hard Tasks
Q. Sun, Y. Liu, Z. Chen, T.-S. Chua and B. Schiele
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020
SelfPose: 3D Egocentric Pose Estimation from a Headset Mounted Camera
D. Tome, T. Alldieck, P. Peluse, G. Pons-Moll, L. Agapito, H. Badino and F. de la Torre
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020
DoubleFusion: Real-time Capture of Human Performances with Inner Body Shapes from a Single Depth Sensor
T. Yu, Z. Zheng, K. Guo, J. Zhao, Q. Dai, H. Li, G. Pons-Moll and Y. Liu
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 42, Number 10, 2020
Learning Robust Representations via Multi-View Information Bottleneck
M. Federici, A. Dutta, P. Forré, N. Kushman and Z. Akata
International Conference on Learning Representations (ICLR 2020), 2020
Prediction Poisoning: Towards Defenses Against DNN Model Stealing Attacks
T. Orekondy, B. Schiele and M. Fritz
International Conference on Learning Representations (ICLR 2020), 2020
Semantically Tied Paired Cycle Consistency for Any-Shot Sketch-based Image Retrieval
A. Dutta and Z. Akata
International Journal of Computer Vision, Volume 128, 2020
Deep Gaze Pooling: Inferring and Visually Decoding Search Intents from Human Gaze Fixations
H. Sattar, M. Fritz and A. Bulling
Neurocomputing, Volume 387, 2020
Anticipating Averted Gaze in Dyadic Interactions
P. Müller, E. Sood and A. Bulling
Proceedings ETRA 2020 Full Papers, 2020
Diverse and Relevant Visual Storytelling with Scene Graph Embeddings
X. Hong, R. Shetty, A. Sayeed, K. Mehra, V. Demberg and B. Schiele
Proceedings of the 24th Conference on Computational Natural Language Learning (CoNLL 2020), 2020
Lifted Disjoint Paths with Application in Multiple Object Tracking
A. Horňáková, R. Henschel, B. Rosenhahn and P. Swoboda
Proceedings of the 37th International Conference on Machine Learning (ICML 2020), 2020
Confidence-Calibrated Adversarial Training: Generalizing to Unseen Attacks
D. Stutz, M. Hein and B. Schiele
Proceedings of the 37th International Conference on Machine Learning (ICML 2020), 2020
A Primal-Dual Solver for Large-Scale Tracking-by-Assignment
S. Haller, M. Prakash, L. Hutschenreiter, T. Pietzsch, C. Rother, F. Jug, P. Swoboda and B. Savchynskyy
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (AISTATS 2020), 2020
PoseTrackReID: Dataset Description
A. Doering, D. Chen, S. Zhang, B. Schiele and J. Gall
Technical Report, 2020
(arXiv: 2011.06243)
Abstract
Current datasets for video-based person re-identification (re-ID) do not include structural knowledge in the form of human pose annotations for the persons of interest. Yet pose information is very helpful for disentangling useful feature information from background or occlusion noise. Real-world scenarios in particular, such as surveillance, contain many occlusions caused by human crowds or obstacles. Conversely, video-based person re-ID can benefit other tasks, such as multi-person pose tracking, through robust feature matching. For that reason, we present PoseTrackReID, a large-scale dataset for multi-person pose tracking and video-based person re-ID. With PoseTrackReID, we want to bridge the gap between person re-ID and multi-person pose tracking. Additionally, this dataset provides a good benchmark for current state-of-the-art methods on multi-frame person re-ID.
Analyzing the Dependency of ConvNets on Spatial Information
Y. Fan, Y. Xian, M. M. Losch and B. Schiele
Technical Report, 2020
(arXiv: 2002.01827)
Abstract
Intuitively, image classification should profit from using spatial information. Recent work, however, suggests that this might be overrated in standard CNNs. In this paper, we push the envelope and further investigate the reliance on spatial information. We propose spatial shuffling and GAP+FC to destroy spatial information during both the training and testing phases. Interestingly, we observe that spatial information can be deleted from later layers with only small performance drops, which indicates that spatial information at later layers is not necessary for good performance. For example, the test accuracy of VGG-16 drops by only 0.03% and 2.66% with spatial information completely removed from the last 30% and 53% of layers on CIFAR100, respectively. Evaluation on several object recognition datasets (CIFAR100, Small-ImageNet, ImageNet) with a wide range of CNN architectures (VGG-16, ResNet50, ResNet152) shows an overall consistent pattern.
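The spatial-shuffling probe described in the abstract can be illustrated with a minimal sketch: randomly permuting the spatial positions of a feature map so that its layout is destroyed while the per-channel value statistics are preserved. This is an illustrative reconstruction only, not the authors' code; the choice of a single permutation shared across all channels is an assumption.

```python
import numpy as np

def spatial_shuffle(feature_map, rng=None):
    """Randomly permute the spatial locations of a (C, H, W) feature map.

    The channel dimension is left intact; only the H*W spatial positions
    are shuffled, destroying spatial layout while preserving the multiset
    of values within each channel. (Illustrative sketch, not the paper's
    implementation.)
    """
    rng = rng if rng is not None else np.random.default_rng()
    c, h, w = feature_map.shape
    flat = feature_map.reshape(c, h * w)
    perm = rng.permutation(h * w)  # assumption: one permutation shared by all channels
    return flat[:, perm].reshape(c, h, w)
```

Applied after a convolutional block during both training and testing, such an operation removes spatial layout while leaving the channel activations themselves unchanged.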
Improved Methods and Analysis for Semantic Image Segmentation
Y. He
PhD Thesis, Universität des Saarlandes, 2020
Abstract
Modern deep learning has enabled amazing developments in computer vision in recent years (Hinton and Salakhutdinov, 2006; Krizhevsky et al., 2012). As a fundamental task, semantic segmentation aims to predict class labels for each pixel of an image, which enables machines to perceive the visual world. In spite of the recent successes of fully convolutional networks (Long et al., 2015), several challenges remain to be addressed. In this thesis, we focus on this topic under different kinds of input formats and various types of scenes. Specifically, our study covers two aspects: (1) data-driven neural modules for improved performance, and (2) leveraging datasets to train systems with higher performance and better data-privacy guarantees. In the first part of this thesis, we improve semantic segmentation by designing new modules that are compatible with existing architectures. First, we develop a spatio-temporal data-driven pooling, which brings additional information from the data (i.e., superpixels) into neural networks, benefiting the training of neural networks as well as the inference on novel data. We investigate our approach on RGB-D videos for segmenting indoor scenes, where depth provides complementary cues to color and our model performs particularly well. Second, we design learnable dilated convolutions, an extension of standard dilated convolutions (Yu and Koltun, 2016), whose dilation factors need to be carefully determined by hand to obtain decent performance. We present a method to learn the dilation factors together with the filter weights of the convolutions, avoiding a complicated search over dilation factors. We conduct extensive studies on challenging street scenes, across various baselines of different complexity as well as several datasets at varying image resolutions. In the second part, we investigate how to utilize expensive training data.
First, we start from generative modelling and study the network architectures and the learning pipeline for generating multiple examples. We aim to improve the diversity of generated examples while preserving comparable quality. Second, we develop a generative model for synthesizing features of a network. With a mixture of real images and synthetic features, we are able to train a segmentation model with better generalization capability. Our approach is evaluated on different scene parsing tasks to demonstrate the effectiveness of the proposed method. Finally, we study membership inference on the semantic segmentation task. We propose the first membership inference attack system against black-box semantic segmentation models, which tries to infer whether a data pair was used as training data or not. From our observations, information on training data is indeed leaking. To mitigate the leakage, we leverage our synthetic features to perform prediction obfuscation, reducing the posterior distribution gaps between the training and testing sets. Consequently, our study provides not only an approach for detecting the illegal use of data, but also the foundations for a safer use of semantic segmentation models.
Towards Accurate Multi-Person Pose Estimation in the Wild
E. Insafutdinov
PhD Thesis, Universität des Saarlandes, 2020
Multicut Optimization Guarantees & Geometry of Lifted Multicuts
J.-H. Lange
PhD Thesis, Universität des Saarlandes, 2020
Sensing, Interpreting, and Anticipating Human Social Behaviour in the Real World
P. Müller
PhD Thesis, Universität des Saarlandes, 2020
Understanding and Controlling Leakage in Machine Learning
T. Orekondy
PhD Thesis, Universität des Saarlandes, 2020
InfoScrub: Towards Attribute Privacy by Targeted Obfuscation
H.-P. Wang, T. Orekondy and M. Fritz
Technical Report, 2020
(arXiv: 2005.10329)
Abstract
Personal photos shared online, apart from exhibiting a myriad of memorable details, also reveal a wide range of private information and potentially entail privacy risks (e.g., online harassment, tracking). To mitigate such risks, it is crucial to study techniques that allow individuals to limit the private information leaked in visual data. We tackle this problem in a novel image obfuscation framework: maximizing entropy on inferences over targeted privacy attributes while retaining image fidelity. We approach the problem with an encoder-decoder style architecture, with two key novelties: (a) introducing a discriminator to perform bi-directional translation simultaneously from multiple unpaired domains; (b) predicting an image interpolation which maximizes uncertainty over a target set of attributes. We find that our approach generates obfuscated images faithful to the original input images, and additionally increases uncertainty by 6.2× (or up to 0.85 bits) over the non-obfuscated counterparts.
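The "bits" figure quoted in the abstract refers to Shannon entropy of the attribute classifier's posterior; for reference, the maximum for a binary attribute is 1 bit. A minimal sketch of that measure (illustrative only; `entropy_bits` is a hypothetical helper, not from the paper's code):

```python
import numpy as np

def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete posterior over attribute values.

    A uniform posterior maximizes entropy (e.g., 1 bit for a binary
    attribute), meaning an attacker's inference is maximally uncertain.
    (Hypothetical helper for illustration.)
    """
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]  # 0 * log(0) is taken as 0 by convention
    return float(-(p * np.log2(p)).sum())
```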
Learning from Limited Labeled Data - Zero-Shot and Few-Shot Learning
Y. Xian
PhD Thesis, Universität des Saarlandes, 2020
Generalized Many-Way Few-Shot Video Classification
Y. Xian, B. Korbar, M. Douze, B. Schiele, Z. Akata and L. Torresani
Technical Report, 2020
(arXiv: 2007.04755)
Abstract
Few-shot learning methods operate in low-data regimes. The aim is to learn with few training examples per class. Although significant progress has been made in few-shot image classification, few-shot video recognition is relatively unexplored, and methods based on 2D CNNs are unable to learn temporal information. In this work, we therefore develop a simple 3D CNN baseline that surpasses existing methods by a large margin. To circumvent the need for labeled examples, we propose to leverage weakly-labeled videos from a large dataset using tag retrieval, followed by selecting the best clips based on visual similarity, yielding further improvement. Our results saturate current 5-way benchmarks for few-shot video classification, and we therefore propose a new, challenging benchmark involving more classes and a mixture of classes with varying supervision.
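The clip-selection step the abstract mentions (choosing the best retrieved clips by visual similarity) can be sketched as a simple cosine-similarity ranking against a class feature vector. This is a hypothetical illustration under assumed feature shapes, not the authors' method in detail:

```python
import numpy as np

def select_best_clips(class_feat, clip_feats, k=3):
    """Rank retrieved clips by cosine similarity to a class feature vector.

    class_feat: (d,) visual feature representing the few-shot class.
    clip_feats: (n, d) features of candidate clips from tag retrieval.
    Returns the indices of the k most similar clips. (Illustrative sketch;
    the shapes and the ranking criterion are assumptions.)
    """
    q = class_feat / np.linalg.norm(class_feat)
    c = clip_feats / np.linalg.norm(clip_feats, axis=1, keepdims=True)
    sims = c @ q                       # cosine similarity of each clip to the class
    return np.argsort(sims)[::-1][:k]  # indices of the top-k clips
```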