2019
Learning to Reconstruct People in Clothing from a Single RGB Camera
T. Alldieck, M. A. Magnor, B. L. Bhatnagar, C. Theobalt and G. Pons-Moll
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
Semantically Tied Paired Cycle Consistency for Zero-Shot Sketch-based Image Retrieval
A. Dutta and Z. Akata
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
In the Wild Human Pose Estimation using Explicit 2D Features and Intermediate 3D Representations
I. Habibie, W. Xu, D. Mehta, G. Pons-Moll and C. Theobalt
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
Time-Conditioned Action Anticipation in One Shot
Q. Ke, M. Fritz and B. Schiele
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
Combinatorial Persistency Criteria for Multicut and Max-Cut
J.-H. Lange, B. Andres and P. Swoboda
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
Knockoff Nets: Stealing Functionality of Black-Box Models
T. Orekondy, B. Schiele and M. Fritz
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders
E. Schönfeld, S. Ebrahimi, S. Sinha, T. Darrell and Z. Akata
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
Not Using the Car to See the Sidewalk: Quantifying and Controlling the Effects of Context in Classification and Segmentation
R. Shetty, B. Schiele and M. Fritz
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
Disentangling Adversarial Robustness and Generalization
D. Stutz, M. Hein and B. Schiele
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
Meta-Transfer Learning for Few-Shot Learning
Q. Sun, Y. Liu, T.-S. Chua and B. Schiele
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
SPNet: Semantic Projection Network for Zero- and Few-Label Semantic Segmentation
Y. Xian, S. Choudhury, Y. He, B. Schiele and Z. Akata
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
f-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning
Y. Xian, S. Sharma, B. Schiele and Z. Akata
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
Texture Mixer: A Network for Controllable Synthesis and Interpolation of Texture
N. Yu, C. Barnes, E. Shechtman, S. Amirghodsi and M. Lukáč
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
SimulCap: Single-View Human Performance Capture with Cloth Simulation
T. Yu, Z. Zheng, Y. Zhong, J. Zhao, Q. Dai, G. Pons-Moll and Y. Liu
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
LiveCap: Real-time Human Performance Capture from Monocular Video
M. Habermann, W. Xu, M. Zollhöfer, G. Pons-Moll and C. Theobalt
ACM Transactions on Graphics, Volume 38, Number 2, 2019
Evaluation of Appearance-Based Methods and Implications for Gaze-Based Applications
X. Zhang, Y. Sugano and A. Bulling
CHI 2019, CHI Conference on Human Factors in Computing Systems, 2019
(Accepted/in press)
MPIIGaze: Real-World Dataset and Deep Appearance-Based Gaze Estimation
X. Zhang, Y. Sugano, M. Fritz and A. Bulling
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 41, Number 1, 2019
Fashion is Taking Shape: Understanding Clothing Preference Based on Body Shape From Online Sources
H. Sattar, G. Pons-Moll and M. Fritz
2019 IEEE Winter Conference on Applications of Computer Vision (WACV 2019), 2019
Bayesian Prediction of Future Street Scenes using Synthetic Likelihoods
A. Bhattacharyya, M. Fritz and B. Schiele
International Conference on Learning Representations (ICLR 2019), 2019
(Accepted/in press)
Reducing Calibration Drift in Mobile Eye Trackers by Exploiting Mobile Phone Usage
P. Müller, D. Buschek, M. X. Huang and A. Bulling
Proceedings of the ACM Symposium on Eye Tracking Research & Applications, 2019
(Accepted/in press)
PrivacEye: Privacy-Preserving Head-Mounted Eye Tracking Using Egocentric Scene Image and Eye Movement Features
J. Steil, M. Koelle, W. Heuten, S. Boll and A. Bulling
Proceedings of the ACM Symposium on Eye Tracking Research & Applications, 2019
(Accepted/in press)
Privacy-Aware Eye Tracking Using Differential Privacy
J. Steil, I. Hagestedt, M. X. Huang and A. Bulling
Proceedings of the ACM Symposium on Eye Tracking Research & Applications, 2019
(Accepted/in press)
Bottleneck Potentials in Markov Random Fields
A. Abbas and P. Swoboda
Technical Report, 2019
(arXiv: 1904.08080)
Abstract
We consider general discrete Markov Random Fields (MRFs) with additional bottleneck potentials which penalize the maximum (instead of the sum) of the local potential values taken by the MRF assignment. Bottleneck potentials or analogous constructions have been considered in (i) combinatorial optimization (e.g. the bottleneck shortest path problem, the minimum bottleneck spanning tree problem, bottleneck function minimization in greedoids), (ii) inverse problems with $L_{\infty}$-norm regularization, and (iii) valued constraint satisfaction on the $(\min,\max)$-pre-semirings. Bottleneck potentials for general discrete MRFs are a natural generalization of the above lines of modeling work to Maximum-A-Posteriori (MAP) inference in MRFs. To this end, we propose MRFs whose objective consists of two parts: terms that factorize according to (i) $(\min,+)$, i.e. potentials as in plain MRFs, and (ii) $(\min,\max)$, i.e. bottleneck potentials. To solve the ensuing inference problem, we propose high-quality relaxations and efficient algorithms for solving them. We empirically show the efficacy of our approach on large-scale seismic horizon tracking problems.
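The combined objective described above can be illustrated with a toy sketch: a chain MRF whose energy adds the usual $(\min,+)$ sum of unary and pairwise potentials to a $(\min,\max)$ bottleneck term. All potentials and the brute-force solver below are illustrative assumptions, not the authors' implementation (the paper instead proposes relaxations and efficient algorithms for large instances).

```python
from itertools import product

def energy(labels, unary, pairwise, bottleneck):
    """Sum of unary/pairwise potentials plus the bottleneck (max) term."""
    e = sum(unary[i][l] for i, l in enumerate(labels))
    e += sum(pairwise[labels[i]][labels[i + 1]] for i in range(len(labels) - 1))
    e += max(bottleneck[i][l] for i, l in enumerate(labels))  # max, not sum
    return e

unary = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]       # 3 nodes, 2 labels each
pairwise = [[0.0, 2.0], [2.0, 0.0]]                # Potts-like smoothness
bottleneck = [[0.0, 5.0], [0.0, 0.0], [3.0, 0.0]]  # penalized via its maximum

# Brute-force MAP inference over the tiny label space.
best = min(product(range(2), repeat=3),
           key=lambda l: energy(l, unary, pairwise, bottleneck))
```

Note how the bottleneck term changes the optimum: an assignment with a slightly larger sum of potentials can win if it avoids a single large local potential.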
Moment-to-Moment Detection of Internal Thought from Eye Vergence Behaviour
M. X. Huang, J. Li, G. Ngai, H. V. Leong and A. Bulling
Technical Report, 2019
(arXiv: 1901.06572)
Abstract
Internal thought refers to the process of directing attention away from a primary visual task to internal cognitive processing. Internal thought is a pervasive mental activity and closely related to primary task performance. As such, automatic detection of internal thought has significant potential for user modelling in intelligent interfaces, particularly for e-learning applications. Despite the close link between the eyes and the human mind, only a few studies have investigated vergence behaviour during internal thought and none has studied moment-to-moment detection of internal thought from gaze. While prior studies relied on long-term data analysis and required a large number of gaze characteristics, we describe a novel method that is computationally light-weight and that only requires eye vergence information that is readily available from binocular eye trackers. We further propose a novel paradigm to obtain ground truth internal thought annotations that exploits human blur perception. We evaluate our method for three increasingly challenging detection tasks: (1) during a controlled math-solving task, (2) during natural viewing of lecture videos, and (3) during daily activities, such as coding, browsing, and reading. Results from these evaluations demonstrate the performance and robustness of vergence-based detection of internal thought and, as such, open up new directions for research on interfaces that adapt to shifts of mental attention.
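The vergence signal the method relies on can be illustrated as the angle between the left- and right-eye gaze direction vectors reported by a binocular tracker; larger angles correspond to fixating nearer depths. This is a minimal sketch of that geometric quantity only (the names and sample vectors are assumptions, not the authors' detection pipeline):

```python
import numpy as np

def vergence_angle(gaze_left, gaze_right):
    """Angle in radians between two 3D gaze direction vectors."""
    l = np.asarray(gaze_left, dtype=float)
    r = np.asarray(gaze_right, dtype=float)
    cos = np.dot(l, r) / (np.linalg.norm(l) * np.linalg.norm(r))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

# Eyes converging on a near point yield a larger vergence angle
# than near-parallel gaze directed at a distant point.
near = vergence_angle([0.2, 0.0, 1.0], [-0.2, 0.0, 1.0])
far = vergence_angle([0.02, 0.0, 1.0], [-0.02, 0.0, 1.0])
assert near > far
```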
SacCalib: Reducing Calibration Distortion for Stationary Eye Trackers Using Saccadic Eye Movements
M. X. Huang and A. Bulling
Technical Report, 2019
(arXiv: 1903.04047)
Abstract
Recent methods to automatically calibrate stationary eye trackers were shown to effectively reduce inherent calibration distortion. However, these methods require additional information, such as mouse clicks or on-screen content. We propose the first method that only requires users' eye movements to reduce calibration distortion in the background while users naturally look at an interface. Our method exploits that calibration distortion makes straight saccade trajectories appear curved between the saccadic start and end points. We show that this curving effect is systematic and the result of a distorted gaze projection plane. To mitigate calibration distortion, our method undistorts this plane by straightening saccade trajectories using image warping. We show that this approach improves over the common six-point calibration and is promising for reducing distortion. As such, it provides a non-intrusive solution to alleviating the accuracy decrease of eye trackers during long-term use.
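The curving effect the method exploits can be quantified, for illustration, as the maximum perpendicular deviation of saccade samples from the straight line joining the saccade start and end points. The function and sample trajectories below are a toy sketch of that measure, not the authors' code:

```python
import numpy as np

def max_curvature_deviation(traj):
    """Max perpendicular distance of 2D trajectory samples from the
    straight line joining the first and last points."""
    traj = np.asarray(traj, dtype=float)
    start, end = traj[0], traj[-1]
    d = end - start
    n = np.array([-d[1], d[0]]) / np.linalg.norm(d)  # unit normal to the line
    return float(np.max(np.abs((traj - start) @ n)))

straight = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]  # ideal saccade
curved = [[0.0, 0.0], [1.0, 0.5], [2.0, 0.0]]    # distortion-induced curving
assert max_curvature_deviation(curved) > max_curvature_deviation(straight)
```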
LCC: Learning to Customize and Combine Neural Networks for Few-Shot Learning
Y. Liu, Q. Sun, A.-A. Liu, Y. Su, B. Schiele and T.-S. Chua
Technical Report, 2019
(arXiv: 1904.08479)
Abstract
Meta-learning has been shown to be an effective strategy for few-shot learning. The key idea is to leverage a large number of similar few-shot tasks in order to meta-learn how to best initialize a (single) base-learner for novel few-shot tasks. While meta-learning how to initialize a base-learner has shown promising results, it is well known that hyperparameter settings such as the learning rate and the weighting of the regularization term are important to achieve best performance. We thus propose to also meta-learn these hyperparameters and in fact learn a time- and layer-varying scheme for learning a base-learner on novel tasks. Additionally, we propose to learn not only a single base-learner but an ensemble of several base-learners to obtain more robust results. While ensembles of learners have been shown to improve performance in various settings, this is challenging for few-shot learning tasks due to the limited number of training samples. Therefore, our approach also aims to meta-learn how to effectively combine several base-learners. We conduct extensive experiments and report top performance for five-class few-shot recognition tasks on two challenging benchmarks: miniImageNet and Fewshot-CIFAR100 (FC100).
A Novel BiLevel Paradigm for Image-to-Image Translation
L. Ma, Q. Sun, B. Schiele and L. Van Gool
Technical Report, 2019
(arXiv: 1904.09028)
Abstract
Image-to-image (I2I) translation is a pixel-level mapping that requires large amounts of paired training data and often suffers from the problems of high diversity and strong category bias in image scenes. In order to tackle these problems, we propose a novel BiLevel (BiL) learning paradigm that alternates the learning of two models, respectively at an instance-specific (IS) and a general-purpose (GP) level. In each scene, the IS model learns to maintain the specific scene attributes. It is initialized by the GP model that learns from all the scenes to obtain the generalizable translation knowledge. This GP initialization gives the IS model an efficient starting point, thus enabling its fast adaptation to the new scene with scarce training data. We conduct extensive I2I translation experiments on human face and street view datasets. Quantitative results validate that our approach can significantly boost the performance of classical I2I translation models, such as PG2 and Pix2Pix. Our visualization results show both higher image quality and more appropriate instance-specific details, e.g., the translated image of a person looks more like that person in terms of identity.
Emergent Leadership Detection Across Datasets
P. Müller and A. Bulling
Technical Report, 2019
(arXiv: 1905.02058)
Abstract
Automatic detection of emergent leaders in small groups from nonverbal behaviour is a growing research topic in social signal processing, but existing methods have only been evaluated on single datasets -- an unrealistic setting for real-world applications in which systems must also work in settings unseen at training time. It therefore remains unclear whether current methods for emergent leadership detection generalise to similar but new settings and to which extent. To overcome this limitation, we are the first to study a cross-dataset evaluation setting for the emergent leadership detection task. We provide evaluations for within- and cross-dataset prediction using two current datasets (PAVIS and MPIIGroupInteraction), as well as an investigation on the robustness of commonly used feature channels (visual focus of attention, body pose, facial action units, speaking activity) and online prediction in the cross-dataset setting. Our evaluations show that using pose and eye contact based features, cross-dataset prediction is possible with an accuracy of 0.68, as such providing another important piece of the puzzle towards emergent leadership detection in the real world.
Shape Evasion: Preventing Body Shape Inference of Multi-Stage Approaches
H. Sattar, K. Krombholz, G. Pons-Moll and M. Fritz
Technical Report, 2019
(arXiv: 1905.11503)
Abstract
Modern approaches to pose and body shape estimation have recently achieved strong performance even under challenging real-world conditions. Even from a single image of a clothed person, a realistic looking body shape can be inferred that captures a user's weight group and body shape type well. This opens up a whole spectrum of applications -- in particular in fashion -- where virtual try-on and recommendation systems can make use of these new and automated cues. However, a realistic depiction of the undressed body is regarded as highly private, and most people might therefore not consent to it. Hence, we ask if the automatic extraction of such information can be effectively evaded. While adversarial perturbations have been shown to be effective for manipulating the output of machine learning models -- in particular, end-to-end deep learning approaches -- state-of-the-art shape estimation methods are composed of multiple stages. We perform the first investigation of different strategies that can be used to effectively manipulate the automatic shape estimation while preserving the overall appearance of the original image.
Learning GAN Fingerprints Towards Image Attribution
N. Yu, L. Davis and M. Fritz
Technical Report, 2019
(arXiv: 1811.08180)
Abstract
Recent advances in Generative Adversarial Networks (GANs) have shown increasing success in generating photorealistic images. But they also raise challenges to visual forensics and model authentication. We present the first study of learning GAN fingerprints towards image attribution: we systematically investigate the performance of classifying an image as real or GAN-generated. For GAN-generated images, we further identify their sources. Our experiments validate that GANs carry distinct model fingerprints and leave stable fingerprints in their generated images, which support image attribution. Even a single difference in GAN training initialization can result in different fingerprints, which enables fine-grained model authentication. We further validate that such fingerprints are omnipresent in different image components and are not biased by GAN artifacts. Fingerprint fine-tuning is effective in immunizing fingerprints against five types of adversarial image perturbations. Comparisons also show our learned fingerprints consistently outperform several baselines in a variety of setups.
2018
NRST: Non-rigid Surface Tracking from Monocular Video
M. Habermann, W. Xu, H. Rohdin, M. Zollhöfer, G. Pons-Moll and C. Theobalt
Pattern Recognition (GCPR 2018), 2018