2019
Learning to Reconstruct People in Clothing from a Single RGB Camera
T. Alldieck, M. A. Magnor, B. L. Bhatnagar, C. Theobalt and G. Pons-Moll
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
Semantically Tied Paired Cycle Consistency for Zero-Shot Sketch-based Image Retrieval
A. Dutta and Z. Akata
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
In the Wild Human Pose Estimation using Explicit 2D Features and Intermediate 3D Representations
I. Habibie, W. Xu, D. Mehta, G. Pons-Moll and C. Theobalt
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
Time-Conditioned Action Anticipation in One Shot
Q. Ke, M. Fritz and B. Schiele
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
Combinatorial Persistency Criteria for Multicut and Max-Cut
J.-H. Lange, B. Andres and P. Swoboda
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
Knockoff Nets: Stealing Functionality of Black-Box Models
T. Orekondy, B. Schiele and M. Fritz
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders
E. Schönfeld, S. Ebrahimi, S. Sinha, T. Darrell and Z. Akata
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
Not Using the Car to See the Sidewalk: Quantifying and Controlling the Effects of Context in Classification and Segmentation
R. Shetty, B. Schiele and M. Fritz
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
D. Stutz, M. Hein, and B. Schiele
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
Meta-Transfer Learning for Few-Shot Learning
Q. Sun, Y. Liu, T.-S. Chua and B. Schiele
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
f-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning
Y. Xian, S. Sharma, B. Schiele and Z. Akata
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
Abstract
When labeled training data is scarce, a promising data augmentation approach is to generate visual features of unknown classes using their attributes. To learn the class conditional distribution of CNN features, these models rely on pairs of image features and class attributes. Hence, they can not make use of the abundance of unlabeled data samples. In this paper, we tackle any-shot learning problems i.e. zero-shot and few-shot, in a unified feature generating framework that operates in both inductive and transductive learning settings. We develop a conditional generative model that combines the strength of VAE and GANs and in addition, via an unconditional discriminator, learns the marginal feature distribution of unlabeled images. We empirically show that our model learns highly discriminative CNN features for five datasets, i.e. CUB, SUN, AWA and ImageNet, and establish a new state-of-the-art in any-shot learning, i.e. inductive and transductive (generalized) zero- and few-shot learning settings. We also demonstrate that our learned features are interpretable: we visualize them by inverting them back to the pixel space and we explain them by generating textual arguments of why they are associated with a certain label.
SPNet: Semantic Projection Network for Zero- and Few-Label Semantic Segmentation
Y. Xian, S. Choudhury, Y. He, B. Schiele and Z. Akata
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
Texture Mixer: A Network for Controllable Synthesis and Interpolation of Texture
N. Yu, C. Barnes, E. Shechtman, S. Amirghodsi and M. Lukáč
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
SimulCap : Single-View Human Performance Capture with Cloth Simulation
T. Yu, Z. Zheng, Y. Zhong, J. Zhao, D. Quionhai, G. Pons-Moll and Y. Liu
32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019
(Accepted/in press)
LiveCap: Real-time Human Performance Capture from Monocular Video
M. Habermann, W. Xu, M. Zollhöfer, G. Pons-Moll and C. Theobalt
ACM Transactions on Graphics, Volume 38, Number 2, 2019
Everyday Eye Tracking for Real-World Consumer Behavior Analysis
A. Bulling and M. Wedel
A Handbook of Process Tracing Methods for Decision Research, 2019
Evaluation of Appearance-Based Methods and Implications for Gaze-Based Applications
X. Zhang, Y. Sugano and A. Bulling
CHI 2019, CHI Conference on Human Factors in Computing Systems, 2019
XNect Demo (v2): Real-time Multi-person 3D Human Pose Estimation with a Single RGB Camera
D. Mehta, O. Sotnychenko, F. Mueller, W. Xu, H.-P. Seidel, P. Fua, M. Elgharib, H. Rhodin, G. Pons-Moll and C. Theobalt
CVPR 2019 Demonstrations, 2019
Monocular 3D Human Pose Estimation by Generation and Ordinal Ranking
S. Sharma, P. T. Varigonda, P. Bindal, A. Sharma and A. Jain
ICCV 2019, International Conference on Computer Vision, 2019
(Accepted/in press)
Abstract
Monocular 3D Human Pose Estimation from static images is a challenging problem, due to the curse of dimensionality and the ill-posed nature of lifting 2D to 3D. In this paper, we propose a Deep Conditional Variational Autoencoder based model that synthesizes diverse 3D pose samples conditioned on the estimated 2D pose. Our experiments reveal that the CVAE generates significantly diverse 3D samples that are consistent with the 2D pose, thereby reducing the ambiguity in lifting from 2D-to-3D. We use two strategies for predicting the final 3D pose - (a) depth-ordering/ordinal relations to score and aggregate the final 3D pose, or OrdinalScore, and (b) with supervision from an Oracle. We report close to state of the art results on two benchmark datasets using OrdinalScore, and state-of-the-art results using the Oracle. We also show our pipeline gives competitive results without paired 3D supervision. We shall make the training and evaluation code available at https://github.com/ssfootball04/generative_pose.
MPIIGaze: Real-World Dataset and Deep Appearance-Based Gaze Estimation
X. Zhang, Y. Sugano, M. Fritz and A. Bulling
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 41, Number 1, 2019
Fashion is Taking Shape: Understanding Clothing Preference Based on Body Shape From Online Sources
H. Sattar, G. Pons-Moll and M. Fritz
2019 IEEE Winter Conference on Applications of Computer Vision (WACV 2019), 2019
Bayesian Prediction of Future Street Scenes using Synthetic Likelihoods
A. Bhattacharyya, M. Fritz and B. Schiele
International Conference on Learning Representations (ICLR 2019), 2019
(Accepted/in press)
P. Müller and A. Bulling
2019 International Conference on Multimodal Interaction (ICMI 2019), 2019
(Accepted/in press)
Abstract
Automatic detection of emergent leaders in small groups from nonverbal behaviour is a growing research topic in social signal processing but existing methods were evaluated on single datasets -- an unrealistic assumption for real-world applications in which systems are required to also work in settings unseen at training time. It therefore remains unclear whether current methods for emergent leadership detection generalise to similar but new settings and to which extent. To overcome this limitation, we are the first to study a cross-dataset evaluation setting for the emergent leadership detection task. We provide evaluations for within- and cross-dataset prediction using two current datasets (PAVIS and MPIIGroupInteraction), as well as an investigation on the robustness of commonly used feature channels (visual focus of attention, body pose, facial action units, speaking activity) and online prediction in the cross-dataset setting. Our evaluations show that using pose and eye contact based features, cross-dataset prediction is possible with an accuracy of 0.68, as such providing another important piece of the puzzle towards emergent leadership detection in the real world.
Reducing Calibration Drift in Mobile Eye Trackers by Exploiting Mobile Phone Usage
P. Müller, D. Buschek, M. X. Huang and A. Bulling
Proceedings ETRA 2019, 2019
PrivacEye: Privacy-Preserving Head-Mounted Eye Tracking Using Egocentric Scene Image and Eye Movement Features
J. Steil, M. Koelle, W. Heuten, S. Boll and A. Bulling
Proceedings ETRA 2019, 2019
Privacy-Aware Eye Tracking Using Differential Privacy
J. Steil, I. Hagestedt, M. X. Huang and A. Bulling
Proceedings ETRA 2019, 2019
Bottleneck Potentials in Markov Random Fields
A. Abbas and P. Swoboda
Technical Report, 2019
(arXiv: 1904.08080)
Abstract
We consider general discrete Markov Random Fields(MRFs) with additional bottleneck potentials which penalize the maximum (instead of the sum) over local potential value taken by the MRF-assignment. Bottleneck potentials or analogous constructions have been considered in (i) combinatorial optimization (e.g. bottleneck shortest path problem, the minimum bottleneck spanning tree problem, bottleneck function minimization in greedoids), (ii) inverse problems with $L_{\infty}$-norm regularization, and (iii) valued constraint satisfaction on the $(\min,\max)$-pre-semirings. Bottleneck potentials for general discrete MRFs are a natural generalization of the above direction of modeling work to Maximum-A-Posteriori (MAP) inference in MRFs. To this end, we propose MRFs whose objective consists of two parts: terms that factorize according to (i) $(\min,+)$, i.e. potentials as in plain MRFs, and (ii) $(\min,\max)$, i.e. bottleneck potentials. To solve the ensuing inference problem, we propose high-quality relaxations and efficient algorithms for solving them. We empirically show efficacy of our approach on large scale seismic horizon tracking problems.
Tex2Shape: Detailed Full Human Body Geometry from a Single Image
T. Alldieck, G. Pons-Moll, C. Theobalt and M. A. Magnor
Technical Report, 2019
(arXiv: 1904.08645)
Abstract
We present a simple yet effective method to infer detailed full human body shape from only a single photograph. Our model can infer full-body shape including face, hair, and clothing including wrinkles at interactive frame-rates. Results feature details even on parts that are occluded in the input image. Our main idea is to turn shape regression into an aligned image-to-image translation problem. The input to our method is a partial texture map of the visible region obtained from off-the-shelf methods. From a partial texture, we estimate detailed normal and vector displacement maps, which can be applied to a low-resolution smooth body model to add detail and clothing. Despite being trained purely with synthetic data, our model generalizes well to real-world photographs. Numerous results demonstrate the versatility and robustness of our method.
Moment-to-Moment Detection of Internal Thought from Eye Vergence Behaviour
M. X. Huang, J. Li, G. Ngai, H. V. Leong and A. Bulling
Technical Report, 2019
(arXiv: 1901.06572)
Abstract
Internal thought refers to the process of directing attention away from a primary visual task to internal cognitive processing. Internal thought is a pervasive mental activity and closely related to primary task performance. As such, automatic detection of internal thought has significant potential for user modelling in intelligent interfaces, particularly for e-learning applications. Despite the close link between the eyes and the human mind, only a few studies have investigated vergence behaviour during internal thought and none has studied moment-to-moment detection of internal thought from gaze. While prior studies relied on long-term data analysis and required a large number of gaze characteristics, we describe a novel method that is computationally light-weight and that only requires eye vergence information that is readily available from binocular eye trackers. We further propose a novel paradigm to obtain ground truth internal thought annotations that exploits human blur perception. We evaluate our method for three increasingly challenging detection tasks: (1) during a controlled math-solving task, (2) during natural viewing of lecture videos, and (3) during daily activities, such as coding, browsing, and reading. Results from these evaluations demonstrate the performance and robustness of vergence-based detection of internal thought and, as such, open up new directions for research on interfaces that adapt to shifts of mental attention.
SacCalib: Reducing Calibration Distortion for Stationary Eye Trackers Using Saccadic Eye Movements
M. X. Huang and A. Bulling
Technical Report, 2019
(arXiv: 1903.04047)
Abstract
Recent methods to automatically calibrate stationary eye trackers were shown to effectively reduce inherent calibration distortion. However, these methods require additional information, such as mouse clicks or on-screen content. We propose the first method that only requires users' eye movements to reduce calibration distortion in the background while users naturally look at an interface. Our method exploits that calibration distortion makes straight saccade trajectories appear curved between the saccadic start and end points. We show that this curving effect is systematic and the result of distorted gaze projection plane. To mitigate calibration distortion, our method undistorts this plane by straightening saccade trajectories using image warping. We show that this approach improves over the common six-point calibration and is promising for reducing distortion. As such, it provides a non-intrusive solution to alleviating accuracy decrease of eye tracker during long-term use.
LCC: Learning to Customize and Combine Neural Networks for Few-Shot Learning
Y. Liu, Q. Sun, A.-A. Liu, Y. Su, B. Schiele and T.-S. Chua
Technical Report, 2019
(arXiv: 1904.08479)
Abstract
Meta-learning has been shown to be an effective strategy for few-shot learning. The key idea is to leverage a large number of similar few-shot tasks in order to meta-learn how to best initiate a (single) base-learner for novel few-shot tasks. While meta-learning how to initialize a base-learner has shown promising results, it is well known that hyperparameter settings such as the learning rate and the weighting of the regularization term are important to achieve best performance. We thus propose to also meta-learn these hyperparameters and in fact learn a time- and layer-varying scheme for learning a base-learner on novel tasks. Additionally, we propose to learn not only a single base-learner but an ensemble of several base-learners to obtain more robust results. While ensembles of learners have shown to improve performance in various settings, this is challenging for few-shot learning tasks due to the limited number of training samples. Therefore, our approach also aims to meta-learn how to effectively combine several base-learners. We conduct extensive experiments and report top performance for five-class few-shot recognition tasks on two challenging benchmarks: miniImageNet and Fewshot-CIFAR100 (FC100).
Learning Manipulation under Physics Constraints with Visual Perception
W. Li, A. Leonardis, J. Bohg and M. Fritz
Technical Report, 2019
(arXiv: 1904.09860)
Abstract
Understanding physical phenomena is a key competence that enables humans and animals to act and interact under uncertain perception in previously unseen environments containing novel objects and their configurations. In this work, we consider the problem of autonomous block stacking and explore solutions to learning manipulation under physics constraints with visual perception inherent to the task. Inspired by the intuitive physics in humans, we first present an end-to-end learning-based approach to predict stability directly from appearance, contrasting a more traditional model-based approach with explicit 3D representations and physical simulation. We study the model's behavior together with an accompanied human subject test. It is then integrated into a real-world robotic system to guide the placement of a single wood block into the scene without collapsing existing tower structure. To further automate the process of consecutive blocks stacking, we present an alternative approach where the model learns the physics constraint through the interaction with the environment, bypassing the dedicated physics learning as in the former part of this work. In particular, we are interested in the type of tasks that require the agent to reach a given goal state that may be different for every new trial. Thereby we propose a deep reinforcement learning framework that learns policies for stacking tasks which are parametrized by a target structure.
AMASS: Archive of Motion Capture as Surface Shapes
N. Mahmood, N. Ghorbani, N. F. Troje, G. Pons-Moll and M. J. Black
Technical Report, 2019
(arXiv: 1904.03278)
Abstract
Large datasets are the cornerstone of recent advances in computer vision using deep learning. In contrast, existing human motion capture (mocap) datasets are small and the motions limited, hampering progress on learning models of human motion. While there are many different datasets available, they each use a different parameterization of the body, making it difficult to integrate them into a single meta dataset. To address this, we introduce AMASS, a large and varied database of human motion that unifies 15 different optical marker-based mocap datasets by representing them within a common framework and parameterization. We achieve this using a new method, MoSh++, that converts mocap data into realistic 3D human meshes represented by a rigged body model; here we use SMPL [doi:10.1145/2816795.2818013], which is widely used and provides a standard skeletal representation as well as a fully rigged surface mesh. The method works for arbitrary marker sets, while recovering soft-tissue dynamics and realistic hand motion. We evaluate MoSh++ and tune its hyperparameters using a new dataset of 4D body scans that are jointly recorded with marker-based mocap. The consistent representation of AMASS makes it readily useful for animation, visualization, and generating training data for deep learning. Our dataset is significantly richer than previous human motion collections, having more than 40 hours of motion data, spanning over 300 subjects, more than 11,000 motions, and will be publicly available to the research community.
A Novel BiLevel Paradigm for Image-to-Image Translation
L. Ma, Q. Sun, B. Schiele and L. Van Gool
Technical Report, 2019
(arXiv: 1904.09028)
Abstract
Image-to-image (I2I) translation is a pixel-level mapping that requires a large number of paired training data and often suffers from the problems of high diversity and strong category bias in image scenes. In order to tackle these problems, we propose a novel BiLevel (BiL) learning paradigm that alternates the learning of two models, respectively at an instance-specific (IS) and a general-purpose (GP) level. In each scene, the IS model learns to maintain the specific scene attributes. It is initialized by the GP model that learns from all the scenes to obtain the generalizable translation knowledge. This GP initialization gives the IS model an efficient starting point, thus enabling its fast adaptation to the new scene with scarce training data. We conduct extensive I2I translation experiments on human face and street view datasets. Quantitative results validate that our approach can significantly boost the performance of classical I2I translation models, such as PG2 and Pix2Pix. Our visualization results show both higher image quality and more appropriate instance-specific details, e.g., the translated image of a person looks more like that person in terms of identity.
XNect: Real-time Multi-person 3D Human Pose Estimation with a Single RGB Camera
D. Mehta, O. Sotnychenko, F. Mueller, W. Xu, M. Elgharib, P. Fua, H.-P. Seidel, H. Rhodin, G. Pons-Moll and C. Theobalt
Technical Report, 2019
(arXiv: 1907.00837)
Abstract
We present a real-time approach for multi-person 3D motion capture at over 30 fps using a single RGB camera. It operates in generic scenes and is robust to difficult occlusions both by other people and objects. Our method operates in subsequent stages. The first stage is a convolutional neural network (CNN) that estimates 2D and 3D pose features along with identity assignments for all visible joints of all individuals. We contribute a new architecture for this CNN, called SelecSLS Net, that uses novel selective long and short range skip connections to improve the information flow allowing for a drastically faster network without compromising accuracy. In the second stage, a fully-connected neural network turns the possibly partial (on account of occlusion) 2D pose and 3D pose features for each subject into a complete 3D pose estimate per individual. The third stage applies space-time skeletal model fitting to the predicted 2D and 3D pose per subject to further reconcile the 2D and 3D pose, and enforce temporal coherence. Our method returns the full skeletal pose in joint angles for each subject. This is a further key distinction from previous work that neither extracted global body positions nor joint angle results of a coherent skeleton in real time for multi-person scenes. The proposed system runs on consumer hardware at a previously unseen speed of more than 30 fps given 512x320 images as input while achieving state-of-the-art accuracy, which we will demonstrate on a range of challenging real-world scenes.
P. Müller and A. Bulling
Technical Report, 2019
(arXiv: 1905.02058)
Abstract
Automatic detection of emergent leaders in small groups from nonverbal behaviour is a growing research topic in social signal processing but existing methods were evaluated on single datasets -- an unrealistic assumption for real-world applications in which systems are required to also work in settings unseen at training time. It therefore remains unclear whether current methods for emergent leadership detection generalise to similar but new settings and to which extent. To overcome this limitation, we are the first to study a cross-dataset evaluation setting for the emergent leadership detection task. We provide evaluations for within- and cross-dataset prediction using two current datasets (PAVIS and MPIIGroupInteraction), as well as an investigation on the robustness of commonly used feature channels (visual focus of attention, body pose, facial action units, speaking activity) and online prediction in the cross-dataset setting. Our evaluations show that using pose and eye contact based features, cross-dataset prediction is possible with an accuracy of 0.68, as such providing another important piece of the puzzle towards emergent leadership detection in the real world.
Prediction Poisoning: Utility-Constrained Defenses Against Model Stealing Attacks
T. Orekondy, B. Schiele and M. Fritz
Technical Report, 2019
(arXiv: 1906.10908)
Abstract
With the advances of ML models in recent years, we are seeing an increasing number of real-world commercial applications and services e.g., autonomous vehicles, medical equipment, web APIs emerge. Recent advances in model functionality stealing attacks via black-box access (i.e., inputs in, predictions out) threaten the business model of such ML applications, which require a lot of time, money, and effort to develop. In this paper, we address the issue by studying defenses for model stealing attacks, largely motivated by a lack of effective defenses in literature. We work towards the first defense which introduces targeted perturbations to the model predictions under a utility constraint. Our approach introduces the perturbations targeted towards manipulating the training procedure of the attacker. We evaluate our approach on multiple datasets and attack scenarios across a range of utility constrains. Our results show that it is indeed possible to trade-off utility (e.g., deviation from original prediction, test accuracy) to significantly reduce effectiveness of model stealing attacks.
Updates-Leak: Data Set Inference and Reconstruction Attacks in Online Learning
A. M. G. Salem, A. Bhattacharyya, M. Backes, M. Fritz and Y. Zhang
Technical Report, 2019
(arXiv: 1904.01067)
Abstract
Machine learning (ML) has progressed rapidly during the past decade and the major factor that drives such development is the unprecedented large-scale data. As data generation is a continuous process, this leads to ML service providers updating their models frequently with newly-collected data in an online learning scenario. In consequence, if an ML model is queried with the same set of data samples at two different points in time, it will provide different results. In this paper, we investigate whether the change in the output of a black-box ML model before and after being updated can leak information of the dataset used to perform the update. This constitutes a new attack surface against black-box ML models and such information leakage severely damages the intellectual property and data privacy of the ML model owner/provider. In contrast to membership inference attacks, we use an encoder-decoder formulation that allows inferring diverse information ranging from detailed characteristics to full reconstruction of the dataset. Our new attacks are facilitated by state-of-the-art deep learning techniques. In particular, we propose a hybrid generative model (BM-GAN) that is based on generative adversarial networks (GANs) but includes a reconstructive loss that allows generating accurate samples. Our experiments show effective prediction of dataset characteristics and even full reconstruction in challenging conditions.
Shape Evasion: Preventing Body Shape Inference of Multi-Stage Approaches
H. Sattar, K. Krombholz, G. Pons-Moll and M. Fritz
Technical Report, 2019
(arXiv: 1905.11503)
Abstract
Modern approaches to pose and body shape estimation have recently achieved strong performance even under challenging real-world conditions. Even from a single image of a clothed person, a realistic looking body shape can be inferred that captures a users' weight group and body shape type well. This opens up a whole spectrum of applications -- in particular in fashion -- where virtual try-on and recommendation systems can make use of these new and automatized cues. However, a realistic depiction of the undressed body is regarded highly private and therefore might not be consented by most people. Hence, we ask if the automatic extraction of such information can be effectively evaded. While adversarial perturbations have been shown to be effective for manipulating the output of machine learning models -- in particular, end-to-end deep learning approaches -- state of the art shape estimation methods are composed of multiple stages. We perform the first investigation of different strategies that can be used to effectively manipulate the automatic shape estimation while preserving the overall appearance of the original image.
Learning GAN fingerprints towards Image Attribution
N. Yu, L. Davis and M. Fritz
Technical Report, 2019
(arXiv: 1811.08180)
Abstract
Recent advances in Generative Adversarial Networks (GANs) have shown increasing success in generating photorealistic images. But they also raise challenges to visual forensics and model authentication. We present the first study of learning GAN fingerprints towards image attribution: we systematically investigate the performance of classifying an image as real or GAN-generated. For GAN-generated images, we further identify their sources. Our experiments validate that GANs carry distinct model fingerprints and leave stable fingerprints to their generated images, which support image attribution. Even a single difference in GAN training initialization can result in different fingerprints, which enables fine-grained model authentication. We further validate such a fingerprint is omnipresent in different image components and is not biased by GAN artifacts. Fingerprint finetuning is effective in immunizing five types of adversarial image perturbations. Comparisons also show our learned fingerprints consistently outperform several baselines in a variety of setups.
2018
Video Object Segmentation with Language Referring Expressions
A. Khoreva, A. Rohrbach and B. Schiele
Computer Vision - ACCV 2018, 2018
NightOwls: A Pedestrians at Night Dataset
L. Neumann, M. Karg, S. Zhang, C. Scharfenberger, E. Piegert, S. Mistr, O. Prokofyeva, R. Thiel, A. Vedaldi, A. Zisserman and B. Schiele
Computer Vision - ACCV 2018, 2018
Answering Visual What-If Questions: From Actions to Predicted Scene Descriptions
M. Wagner, H. Basevi, R. Shetty, W. Li, M. Malinowski, M. Fritz and A. Leonardis
Computer Vision - ECCV 2018 Workshops, 2018
NRST: Non-rigid Surface Tracking from Monocular Video
M. Habermann, W. Xu, H. Rohdin, M. Zollhöfer, G. Pons-Moll and C. Theobalt
Pattern Recognition (GCPR 2018), 2018