Video Based Reconstruction of 3D People Models
T. Alldieck, M. A. Magnor, W. Xu, C. Theobalt and G. Pons-Moll
31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), 2018
(Accepted/in press)
PoseTrack: A Benchmark for Human Pose Estimation and Tracking
M. Andriluka, U. Iqbal, A. Milan, E. Insafutdinov, L. Pishchulin, J. Gall and B. Schiele
31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), 2018
(Accepted/in press)
Accurate and Diverse Sampling of Sequences based on a “Best of Many” Sample Objective
A. Bhattacharyya, M. Fritz and B. Schiele
31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), 2018
(Accepted/in press)
Long-Term On-Board Prediction of People in Traffic Scenes under Uncertainty
A. Bhattacharyya, M. Fritz and B. Schiele
31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), 2018
(Accepted/in press)
Discrete-Continuous ADMM for Transductive Inference in Higher-Order MRFs
E. Laude, J.-H. Lange, J. Schüpfer, C. Domokos, L. Leal-Taixé, F. R. Schmidt, B. Andres and D. Cremers
31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), 2018
(Accepted/in press)
Disentangled Person Image Generation
L. Ma, Q. Sun, S. Georgoulis, L. Van Gool, B. Schiele and M. Fritz
31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), 2018
(Accepted/in press)
Connecting Pixels to Privacy and Utility: Automatic Redaction of Private Information in Images
T. Orekondy, M. Fritz and B. Schiele
31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), 2018
(Accepted/in press)
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence
D. H. Park, L. A. Hendricks, Z. Akata, A. Rohrbach, B. Schiele, T. Darrell and M. Rohrbach
31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), 2018
(Accepted/in press)
Natural and Effective Obfuscation by Head Inpainting
Q. Sun, L. Ma, S. J. Oh, L. Van Gool, B. Schiele and M. Fritz
31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), 2018
(Accepted/in press)
Feature Generating Networks for Zero-Shot Learning
Y. Xian, T. Lorenz, B. Schiele and Z. Akata
31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), 2018
(Accepted/in press)
DoubleFusion: Real-time Capture of Human Performances with Inner Body Shapes from a Single Depth Sensor
T. Yu, Z. Zheng, K. Guo, J. Zhao, Q. Dai, H. Li, G. Pons-Moll and Y. Liu
31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), 2018
(Accepted/in press)
Occluded Pedestrian Detection through Guided Attention in CNNs
S. Zhang, J. Yang and B. Schiele
31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), 2018
(Accepted/in press)
Detailed Human Avatars from Monocular Video
T. Alldieck, M. A. Magnor, W. Xu, C. Theobalt and G. Pons-Moll
3DV 2018 , International Conference on 3D Vision, 2018
Single-Shot Multi-person 3D Pose Estimation from Monocular RGB
D. Mehta, O. Sotnychenko, F. Mueller, W. Xu, S. Sridhar, G. Pons-Moll and C. Theobalt
3DV 2018 , International Conference on 3D Vision, 2018
Neural Body Fitting: Unifying Deep Learning and Model Based Human Pose and Shape Estimation
M. Omran, C. Lassner,, G. Pons-Moll, P. Gehler and B. Schiele
3DV 2018 , International Conference on 3D Vision, 2018
Video Object Segmentation with Language Referring Expressions
A. Khoreva, A. Rohrbach and B. Schiele
ACCV 2018, 14th Asian Conference on Computer Vision, 2018
(Accepted/in press)
Quick Bootstrapping of a Personalized Gaze Model from Real-Use Interactions
M. X. Huang, J. Li, G. Ngai and H. Va Leong
ACM Transactions on Intelligent Systems and Technology, Volume 9, Number 4, 2018
VRPursuits: Interaction in Virtual Reality using Smooth Pursuit Eye Movements
M. Khamis, C. Oechsner, F. Alt and A. Bulling
AVI 2018, International Conference on Advanced Visual Interfaces, 2018
JAMI: Fast Computation of Conditional Mutual Information for ceRNA Network Analysis
A. Horňáková, M. List, J. Vreeken and M. H. Schulz
Bioinformatics, Volume 34, Number 17, 2018
Understanding Face and Eye Visibility in Front-Facing Cameras of Smartphones used in the Wild
M. Khamis, A. Baier, N. Henze, F. Alt and A. Bulling
CHI 2018, CHI Conference on Human Factors in Computing Systems, 2018
Which one is me? Identifying Oneself on Public Displays
M. Khamis, C. Becker, A. Bulling and F. Alt
CHI 2018, CHI Conference on Human Factors in Computing Systems, 2018
Training Person-Specific Gaze Estimators from Interactions with Multiple Devices
X. Zhang, M. X. Huang, Y. Sugano and A. Bulling
CHI 2018, CHI Conference on Human Factors in Computing Systems, 2018
GazeDirector: Fully Articulated Eye Gaze Redirection in Video
E. Wood, T. Baltrusaitis, L.-P. Morency, P. Robinson and A. Bulling
Computer Graphics Forum (Proc. EUROGRAPHICS 2018), Volume 37, Number 2, 2018
Diverse Conditional Image Generation by Stochastic Regression with Latent Drop-Out Codes
Y. He, B. Schiele and M. Fritz
Computer Vision -- ECCV 2018, 2018
A Hybrid Model for Identity Obfuscation by Face Replacement
Q. Sun, A. Tewari, W. Xu, M. Fritz, C. Theobalt and B. Schiele
Computer Vision -- ECCV 2018, 2018
Recovering Accurate {3D} Human Pose in the Wild Using {IMUs} and a Moving Camera
T. von Marcard, R. Henschel, M. J. Black, B. Rosenhahn and G. Pons-Moll
Computer Vision -- ECCV 2018, 2018
GazeDrone: Mobile Eye-Based Interaction in Public Space Without Augmenting the User
M. Khamis, A. Kienle, F. Alt and A. Bulling
DroNet’18, 4th ACM Workshop on Micro Aerial Vehicle Networks, Systems, and Applications, 2018
Textual Explanations for Self-Driving Vehicles
J. Kim, A. Rohrbach, T. Darrell, J. Canny and Z. Akata
ECCV 2018, European Conference on Computer Vision, 2018
Deep neural perception and control networks have become key com- ponents of self-driving vehicles. User acceptance is likely to benefit from easy- to-interpret textual explanations which allow end-users to understand what trig- gered a particular behavior. Explanations may be triggered by the neural con- troller, namely introspective explanations , or informed by the neural controller’s output, namely rationalizations . We propose a new approach to introspective ex- planations which consists of two parts. First, we use a visual (spatial) attention model to train a convolutional network end-to-end from images to the vehicle control commands, i . e ., acceleration and change of course. The controller’s at- tention identifies image regions that potentially influence the network’s output. Second, we use an attention-based video-to-text model to produce textual ex- planations of model actions. The attention maps of controller and explanation model are aligned so that explanations are grounded in the parts of the scene that mattered to the controller. We explore two approaches to attention alignment, strong- and weak-alignment. Finally, we explore a version of our model that generates rationalizations, and compare with introspective explanations on the same video segments. We evaluate these models on a novel driving dataset with ground-truth human explanations, the Berkeley DeepDrive eXplanation (BDD- X) dataset. Code is available at https://github.com/JinkyuKimUCB/explainable-deep-driving
Grounding Visual Explanations
L. A. Hendricks, R. Hu, T. Darrell and Z. Akata
ECCV 2018, 2018
(Accepted/in press)
A Vision-grounded Dataset for Predicting Typical Locations for Verbs
N. Mukuze, A. Rohrbach, V. Demberg and B. Schiele
Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 2018
Eye Movements During Everyday Behavior Predict Personality Traits
S. Hoppe, T. Loetscher, S. Morey and A. Bulling
Frontiers in Human Neuroscience, Volume 12, 2018
Learning to Refine Human Pose Estimation
M. Fieraru, A. Khoreva, L. Pishchulin and B. Schiele
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018
Image and Video Captioning with Augmented Neural Architectures
R. Shetty, H. R. Tavakoli and J. Laaksonen
IEEE MultiMedia, Volume 25, Number 2, 2018
Fast-PADMA: Rapidly Adapting Facial Affect Model from Similar Individuals
M. X. Huang, J. Li, G. Ngai, H. V. Leong and K. A. Hua
IEEE Transactions on Multimedia, Volume 20, Number 7, 2018
Reflectance and Natural Illumination from Single-Material Specular Objects Using Deep Learning
S. Georgoulis, K. Rematas, T. Ritschel, E. Gavves, M. Fritz, L. Van Gool and T. Tuytelaars
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 40, Number 8, 2018
Analysis and Optimization of Loss Functions for Multiclass, Top-k, and Multilabel Classification
M. Lapin, M. Hein and B. Schiele
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 40, Number 7, 2018
Discriminatively Trained Latent Ordinal Model for Video Classification
K. Sikka and G. Sharma
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 40, Number 8, 2018
Zero-shot Learning - A Comprehensive Evaluation of the Good, the Bad and the Ugly
Y. Xian, C. H. Lampert, B. Schiele and Z. Akata
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018
(Accepted/in press)
Due to the importance of zero-shot learning, i.e. classifying images where there is a lack of labeled training data, the number of proposed approaches has recently increased steadily. We argue that it is time to take a step back and to analyze the status quo of the area. The purpose of this paper is three-fold. First, given the fact that there is no agreed upon zero-shot learning benchmark, we first define a new benchmark by unifying both the evaluation protocols and data splits of publicly available datasets used for this task. This is an important contribution as published results are often not comparable and sometimes even flawed due to, e.g. pre-training on zero-shot test classes. Moreover, we propose a new zero-shot learning dataset, the Animals with Attributes 2 (AWA2) dataset which we make publicly available both in terms of image features and the images themselves. Second, we compare and analyze a significant number of the state-of-the-art methods in depth, both in the classic zero-shot setting but also in the more realistic generalized zero-shot setting. Finally, we discuss in detail the limitations of the current status of the area which can be taken as a basis for advancing it.
Towards Reaching Human Performance in Pedestrian Detection
S. Zhang, R. Benenson, M. Omran, J. Hosang and B. Schiele
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 40, Number 4, 2018
Encouraged by the recent progress in pedestrian detection, we investigate the gap between current state-of-the-art methods and the “perfect single frame detector”. We enable our analysis by creating a human baseline for pedestrian detection (over the Caltech pedestrian dataset). After manually clustering the frequent errors of a top detector, we characterise both localisation and background- versus-foreground errors. To address localisation errors we study the impact of training annotation noise on the detector performance, and show that we can improve results even with a small portion of sanitised training data. To address background/foreground discrimination, we study convnets for pedestrian detection, and discuss which factors affect their performance. Other than our in-depth analysis, we report top performance on the Caltech pedestrian dataset, and provide a new sanitised set of training and test annotations.
MPIIGaze: Real-World Dataset and Deep Appearance-Based Gaze Estimation
X. Zhang, Y. Sugano, M. Fritz and A. Bulling
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume Early Access, 2018
Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos
S. Yeung, O. Russakovsky, N. Jin, M. Andriluka, G. Mori and L. Fei-Fei
International Journal of Computer Vision, Volume 126, Number 2-4, 2018
Detecting Low Rapport During Natural Interactions in Small Groups from Non-Verbal Behaviour
P. Müller, M. X. Huang and A. Bulling
IUI 2018, 23rd International Conference on Intelligent User Interfaces, 2018
Explainable AI: The New 42?
R. Goebel, A. Chander, K. Holzinger, F. Lecue, Z. Akata, S. Stumpf, P. Kieseberg and A. Holzinger
Machine Learning and Knowledge Extraction (CD-MAKE 2018), 2018
Tracing Cell Lineages in Videos of Lens-free Microscopy
M. Rempfler, V. Stierle, K. Ditzel, S. Kumar, P. Paulitschke, B. Andres and B. H. Menze
Medical Image Analysis, Volume 48, 2018
The Past, Present, and Future of Gaze-enabled Handheld Mobile Devices: Survey and Lessons Learned
M. Khamis, F. Alt and A. Bulling
MobileHCI 2018, 20th International Conference on Human-Computer Interaction with Mobile Devices and Services, 2018
Forecasting User Attention During Everyday Mobile Interactions Using Device-Integrated and Wearable Sensors
J. Steil, P. Müller, Y. Sugano and A. Bulling
MobileHCI 2018, 20th International Conference on Human-Computer Interaction with Mobile Devices and Services, 2018
Error-Aware Gaze-Based Interfaces for Robust Mobile Gaze Interaction
M. Barz, F. Daiber, D. Sonntag and A. Bulling
Proceedings ETRA 2018, 2018
Hidden Pursuits: Evaluating Gaze-selection via Pursuits when the Stimulus Trajectory is Partially Hidden
T. Mattusch, M. Mirzamohammad, M. Khamis, A. Bulling and F. Alt
Proceedings ETRA 2018, 2018
Robust Eye Contact Detection in Natural Multi-Person Interactions Using Gaze and Speaking Behaviour
P. Müller, M. X. Huang, X. Zhang and A. Bulling
Proceedings ETRA 2018, 2018
Learning to Find Eye Region Landmarks for Remote Gaze Estimation in Unconstrained Settings
S. Park, X. Zhang, A. Bulling and O. Hilliges
Proceedings ETRA 2018, 2018
Fixation Detection for Head-Mounted Eye Tracking Based on Visual Similarity of Gaze Targets
J. Steil, M. X. Huang and A. Bulling
Proceedings ETRA 2018, 2018
Revisiting Data Normalization for Appearance-Based Gaze Estimation
X. Zhang, Y. Sugano and A. Bulling
Proceedings ETRA 2018, 2018
A4NT: Author Attribute Anonymity by Adversarial Training of Neural Machine Translation
R. Shetty, B. Schiele and M. Fritz
Proceedings of the 27th USENIX Security Symposium, 2018
Partial Optimality and Fast Lower Bounds for Weighted Correlation Clustering
J.-H. Lange, A. Karrenbauer and B. Andres
Proceedings of the 35th International Conference on Machine Learning (ICML 2018), 2018
A Multimodal Corpus of Expert Gaze and Behavior during Phonetic Segmentation Tasks
A. Khan, I. Steiner, Y. Sugano, A. Bulling and R. Macdonald
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 2018
Generating Counterfactual Explanations with Natural Language
L. A. Hendricks, R. Hu, T. Darrell and Z. Akata
Proceedings of the 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018), 2018
(arXiv: 1806.09809)
Natural language explanations of deep neural network decisions provide an intuitive way for a AI agent to articulate a reasoning process. Current textual explanations learn to discuss class discriminative features in an image. However, it is also helpful to understand which attributes might change a classification decision if present in an image (e.g., "This is not a Scarlet Tanager because it does not have black wings.") We call such textual explanations counterfactual explanations, and propose an intuitive method to generate counterfactual explanations by inspecting which evidence in an input is missing, but might contribute to a different classification decision if present in the image. To demonstrate our method we consider a fine-grained image classification task in which we take as input an image and a counterfactual class and output text which explains why the image does not belong to a counterfactual class. We then analyze our generated counterfactual explanations both qualitatively and quantitatively using proposed automatic metrics.
Advanced Steel Microstructure Classification by Deep Learning Methods
S. M. Azimi, D. Britz, M. Engstler, M. Fritz and F. Mücklich
Scientific Reports, Volume 8, 2018
The inner structure of a material is called microstructure. It stores the genesis of a material and determines all its physical and chemical properties. While microstructural characterization is widely spread and well known, the microstructural classification is mostly done manually by human experts, which opens doors for huge uncertainties. Since the microstructure could be a combination of different phases with complex substructures its automatic classification is very challenging and just a little work in this field has been carried out. Prior related works apply mostly designed and engineered features by experts and classify microstructure separately from feature extraction step. Recently Deep Learning methods have shown surprisingly good performance in vision applications by learning the features from data together with the classification step. In this work, we propose a deep learning method for microstructure classification in the examples of certain microstructural constituents of low carbon steel. This novel method employs pixel-wise segmentation via Fully Convolutional Neural Networks (FCNN) accompanied by max-voting scheme. Our system achieves 93.94% classification accuracy, drastically outperforming the state-of-the-art method of 48.89% accuracy, indicating the effectiveness of pixel-wise approaches. Beyond the success presented in this paper, this line of research offers a more robust and first of all objective way for the difficult task of steel quality appreciation.
Towards Reverse-Engineering Black-Box Neural Networks
S. J. Oh, M. Augustin, B. Schiele and M. Fritz
Sixth International Conference on Learning Representations (ICLR 2018), 2018
(Accepted/in press)
Long-Term Image Boundary Prediction
A. Bhattacharyya, M. Malinowski, B. Schiele and M. Fritz
Thirty-Second AAAI Conference on Artificial Intelligence, 2018
Bayesian Prediction of Future Street Scenes through Importance Sampling based Optimization
A. Bhattacharyya, M. Fritz and B. Schiele
Technical Report, 2018
(arXiv: 1806.06939)
For autonomous agents to successfully operate in the real world, anticipation of future events and states of their environment is a key competence. This problem can be formalized as a sequence prediction problem, where a number of observations are used to predict the sequence into the future. However, real-world scenarios demand a model of uncertainty of such predictions, as future states become increasingly uncertain and multi-modal -- in particular on long time horizons. This makes modelling and learning challenging. We cast state of the art semantic segmentation and future prediction models based on deep learning into a Bayesian formulation that in turn allows for a full Bayesian treatment of the prediction problem. We present a new sampling scheme for this model that draws from the success of variational autoencoders by incorporating a recognition network. In the experiments we show that our model outperforms prior work in accuracy of the predicted segmentation and provides calibrated probabilities that also better capture the multi-modal aspects of possible future states of street scenes.
Proceedings PETMEI 2018
A. Bulling, E. Kasneci and C. Lander (Eds.)
ACM, 2018
Primal-Dual Wasserstein GAN
M. Gemici, Z. Akata and M. Welling
Technical Report, 2018
(arXiv: 1805.09575)
We introduce Primal-Dual Wasserstein GAN, a new learning algorithm for building latent variable models of the data distribution based on the primal and the dual formulations of the optimal transport (OT) problem. We utilize the primal formulation to learn a flexible inference mechanism and to create an optimal approximate coupling between the data distribution and the generative model. In order to learn the generative model, we use the dual formulation and train the decoder adversarially through a critic network that is regularized by the approximate coupling obtained from the primal. Unlike previous methods that violate various properties of the optimal critic, we regularize the norm and the direction of the gradients of the critic function. Our model shares many of the desirable properties of auto-encoding models in terms of mode coverage and latent structure, while avoiding their undesirable averaging properties, e.g. their inability to capture sharp visual features when modeling real images. We compare our algorithm with several other generative modeling techniques that utilize Wasserstein distances on Frechet Inception Distance (FID) and Inception Scores (IS).
MLCapsule: Guarded Offline Deployment of Machine Learning as a Service
L. Hanzlik, Y. Zhang, K. Grosse, A. Salem, M. Augustin, M. Backes and M. Fritz
Technical Report, 2018
(arXiv: 1808.00590)
With the widespread use of machine learning (ML) techniques, ML as a service has become increasingly popular. In this setting, an ML model resides on a server and users can query the model with their data via an API. However, if the user's input is sensitive, sending it to the server is not an option. Equally, the service provider does not want to share the model by sending it to the client for protecting its intellectual property and pay-per-query business model. In this paper, we propose MLCapsule, a guarded offline deployment of machine learning as a service. MLCapsule executes the machine learning model locally on the user's client and therefore the data never leaves the client. Meanwhile, MLCapsule offers the service provider the same level of control and security of its model as the commonly used server-side execution. In addition, MLCapsule is applicable to offline applications that require local execution. Beyond protecting against direct model access, we demonstrate that MLCapsule allows for implementing defenses against advanced attacks on machine learning models such as model stealing/reverse engineering and membership inference.
Manipulating Attributes of Natural Scenes via Hallucination
L. Karacan, Z. Akata, A. Erdem and E. Erdem
Technical Report, 2018
(arXiv: 1808.07413)
In this study, we explore building a two-stage framework for enabling users to directly manipulate high-level attributes of a natural scene. The key to our approach is a deep generative network which can hallucinate images of a scene as if they were taken at a different season (e.g. during winter), weather condition (e.g. in a cloudy day) or time of the day (e.g. at sunset). Once the scene is hallucinated with the given attributes, the corresponding look is then transferred to the input image while preserving the semantic details intact, giving a photo-realistic manipulation result. As the proposed framework hallucinates what the scene will look like, it does not require any reference style image as commonly utilized in most of the appearance or style transfer approaches. Moreover, it allows to simultaneously manipulate a given scene according to a diverse set of transient attributes within a single model, eliminating the need of training multiple networks per each translation task. Our comprehensive set of qualitative and quantitative results demonstrate the effectiveness of our approach against the competing methods.
From Perception over Anticipation to Manipulation
W. Li
PhD Thesis, Universität des Saarlandes, 2018
From autonomous driving cars to surgical robots, robotic system has enjoyed significant growth over the past decade. With the rapid development in robotics alongside the evolution in the related fields, such as computer vision and machine learning, integrating perception, anticipation and manipulation is key to the success of future robotic system. In this thesis, we explore different ways of such integration to extend the capabilities of a robotic system to take on more challenging real world tasks. On anticipation and perception, we address the recognition of ongoing activity from videos. In particular we focus on long-duration and complex activities and hence propose a new challenging dataset to facilitate the work. We introduce hierarchical labels over the activity classes and investigate the temporal accuracy-specificity trade-offs. We propose a new method based on recurrent neural networks that learns to predict over this hierarchy and realize accuracy specificity trade-offs. Our method outperforms several baselines on this new challenge. On manipulation with perception, we propose an efficient framework for programming a robot to use human tools. We first present a novel and compact model for using tools described by a tip model. Then we explore a strategy of utilizing a dual-gripper approach for manipulating tools – motivated by the absence of dexterous hands on widely available general purpose robots. Afterwards, we embed the tool use learning into a hierarchical architecture and evaluate it on a Baxter research robot. Finally, combining perception, anticipation and manipulation, we focus on a block stacking task. First we explore how to guide robot to place a single block into the scene without collapsing the existing structure. We introduce a mechanism to predict physical stability directly from visual input and evaluate it first on a synthetic data and then on real-world block stacking. Further, we introduce the target stacking task where the agent stacks blocks to reproduce a tower shown in an image. To do so, we create a synthetic block stacking environment with physics simulation in which the agent can learn block stacking end-to-end through trial and error, bypassing to explicitly model the corresponding physics knowledge. We propose a goal-parametrized GDQN model to plan with respect to the specific goal. We validate the model on both a navigation task in a classic gridworld environment and the block stacking task.
Deep Appearance Maps
M. Maximov, T. Ritschel and M. Fritz
Technical Report, 2018
(arXiv: 1804.00863)
We propose a deep representation of appearance, i. e. the relation of color, surface orientation, viewer position, material and illumination. Previous approaches have used deep learning to extract classic appearance representations relating to reflectance model parameters (e. g. Phong) or illumination (e. g. HDR environment maps). We suggest to directly represent appearance itself as a network we call a deep appearance map (DAM). This is a 4D generalization over 2D reflectance maps, which held the view direction fixed. First, we show how a DAM can be learned from images or video frames and later be used to synthesize appearance, given new surface orientations and viewer positions. Second, we demonstrate how another network can be used to map from an image or video frames to a DAM network to reproduce this appearance, without using a lengthy optimization such as stochastic gradient descent (learning-to-learn). Finally, we generalize this to an appearance estimation-and-segmentation task, where we map from an image showing multiple materials to multiple networks reproducing their appearance, as well as per-pixel segmentation.
Computational Modelling of Visual Attention during Reading
A. Nurkas
PhD Thesis, Universität des Saarlandes, 2018
Image Manipulation against Learned Models Privacy and Security Implications
S. J. Oh
PhD Thesis, Universität des Saarlandes, 2018
Machine learning is transforming the world. Its application areas span privacy sensitive and security critical tasks such as human identification and self-driving cars. These applications raise privacy and security related questions that are not fully understood or answered yet: Can automatic person recognisers identify people in photos even when their faces are blurred? How easy is it to find an adversarial input for a self-driving car that makes it drive off the road? This thesis contributes one of the first steps towards a better understanding of such concerns. We observe that many privacy and security critical scenarios for learned models involve input data manipulation: users obfuscate their identity by blurring their faces and adversaries inject imperceptible perturbations to the input signal. We introduce a data manipulator framework as a tool for collectively describing and analysing privacy and security relevant scenarios involving learned models. A data manipulator introduces a shift in data distribution for achieving privacy or security related goals, and feeds the transformed input to the target model. This framework provides a common perspective on the studies presented in the thesis. We begin the studies from the user’s privacy point of view. We analyse the efficacy of common obfuscation methods like face blurring, and show that they are surprisingly ineffective against state of the art person recognition systems. We then propose alternatives based on head inpainting and adversarial examples. By studying the user privacy, we also study the dual problem: model security. In model security perspective, a model ought to be robust and reliable against small amounts of data manipulation. In both cases, data are manipulated with the goal of changing the target model prediction. User privacy and model security problems can be described with the same objective. We then study the knowledge aspect of the data manipulation problem. The more one knows about the target model, the more effective manipulations one can craft. We propose a game theoretic manipulation framework to systematically represent the knowledge level on the target model and derive privacy and security guarantees. We then discuss ways to increase knowledge about a black-box model by only querying it, deriving implications that are relevant to both privacy and security perspectives.
Understanding and Controlling User Linkability in Decentralized Learning
T. Orekondy, S. J. Oh, B. Schiele and M. Fritz
Technical Report, 2018
(arXiv: 1805.05838)
Machine Learning techniques are widely used by online services (e.g. Google, Apple) in order to analyze and make predictions on user data. As many of the provided services are user-centric (e.g. personal photo collections, speech recognition, personal assistance), user data generated on personal devices is key to provide the service. In order to protect the data and the privacy of the user, federated learning techniques have been proposed where the data never leaves the user's device and "only" model updates are communicated back to the server. In our work, we propose a new threat model that is not concerned with learning about the content - but rather is concerned with the linkability of users during such decentralized learning scenarios. We show that model updates are characteristic for users and therefore lend themselves to linkability attacks. We show identification and matching of users across devices in closed and open world scenarios. In our experiments, we find our attacks to be highly effective, achieving 20x-175x chance-level performance. In order to mitigate the risks of linkability attacks, we study various strategies. As adding random noise does not offer convincing operation points, we propose strategies based on using calibrated domain-specific data; we find these strategies offers substantial protection against linkability threats with little effect to utility.
Fashion is Taking Shape: Understanding Clothing Preference Based on Body Shape From Online Sources
H. Sattar, G. Pons-Moll and M. Fritz
Technical Report, 2018
(arXiv: 1807.03235)
To study the correlation between clothing garments and body shape, we collected a new dataset (Fashion Takes Shape), which includes images of users with clothing category annotations. We employ our multi-photo approach to estimate body shapes of each user and build a conditional model of clothing categories given body-shape. We demonstrate that in real-world data, clothing categories and body-shapes are correlated and show that our multi-photo approach leads to a better predictive model for clothing categories compared to models based on single-view shape estimates or manually annotated body types. We see our method as the first step towards the large-scale understanding of clothing preferences from body shape.
Adversarial Scene Editing: Automatic Object Removal from Weak Supervision
R. Shetty, M. Fritz and B. Schiele
Technical Report, 2018
(arXiv: 1806.01911)
While great progress has been made recently in automatic image manipulation, it has been limited to object centric images like faces or structured scene datasets. In this work, we take a step towards general scene-level image editing by developing an automatic interaction-free object removal model. Our model learns to find and remove objects from general scene images using image-level labels and unpaired data in a generative adversarial network (GAN) framework. We achieve this with two key contributions: a two-stage editor architecture consisting of a mask generator and image in-painter that co-operate to remove objects, and a novel GAN based prior for the mask generator that allows us to flexibly incorporate knowledge about object shapes. We experimentally show on two datasets that our method effectively removes a wide variety of objects using weak supervision only
PrivacEye: Privacy-Preserving First-Person Vision Using Image Features and Eye Movement Analysis
J. Steil, M. Koelle, W. Heuten, S. Boll and A. Bulling
Technical Report, 2018
(arXiv: 1801.04457)
As first-person cameras in head-mounted displays become increasingly prevalent, so does the problem of infringing user and bystander privacy. To address this challenge, we present PrivacEye, a proof-of-concept system that detects privacysensitive everyday situations and automatically enables and disables the first-person camera using a mechanical shutter. To close the shutter, PrivacEye detects sensitive situations from first-person camera videos using an end-to-end deep-learning model. To open the shutter without visual input, PrivacEye uses a separate, smaller eye camera to detect changes in users' eye movements to gauge changes in the "privacy level" of the current situation. We evaluate PrivacEye on a dataset of first-person videos recorded in the daily life of 17 participants that they annotated with privacy sensitivity levels. We discuss the strengths and weaknesses of our proof-of-concept system based on a quantitative technical evaluation as well as qualitative insights from semi-structured interviews.
Sequential Attacks on Agents for Long-Term Adversarial Goals
E. Tretschk, S. J. Oh and M. Fritz
Technical Report, 2018
(arXiv: 1805.12487)
Reinforcement learning (RL) has advanced greatly in the past few years with the employment of effective deep neural networks (DNNs) on the policy networks. With the great effectiveness came serious vulnerability issues with DNNs that small adversarial perturbations on the input can change the output of the network. Several works have pointed out that learned agents with a DNN policy network can be manipulated against achieving the original task through a sequence of small perturbations on the input states. In this paper, we demonstrate furthermore that it is also possible to impose an arbitrary adversarial reward on the victim policy network through a sequence of attacks. Our method involves the latest adversarial attack technique, Adversarial Transformer Network (ATN), that learns to generate the attack and is easy to integrate into the policy network. As a result of our attack, the victim agent is misguided to optimise for the adversarial reward over time. Our results expose serious security threats for RL applications in safety-critical systems including drones, medical analysis, and self-driving cars.
Gaze Estimation and Interaction in Real-World Environments
X. Zhang
PhD Thesis, Universität des Saarlandes, 2018