2018
Video Based Reconstruction of 3D People Models
T. Alldieck, M. A. Magnor, W. Xu, C. Theobalt and G. Pons-Moll
31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), 2018
(Accepted/in press)
PoseTrack: A Benchmark for Human Pose Estimation and Tracking
M. Andriluka, U. Iqbal, A. Milan, E. Insafutdinov, L. Pishchulin, J. Gall and B. Schiele
31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), 2018
(Accepted/in press)
Accurate and Diverse Sampling of Sequences based on a “Best of Many” Sample Objective
A. Bhattacharyya, M. Fritz and B. Schiele
31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), 2018
(Accepted/in press)
Long-Term On-Board Prediction of People in Traffic Scenes under Uncertainty
A. Bhattacharyya, M. Fritz and B. Schiele
31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), 2018
(Accepted/in press)
Disentangled Person Image Generation
L. Ma, Q. Sun, S. Georgoulis, L. Van Gool, B. Schiele and M. Fritz
31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), 2018
(Accepted/in press)
Connecting Pixels to Privacy and Utility: Automatic Redaction of Private Information in Images
T. Orekondy, M. Fritz and B. Schiele
31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), 2018
(Accepted/in press)
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence
D. H. Park, L. A. Hendricks, Z. Akata, A. Rohrbach, B. Schiele, T. Darrell and M. Rohrbach
31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), 2018
(Accepted/in press)
Natural and Effective Obfuscation by Head Inpainting
Q. Sun, L. Ma, S. J. Oh, L. Van Gool, B. Schiele and M. Fritz
31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), 2018
(Accepted/in press)
Feature Generating Networks for Zero-Shot Learning
Y. Xian, T. Lorenz, B. Schiele and Z. Akata
31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), 2018
(Accepted/in press)
DoubleFusion: Real-time Capture of Human Performances with Inner Body Shapes from a Single Depth Sensor
T. Yu, Z. Zheng, K. Guo, J. Zhao, Q. Dai, H. Li, G. Pons-Moll and Y. Liu
31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), 2018
(Accepted/in press)
Occluded Pedestrian Detection through Guided Attention in CNNs
S. Zhang, J. Yang and B. Schiele
31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), 2018
(Accepted/in press)
Quick Bootstrapping of a Personalized Gaze Model from Real-Use Interactions
M. X. Huang, J. Li, G. Ngai and H. Va Leong
ACM Transactions on Intelligent Systems and Technology, Volume 9, Number 4, 2018
VRPursuits: Interaction in Virtual Reality using Smooth Pursuit Eye Movements
M. Khamis, C. Oechsner, F. Alt and A. Bulling
AVI’18, International Conference on Advanced Visual Interfaces, 2018
(Accepted/in press)
Understanding Face and Eye Visibility in Front-Facing Cameras of Smartphones used in the Wild
M. Khamis, A. Baier, N. Henze, F. Alt and A. Bulling
CHI’18, CHI Conference on Human Factors in Computing Systems, 2018
(Accepted/in press)
Which one is me? Identifying Oneself on Public Displays
M. Khamis, C. Becker, A. Bulling and F. Alt
CHI’18, CHI Conference on Human Factors in Computing Systems, 2018
(Accepted/in press)
Training Person-Specific Gaze Estimators from Interactions with Multiple Devices
X. Zhang, M. X. Huang, Y. Sugano and A. Bulling
CHI’18, CHI Conference on Human Factors in Computing Systems, 2018
(Accepted/in press)
GazeDirector: Fully Articulated Eye Gaze Redirection in Video
E. Wood, T. Baltrusaitis, L.-P. Morency, P. Robinson and A. Bulling
Computer Graphics Forum (Proc. EUROGRAPHICS 2018), Volume 37, Number 2, 2018
(Accepted/in press)
A Multimodal Corpus of Expert Gaze and Behavior during Phonetic Segmentation Tasks
A. Khan, I. Steiner, Y. Sugano, A. Bulling and R. Macdonald
Eleventh International Language Resources and Evaluation Conference (LREC 2018), 2018
(Accepted/in press)
Eye Movements During Everyday Behavior Predict Personality Traits
S. Hoppe, T. Loetscher, S. Morey and A. Bulling
Frontiers in Human Neuroscience, Volume 12, 2018
Towards Reaching Human Performance in Pedestrian Detection
S. Zhang, R. Benenson, M. Omran, J. Hosang and B. Schiele
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 40, Number 4, 2018
Abstract
Encouraged by the recent progress in pedestrian detection, we investigate the gap between current state-of-the-art methods and the “perfect single frame detector”. We enable our analysis by creating a human baseline for pedestrian detection (over the Caltech pedestrian dataset). After manually clustering the frequent errors of a top detector, we characterise both localisation and background-versus-foreground errors. To address localisation errors we study the impact of training annotation noise on detector performance, and show that we can improve results even with a small portion of sanitised training data. To address background/foreground discrimination, we study convnets for pedestrian detection, and discuss which factors affect their performance. Beyond our in-depth analysis, we report top performance on the Caltech pedestrian dataset, and provide a new sanitised set of training and test annotations.
MPIIGaze: Real-World Dataset and Deep Appearance-Based Gaze Estimation
X. Zhang, Y. Sugano, M. Fritz and A. Bulling
IEEE Transactions on Pattern Analysis and Machine Intelligence, Early Access, 2018
Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos
S. Yeung, O. Russakovsky, N. Jin, M. Andriluka, G. Mori and L. Fei-Fei
International Journal of Computer Vision, Volume 126, Number 2-4, 2018
Detecting Low Rapport During Natural Interactions in Small Groups from Non-Verbal Behaviour
P. Müller, M. X. Huang and A. Bulling
IUI 2018, 23rd International Conference on Intelligent User Interfaces, 2018
Error-Aware Gaze-Based Interfaces for Robust Mobile Gaze Interaction
M. Barz, F. Daiber, D. Sonntag and A. Bulling
Proceedings ETRA 2018, 2018
(Accepted/in press)
A Novel Approach to Single Camera, Glint-Free 3D Eye Model Fitting Including Corneal Refraction
K. Dierkes, M. Kassner and A. Bulling
Proceedings ETRA 2018, 2018
(Accepted/in press)
Hidden Pursuits: Evaluating Gaze-selection via Pursuits when the Stimulus Trajectory is Partially Hidden
T. Mattusch, M. Mirzamohammad, M. Khamis, A. Bulling and F. Alt
Proceedings ETRA 2018, 2018
(Accepted/in press)
Robust Eye Contact Detection in Natural Multi-Person Interactions Using Gaze and Speaking Behaviour
P. Müller, M. X. Huang, X. Zhang and A. Bulling
Proceedings ETRA 2018, 2018
(Accepted/in press)
Learning to Find Eye Region Landmarks for Remote Gaze Estimation in Unconstrained Settings
S. Park, X. Zhang, A. Bulling and O. Hilliges
Proceedings ETRA 2018, 2018
(Accepted/in press)
Fixation Detection for Head-Mounted Eye Tracking Based on Visual Similarity of Gaze Targets
J. Steil, M. X. Huang and A. Bulling
Proceedings ETRA 2018, 2018
(Accepted/in press)
Revisiting Data Normalization for Appearance-Based Gaze Estimation
X. Zhang, Y. Sugano and A. Bulling
Proceedings ETRA 2018, 2018
(Accepted/in press)
Advanced Steel Microstructure Classification by Deep Learning Methods
S. M. Azimi, D. Britz, M. Engstler, M. Fritz and F. Mücklich
Scientific Reports, Volume 8, 2018
Abstract
The inner structure of a material is called its microstructure. It stores the genesis of a material and determines all of its physical and chemical properties. While microstructural characterization is widespread and well understood, microstructural classification is mostly done manually by human experts, which introduces considerable uncertainty. Since the microstructure can be a combination of different phases with complex substructures, its automatic classification is very challenging, and only little work in this field has been carried out so far. Prior related work mostly applies features designed and engineered by experts and treats classification separately from the feature-extraction step. Recently, deep learning methods have shown surprisingly good performance in vision applications by learning the features from data jointly with the classification step. In this work, we propose a deep learning method for microstructure classification, exemplified on certain microstructural constituents of low-carbon steel. The method employs pixel-wise segmentation via fully convolutional neural networks (FCNN) accompanied by a max-voting scheme. Our system achieves 93.94% classification accuracy, drastically outperforming the state-of-the-art accuracy of 48.89% and indicating the effectiveness of pixel-wise approaches. Beyond the results presented in this paper, this line of research offers a more robust and, above all, objective approach to the difficult task of steel quality assessment.
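The max-voting step mentioned in the abstract can be illustrated with a minimal sketch: each segmented object is assigned the class that the majority of its pixels vote for. This is not the authors' code; the function and variable names are hypothetical, and per-pixel class predictions and per-object masks are assumed to be given.

```python
# Illustrative sketch only: majority vote over per-pixel class predictions
# produced by a fully convolutional network (inputs are assumed, not from the paper).
import numpy as np

def max_vote_object_labels(pixel_classes, object_masks):
    """pixel_classes: (H, W) int array of per-pixel class predictions.
    object_masks: list of (H, W) boolean arrays, one per segmented object.
    Returns one class label per object via majority vote."""
    labels = []
    for mask in object_masks:
        votes = pixel_classes[mask]            # classes of pixels inside the object
        counts = np.bincount(votes)            # tally votes per class
        labels.append(int(np.argmax(counts)))  # majority class wins
    return labels

# Tiny usage example: a 4x4 prediction map and one object mask
pixel_classes = np.array([[0, 0, 1, 1],
                          [0, 2, 2, 1],
                          [0, 2, 2, 1],
                          [0, 0, 1, 1]])
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True                          # the inner 2x2 block
print(max_vote_object_labels(pixel_classes, [mask]))  # -> [2]
```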
Towards Reverse-Engineering Black-Box Neural Networks
S. J. Oh, M. Augustin, B. Schiele and M. Fritz
Sixth International Conference on Learning Representations (ICLR 2018), 2018
(Accepted/in press)
Long-Term Image Boundary Prediction
A. Bhattacharyya, M. Malinowski, B. Schiele and M. Fritz
Thirty-Second AAAI Conference on Artificial Intelligence, 2018
(Accepted/in press)
Video Object Segmentation with Language Referring Expressions
A. Khoreva, A. Rohrbach and B. Schiele
Technical Report, 2018
(arXiv: 1803.08006)
Abstract
Most state-of-the-art semi-supervised video object segmentation methods rely on a pixel-accurate mask of the target object provided for the first frame of a video. However, obtaining a detailed segmentation mask is expensive and time-consuming. In this work we explore an alternative way of identifying a target object, namely by employing language referring expressions. Besides being a more practical and natural way of pointing out a target object, using language specifications can help to avoid drift and make the system more robust to complex dynamics and appearance variations. Leveraging recent advances in language grounding models designed for images, we propose an approach to extend them to video data, ensuring temporally coherent predictions. To evaluate our method we augment the popular video object segmentation benchmarks DAVIS'16 and DAVIS'17 with language descriptions of target objects. We show that our approach performs on par with methods which have access to a pixel-level mask of the target object on DAVIS'16, and is competitive with methods using scribbles on the challenging DAVIS'17 dataset.
PrivacEye: Privacy-Preserving First-Person Vision Using Image Features and Eye Movement Analysis
J. Steil, M. Koelle, W. Heuten, S. Boll and A. Bulling
Technical Report, 2018
(arXiv: 1801.04457)
Abstract
As first-person cameras in head-mounted displays become increasingly prevalent, so does the problem of infringing user and bystander privacy. To address this challenge, we present PrivacEye, a proof-of-concept system that detects privacy-sensitive everyday situations and automatically enables and disables the first-person camera using a mechanical shutter. To close the shutter, PrivacEye detects sensitive situations from first-person camera videos using an end-to-end deep-learning model. To open the shutter without visual input, PrivacEye uses a separate, smaller eye camera to detect changes in users' eye movements to gauge changes in the "privacy level" of the current situation. We evaluate PrivacEye on a dataset of first-person videos recorded in the daily life of 17 participants that they annotated with privacy sensitivity levels. We discuss the strengths and weaknesses of our proof-of-concept system based on a quantitative technical evaluation as well as qualitative insights from semi-structured interviews.
Forecasting User Attention During Everyday Mobile Interactions Using Device-Integrated and Wearable Sensors
J. Steil, P. Müller, Y. Sugano and A. Bulling
Technical Report, 2018
(arXiv: 1801.06011)
Abstract
Users' visual attention is highly fragmented during mobile interactions, but the erratic nature of these attention shifts currently limits attentive user interfaces to adapting only after the fact, i.e. after shifts have already happened, thereby severely limiting adaptation capabilities and user experience. To address these limitations, we study attention forecasting -- the challenging task of predicting whether users' overt visual attention (gaze) will shift between a mobile device and the environment in the near future, or how long users' attention will stay in a given location. To facilitate the development and evaluation of methods for attention forecasting, we present a novel long-term dataset of everyday mobile phone interactions, continuously recorded from 20 participants engaged in common activities on a university campus over 4.5 hours each (more than 90 hours in total). As a first step towards a fully-fledged attention forecasting interface, we further propose a proof-of-concept method that uses device-integrated sensors and body-worn cameras to encode rich information on device usage and users' visual scene. We demonstrate the feasibility of forecasting bidirectional attention shifts between the device and the environment, as well as of predicting the first and total attention span on the device and environment, using our method. We further study the impact of different sensors and feature sets on performance, and discuss the significant potential but also remaining challenges of forecasting user attention during mobile interactions.