Hosnieh Sattar (PhD Student)

Personal Information
Research Interests
- Machine Learning and Pattern Recognition
- Eye Tracking and Visual Cognition
- Image Analysis and Computer Vision
- Human-Computer Interaction
Education
- 2015–present, Ph.D. student in Computer Science, Max Planck Institute for Informatics
- 2014, M.Sc. in Visual Computing, Saarland University
- 2011, B.Sc. in Biomedical Engineering, Islamic Azad University of Mashhad
Teaching
- PDE and Boundary Value Problems, Saarland University (Dr. Darya Apushkinskaya, 2013/14)
Research Projects
Prediction of Search Targets From Fixations in Open-World Settings
Predicting the Category and Attributes of Visual Search Targets Using Deep Gaze Pooling
Visual Decoding of Targets During Visual Search From Human Eye Fixations
Fashion is Taking Shape: Understanding Clothing Preference Based on Body Shape From Online Sources
Publications
2020
Body Shape Privacy in Images: Understanding Privacy and Preventing Automatic Shape Extraction
H. Sattar, K. Krombholz, G. Pons-Moll and M. Fritz
Computer Vision -- ECCV Workshops 2020, 2020
Abstract
Modern approaches to pose and body shape estimation have recently achieved strong performance even under challenging real-world conditions. Even from a single image of a clothed person, a realistic-looking body shape can be inferred that captures a user's weight group and body shape type well. This opens up a whole spectrum of applications, in particular in fashion, where virtual try-on and recommendation systems can make use of these new and automated cues. However, a realistic depiction of the undressed body is regarded as highly private, and most people might therefore not consent to it. Hence, we ask whether the automatic extraction of such information can be effectively evaded. While adversarial perturbations have been shown to be effective for manipulating the output of machine learning models, in particular end-to-end deep learning approaches, state-of-the-art shape estimation methods are composed of multiple stages. We perform the first investigation of different strategies that can be used to effectively manipulate the automatic shape estimation while preserving the overall appearance of the original image.
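To make the attack setting concrete, below is a minimal, illustrative sketch of a targeted perturbation against a differentiable surrogate of one pipeline stage, in the spirit of the strategies investigated in this paper. The `shape_estimator` surrogate, the `target_shape` vector, and all hyperparameters are assumptions for the example, not the paper's implementation.

```python
import torch

def shape_evasion_pgd(image, shape_estimator, target_shape,
                      eps=8 / 255, alpha=1 / 255, steps=40):
    """Craft a small perturbation that pushes the estimated body shape
    towards `target_shape` while keeping the image visually unchanged.

    `shape_estimator` is a hypothetical differentiable surrogate for one
    stage of a multi-stage shape-estimation pipeline; it maps an image
    tensor in [0, 1] to a shape-parameter vector.
    """
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = torch.nn.functional.mse_loss(shape_estimator(adv), target_shape)
        grad = torch.autograd.grad(loss, adv)[0]
        # Signed-gradient step that reduces the loss (targeted attack),
        # then project back into an L-infinity ball around the original
        # image so the overall appearance is preserved.
        adv = adv.detach() - alpha * grad.sign()
        adv = image + (adv - image).clamp(-eps, eps)
        adv = adv.clamp(0.0, 1.0)
    return adv
```

The L-infinity projection is what keeps the perturbation imperceptible; attacking only a surrogate stage reflects the paper's observation that shape estimators are multi-stage rather than end-to-end.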
2019
Shape Evasion: Preventing Body Shape Inference of Multi-Stage Approaches
H. Sattar, K. Krombholz, G. Pons-Moll and M. Fritz
Technical Report, 2019
(arXiv: 1905.11503)
Abstract
Modern approaches to pose and body shape estimation have recently achieved strong performance even under challenging real-world conditions. Even from a single image of a clothed person, a realistic-looking body shape can be inferred that captures a user's weight group and body shape type well. This opens up a whole spectrum of applications, in particular in fashion, where virtual try-on and recommendation systems can make use of these new and automated cues. However, a realistic depiction of the undressed body is regarded as highly private, and most people might therefore not consent to it. Hence, we ask whether the automatic extraction of such information can be effectively evaded. While adversarial perturbations have been shown to be effective for manipulating the output of machine learning models, in particular end-to-end deep learning approaches, state-of-the-art shape estimation methods are composed of multiple stages. We perform the first investigation of different strategies that can be used to effectively manipulate the automatic shape estimation while preserving the overall appearance of the original image.
Intents and Preferences Prediction Based on Implicit Human Cues
H. Sattar
PhD Thesis, Universität des Saarlandes, 2019
Abstract
Visual search is an important task and part of daily human life. It has therefore been a long-standing goal in computer vision to develop methods for analysing human search intent and preferences. As the target of the search exists only in the mind of the person, search intent prediction remains challenging for machine perception. In this thesis, we focus on advancing techniques for search target and preference prediction from implicit human cues.

First, we propose a search target inference algorithm based on human fixation data recorded during visual search. In contrast to previous work that focused on individual instances as search targets in a closed world, we propose the first approach to predict the search target in open-world settings by learning the compatibility between observed fixations and potential search targets. Second, we broaden the scope of search target prediction to categorical classes, such as object categories and attributes. However, state-of-the-art models for categorical recognition generally require large amounts of training data, which is prohibitive for gaze data. To address this challenge, we propose a novel Gaze Pooling Layer that integrates gaze information into CNN-based architectures as an attention mechanism, incorporating both spatial and temporal aspects of human gaze behaviour. Third, we go one step further and investigate the feasibility of combining our gaze embedding approach with the power of generative image models to visually decode, i.e. create a visual representation of, the search target. Fourth, we study for the first time the effect of body shape on people's outfit preferences. We propose a novel and robust multi-photo approach to estimate each user's body shape and build a conditional model of clothing categories given body shape. We demonstrate that, in real-world data, clothing categories and body shapes are correlated, and we show that our approach estimates a realistic-looking body shape that captures a user's weight group and body shape type, even from a single image of a clothed person. However, an accurate depiction of the naked body is considered highly private, and most people might therefore not consent to it. We first study the perception of such technology via a user study; then, in the last part of this thesis, we ask whether the automatic extraction of such information can be effectively evaded.

In summary, this thesis addresses several tasks that aim to enable vision systems to analyse human search intent and preferences in real-world scenarios. In particular, it proposes several novel ideas and models for visual search target prediction from human fixation data, studies for the first time the correlation between body shape and clothing categories, opening a new direction for clothing recommendation systems, and introduces a new topic in privacy and computer vision aimed at preventing automatic 3D shape extraction from images.
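As a toy illustration of the fourth contribution, a conditional model of clothing categories given body shape can be sketched as a smoothed empirical distribution. The (shape cluster, category) input pairs and the Laplace smoothing below are assumptions for the example, not the thesis implementation.

```python
from collections import Counter, defaultdict

def fit_category_given_shape(samples, smoothing=1.0):
    """Fit an empirical conditional distribution P(category | shape).

    `samples` is an iterable of (shape_cluster, category) pairs, where
    shape clusters could come from quantised body-shape parameters.
    Both names are illustrative, not the thesis API.
    """
    counts = defaultdict(Counter)
    categories = set()
    for shape, category in samples:
        counts[shape][category] += 1
        categories.add(category)

    model = {}
    for shape, counter in counts.items():
        # Laplace smoothing avoids zero probabilities for rare pairs.
        total = sum(counter.values()) + smoothing * len(categories)
        model[shape] = {c: (counter[c] + smoothing) / total
                        for c in categories}
    return model

# Example: P(category | "hourglass") after observing toy data.
toy = [("hourglass", "dress"), ("hourglass", "dress"), ("athletic", "jeans")]
print(fit_category_given_shape(toy)["hourglass"])
```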
2017
Predicting the Category and Attributes of Visual Search Targets Using Deep Gaze Pooling
H. Sattar, A. Bulling and M. Fritz
2017 IEEE International Conference on Computer Vision Workshops (MBCC @ICCV 2017), 2017
Abstract
Previous work focused on predicting visual search targets from human fixations but, in the real world, a specific target is often not known, e.g. when searching for a present for a friend. In this work we instead study the problem of predicting the mental picture, i.e. only an abstract idea instead of a specific target. This task is significantly more challenging given that mental pictures of the same target category can vary widely depending on personal biases, and given that characteristic target attributes can often not be verbalised explicitly. We instead propose to use gaze information as implicit information on the user's mental picture and present a novel gaze pooling layer to seamlessly integrate semantic and localized fixation information into a deep image representation. We show that we can robustly predict both the mental picture's category as well as its attributes on a novel dataset containing fixation data of 14 users searching for targets on a subset of the DeepFashion dataset. Our results have important implications for future search interfaces and suggest deep gaze pooling as a general-purpose approach for gaze-supported computer vision systems.
Visual Decoding of Targets During Visual Search From Human Eye Fixations
H. Sattar, M. Fritz and A. Bulling
Technical Report, 2017
(arXiv: 1706.05993)
Abstract
What does human gaze reveal about a user's intents, and to what extent can these intents be inferred or even visualized? Gaze was proposed as an implicit source of information to predict the target of visual search and, more recently, to predict the object class and attributes of the search target. In this work, we go one step further and investigate the feasibility of combining recent advances in encoding human gaze information using deep convolutional neural networks with the power of generative image models to visually decode, i.e. create a visual representation of, the search target. Such visual decoding is challenging for two reasons: 1) the search target only resides in the user's mind as a subjective visual pattern, and can most often not even be described verbally by the person, and 2) it is, as of yet, unclear whether gaze fixations contain sufficient information for this task at all. We show, for the first time, that visual representations of search targets can indeed be decoded only from human gaze fixations. We propose to first encode fixations into a semantic representation and then decode this representation into an image. We evaluate our method on a recent gaze dataset of 14 participants searching for clothing in image collages and validate the model's predictions using two human studies. Our results show that users were able to correctly select the category of the decoded image 62% of the time (chance level: 10%). In our second study we show the importance of a local gaze encoding for decoding visual search targets of users.
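The encode-then-decode idea above can be summarised in a hypothetical two-stage module; the module names, latent dimension, and conditional generator interface below are assumptions for illustration, not the paper's model.

```python
import torch
import torch.nn as nn

class FixationDecoder(nn.Module):
    """Illustrative two-stage pipeline: encode gaze fixations into a
    semantic representation, then feed it to a conditional generator
    that renders a visual representation of the search target.
    """

    def __init__(self, gaze_encoder, generator, latent_dim=128):
        super().__init__()
        self.gaze_encoder = gaze_encoder   # e.g. CNN + gaze pooling -> semantics
        self.generator = generator         # pretrained conditional image generator
        self.latent_dim = latent_dim

    def forward(self, fixated_patches, fixation_map):
        # Stage 1: semantic encoding of what the user looked at.
        semantics = self.gaze_encoder(fixated_patches, fixation_map)
        # Stage 2: decode the semantic code into an image of the
        # (never directly observed) search target.
        z = torch.randn(semantics.size(0), self.latent_dim,
                        device=semantics.device)
        return self.generator(z, semantics)
```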
2015