Hosnieh Sattar (PhD Student)

Personal Information

Research Interests

Machine Learning and Pattern Recognition
Eye Tracking and Visual Cognition
Image Analysis and Computer Vision
Human-Computer Interaction

Education

2015–present, Ph.D. student in Computer Science, Max Planck Institute for Informatics
2014, M.Sc. in Visual Computing, Saarland University
2011, B.Sc. in Biomedical Engineering, Islamic Azad University of Mashhad

Teaching

PDE and Boundary Value Problems, Saarland University (Dr. Darya Apushkinskaya, 2013/14)

Research Projects

Prediction of Search Targets From Fixations in Open-World Settings
Predicting the Category and Attributes of Visual Search Targets Using Deep Gaze Pooling
Visual Decoding of Targets During Visual Search From Human Eye Fixations
Fashion is Taking Shape: Understanding Clothing Preference Based on Body Shape From Online Sources

Publications

2020

Conference paper

H. Sattar, K. Krombholz, G. Pons-Moll, and M. Fritz

“Body Shape Privacy in Images: Understanding Privacy and Preventing Automatic Shape Extraction,” in Computer Vision -- ECCV Workshops 2020, Glasgow, UK, 2021.

Abstract

Modern approaches to pose and body shape estimation have recently achieved
strong performance even under challenging real-world conditions. Even from a
single image of a clothed person, a realistic-looking body shape can be
inferred that captures a user's weight group and body shape type well. This
opens up a whole spectrum of applications -- in particular in fashion -- where
virtual try-on and recommendation systems can make use of these new and
automatized cues. However, a realistic depiction of the undressed body is
regarded as highly private and therefore might not be consented to by most
people. Hence, we ask if the automatic extraction of such information can be
effectively evaded. While adversarial perturbations have been shown to be
effective for manipulating the output of machine learning models -- in
particular, end-to-end deep learning approaches -- state-of-the-art shape
estimation methods are composed of multiple stages. We perform the first
investigation of different strategies that can be used to effectively
manipulate the automatic shape estimation while preserving the overall
appearance of the original image.

BibTeX

@inproceedings{Sattar_ECCV20,
TITLE = {Body Shape Privacy in Images: {U}nderstanding Privacy and Preventing Automatic Shape Extraction},
AUTHOR = {Sattar, Hosnieh and Krombholz, Katharina and Pons-Moll, Gerard and Fritz, Mario},
LANGUAGE = {eng},
ISBN = {978-3-030-68237-8},
DOI = {10.1007/978-3-030-68238-5_31},
PUBLISHER = {Springer},
YEAR = {2020},
ABSTRACT = {Modern approaches to pose and body shape estimation have recently achieved strong performance even under challenging real-world conditions. Even from a single image of a clothed person, a realistic-looking body shape can be inferred that captures a user's weight group and body shape type well. This opens up a whole spectrum of applications -- in particular in fashion -- where virtual try-on and recommendation systems can make use of these new and automatized cues. However, a realistic depiction of the undressed body is regarded as highly private and therefore might not be consented to by most people. Hence, we ask if the automatic extraction of such information can be effectively evaded. While adversarial perturbations have been shown to be effective for manipulating the output of machine learning models -- in particular, end-to-end deep learning approaches -- state-of-the-art shape estimation methods are composed of multiple stages. We perform the first investigation of different strategies that can be used to effectively manipulate the automatic shape estimation while preserving the overall appearance of the original image.},
BOOKTITLE = {Computer Vision -- ECCV Workshops 2020},
EDITOR = {Bartoli, Adrien and Fusiello, Andrea},
PAGES = {411--428},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {12539},
ADDRESS = {Glasgow, UK},
}

Endnote

%0 Conference Proceedings
%A Sattar, Hosnieh
%A Krombholz, Katharina
%A Pons-Moll, Gerard
%A Fritz, Mario
%+ Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
External Organizations
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
External Organizations
%T Body Shape Privacy in Images: Understanding Privacy and Preventing Automatic Shape Extraction
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-D755-7
%R 10.1007/978-3-030-68238-5_31
%D 2021
%B 16th European Conference on Computer Vision
%Z date of event: 2020-08-23 - 2020-08-28
%C Glasgow, UK
%X   Modern approaches to pose and body shape estimation have recently achieved strong performance even under challenging real-world conditions. Even from a single image of a clothed person, a realistic-looking body shape can be inferred that captures a user's weight group and body shape type well. This opens up a whole spectrum of applications -- in particular in fashion -- where virtual try-on and recommendation systems can make use of these new and automatized cues. However, a realistic depiction of the undressed body is regarded as highly private and therefore might not be consented to by most people. Hence, we ask if the automatic extraction of such information can be effectively evaded. While adversarial perturbations have been shown to be effective for manipulating the output of machine learning models -- in particular, end-to-end deep learning approaches -- state-of-the-art shape estimation methods are composed of multiple stages. We perform the first investigation of different strategies that can be used to effectively manipulate the automatic shape estimation while preserving the overall appearance of the original image.
%K Computer Science, Computer Vision and Pattern Recognition, cs.CV,Computer Science, Artificial Intelligence, cs.AI,Computer Science, Cryptography and Security, cs.CR,Computer Science, Learning, cs.LG
%B Computer Vision -- ECCV Workshops 2020
%E Bartoli, Adrien; Fusiello, Andrea
%P 411 - 428
%I Springer
%@ 978-3-030-68237-8
%B Lecture Notes in Computer Science
%N 12539

Article

H. Sattar, M. Fritz, and A. Bulling

“Deep Gaze Pooling: Inferring and Visually Decoding Search Intents from Human Gaze Fixations,” Neurocomputing, vol. 387, 2020.

@article{Sattar2020,
TITLE = {Deep gaze pooling: {I}nferring and visually decoding search intents from human gaze fixations},
AUTHOR = {Sattar, Hosnieh and Fritz, Mario and Bulling, Andreas},
LANGUAGE = {eng},
ISSN = {0925-2312},
DOI = {10.1016/j.neucom.2020.01.028},
PUBLISHER = {Elsevier},
ADDRESS = {Amsterdam},
YEAR = {2020},
DATE = {2020},
JOURNAL = {Neurocomputing},
VOLUME = {387},
PAGES = {369--382},
}

Endnote

%0 Journal Article
%A Sattar, Hosnieh
%A Fritz, Mario
%A Bulling, Andreas
%+ Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Deep Gaze Pooling: Inferring and Visually Decoding Search Intents from Human Gaze Fixations
%G eng
%U http://hdl.handle.net/21.11116/0000-0006-977C-5
%R 10.1016/j.neucom.2020.01.028
%7 2020
%D 2020
%J Neurocomputing
%V 387
%& 369
%P 369 - 382
%I Elsevier
%C Amsterdam
%@ 0925-2312

2019

Conference paper

H. Sattar, G. Pons-Moll, and M. Fritz

“Fashion is Taking Shape: Understanding Clothing Preference Based on Body Shape From Online Sources,” in 2019 IEEE Winter Conference on Applications of Computer Vision (WACV 2019), Waikoloa Village, HI, USA, 2019.

@inproceedings{sattar19wacv,
TITLE = {Fashion is Taking Shape: {U}nderstanding Clothing Preference Based on Body Shape From Online Sources},
AUTHOR = {Sattar, Hosnieh and Pons-Moll, Gerard and Fritz, Mario},
LANGUAGE = {eng},
ISBN = {978-1-7281-1975-5},
DOI = {10.1109/WACV.2019.00108},
PUBLISHER = {IEEE},
YEAR = {2019},
BOOKTITLE = {2019 IEEE Winter Conference on Applications of Computer Vision (WACV 2019)},
PAGES = {968--977},
ADDRESS = {Waikoloa Village, HI, USA},
}

Endnote

%0 Conference Proceedings
%A Sattar, Hosnieh
%A Pons-Moll, Gerard
%A Fritz, Mario
%+ Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
External Organizations
%T Fashion is Taking Shape: Understanding Clothing Preference Based on Body Shape From Online Sources
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-B309-B
%R 10.1109/WACV.2019.00108
%D 2019
%B IEEE Winter Conference on Applications of Computer Vision
%Z date of event: 2019-01-08 - 2019-01-10
%C Waikoloa Village, HI, USA
%B 2019 IEEE Winter Conference on Applications of Computer Vision
%P 968 - 977
%I IEEE
%@ 978-1-7281-1975-5

Paper

H. Sattar, K. Krombholz, G. Pons-Moll, and M. Fritz

“Shape Evasion: Preventing Body Shape Inference of Multi-Stage Approaches,” 2019. [Online]. Available: http://arxiv.org/abs/1905.11503.

Abstract

BibTeX

@online{Sattar_arXiv1905.11503,
TITLE = {Shape Evasion: Preventing Body Shape Inference of Multi-Stage Approaches},
AUTHOR = {Sattar, Hosnieh and Krombholz, Katharina and Pons-Moll, Gerard and Fritz, Mario},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1905.11503},
EPRINT = {1905.11503},
EPRINTTYPE = {arXiv},
YEAR = {2019},
ABSTRACT = {Modern approaches to pose and body shape estimation have recently achieved strong performance even under challenging real-world conditions. Even from a single image of a clothed person, a realistic-looking body shape can be inferred that captures a user's weight group and body shape type well. This opens up a whole spectrum of applications -- in particular in fashion -- where virtual try-on and recommendation systems can make use of these new and automatized cues. However, a realistic depiction of the undressed body is regarded as highly private and therefore might not be consented to by most people. Hence, we ask if the automatic extraction of such information can be effectively evaded. While adversarial perturbations have been shown to be effective for manipulating the output of machine learning models -- in particular, end-to-end deep learning approaches -- state-of-the-art shape estimation methods are composed of multiple stages. We perform the first investigation of different strategies that can be used to effectively manipulate the automatic shape estimation while preserving the overall appearance of the original image.},
}

Endnote

%0 Report
%A Sattar, Hosnieh
%A Krombholz, Katharina
%A Pons-Moll, Gerard
%A Fritz, Mario
%+ Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
External Organizations
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
External Organizations
%T Shape Evasion: Preventing Body Shape Inference of Multi-Stage Approaches
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-B2E5-1
%U http://arxiv.org/abs/1905.11503
%D 2019
%X   Modern approaches to pose and body shape estimation have recently achieved strong performance even under challenging real-world conditions. Even from a single image of a clothed person, a realistic-looking body shape can be inferred that captures a user's weight group and body shape type well. This opens up a whole spectrum of applications -- in particular in fashion -- where virtual try-on and recommendation systems can make use of these new and automatized cues. However, a realistic depiction of the undressed body is regarded as highly private and therefore might not be consented to by most people. Hence, we ask if the automatic extraction of such information can be effectively evaded. While adversarial perturbations have been shown to be effective for manipulating the output of machine learning models -- in particular, end-to-end deep learning approaches -- state-of-the-art shape estimation methods are composed of multiple stages. We perform the first investigation of different strategies that can be used to effectively manipulate the automatic shape estimation while preserving the overall appearance of the original image.
%K Computer Science, Computer Vision and Pattern Recognition, cs.CV,Computer Science, Artificial Intelligence, cs.AI,Computer Science, Cryptography and Security, cs.CR,Computer Science, Learning, cs.LG

Thesis

H. Sattar

“Intents and Preferences Prediction Based on Implicit Human Cues,” Universität des Saarlandes, Saarbrücken, 2019.

Abstract

Visual search is an important task, and it is part of daily human life. Thus, it has been a long-standing goal in Computer Vision to develop methods aiming at analysing human search intent and preferences. As the target of the search only exists in the mind of the person, search intent prediction remains challenging for machine perception. In this thesis, we focus on advancing techniques for search target and preference prediction from implicit human cues. First, we propose a search target inference algorithm from human fixation data recorded during visual search. In contrast to previous work that has focused on individual instances as a search target in a closed world, we propose the first approach to predict the search target in open-world settings by learning the compatibility between observed fixations and potential search targets. Second, we further broaden the scope of search target prediction to categorical classes, such as object categories and attributes. However, state-of-the-art models for categorical recognition, in general, require large amounts of training data, which is prohibitive for gaze data. To address this challenge, we propose a novel Gaze Pooling Layer that integrates gaze information into CNN-based architectures as an attention mechanism – incorporating both spatial and temporal aspects of human gaze behaviour. Third, we go one step further and investigate the feasibility of combining our gaze embedding approach with the power of generative image models to visually decode, i.e. create a visual representation of, the search target. Fourth, for the first time, we studied the effect of body shape on people's preferences of outfits. We propose a novel and robust multi-photo approach to estimate the body shapes of each user and build a conditional model of clothing categories given body-shape. We demonstrate that in real-world data, clothing categories and body-shapes are correlated.
We show that our approach estimates a realistic-looking body shape that captures a user’s weight group and body shape type, even from a single image of a clothed person. However, an accurate depiction of the naked body is considered highly private and therefore might not be consented to by most people. First, we studied the perception of such technology via a user study. Then, in the last part of this thesis, we ask if the automatic extraction of such information can be effectively evaded. In summary, this thesis addresses several different tasks that aim to enable the vision system to analyse human search intent and preferences in real-world scenarios. In particular, the thesis proposes several novel ideas and models in visual search target prediction from human fixation data, studies for the first time the correlation between shape and clothing categories, opening a new direction in clothing recommendation systems, and introduces a new topic in privacy and computer vision, aimed at preventing automatic 3D shape extraction from images.

BibTeX

@phdthesis{Sattar_PhD2019,
TITLE = {Intents and Preferences Prediction Based on Implicit Human Cues},
AUTHOR = {Sattar, Hosnieh},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291--ds-281920},
DOI = {10.22028/D291-28192},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2019},
DATE = {2019},
ABSTRACT = {Visual search is an important task, and it is part of daily human life. Thus, it has been a long-standing goal in Computer Vision to develop methods aiming at analysing human search intent and preferences. As the target of the search only exists in the mind of the person, search intent prediction remains challenging for machine perception. In this thesis, we focus on advancing techniques for search target and preference prediction from implicit human cues. First, we propose a search target inference algorithm from human fixation data recorded during visual search. In contrast to previous work that has focused on individual instances as a search target in a closed world, we propose the first approach to predict the search target in open-world settings by learning the compatibility between observed fixations and potential search targets. Second, we further broaden the scope of search target prediction to categorical classes, such as object categories and attributes. However, state-of-the-art models for categorical recognition, in general, require large amounts of training data, which is prohibitive for gaze data. To address this challenge, we propose a novel Gaze Pooling Layer that integrates gaze information into CNN-based architectures as an attention mechanism -- incorporating both spatial and temporal aspects of human gaze behaviour. Third, we go one step further and investigate the feasibility of combining our gaze embedding approach with the power of generative image models to visually decode, i.e. create a visual representation of, the search target. Fourth, for the first time, we studied the effect of body shape on people's preferences of outfits. We propose a novel and robust multi-photo approach to estimate the body shapes of each user and build a conditional model of clothing categories given body-shape. We demonstrate that in real-world data, clothing categories and body-shapes are correlated.
We show that our approach estimates a realistic-looking body shape that captures a user{\textquoteright}s weight group and body shape type, even from a single image of a clothed person. However, an accurate depiction of the naked body is considered highly private and therefore might not be consented to by most people. First, we studied the perception of such technology via a user study. Then, in the last part of this thesis, we ask if the automatic extraction of such information can be effectively evaded. In summary, this thesis addresses several different tasks that aim to enable the vision system to analyse human search intent and preferences in real-world scenarios. In particular, the thesis proposes several novel ideas and models in visual search target prediction from human fixation data, studies for the first time the correlation between shape and clothing categories, opening a new direction in clothing recommendation systems, and introduces a new topic in privacy and computer vision, aimed at preventing automatic 3D shape extraction from images.},
}

Endnote

%0 Thesis
%A Sattar, Hosnieh
%Y Fritz, Mario
%A referee: Schiele, Bernt
%A referee: Sugano, Yusuke
%+ Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
%T Intents and Preferences Prediction Based on Implicit Human Cues
%G eng
%U http://hdl.handle.net/21.11116/0000-0004-8E7F-F
%R 10.22028/D291-28192
%U urn:nbn:de:bsz:291--ds-281920
%F OTHER: hdl:20.500.11880/27625
%I Universität des Saarlandes
%C Saarbrücken
%D 2019
%P X, 136 p.
%V phd
%9 phd
%X Visual search is an important task, and it is part of daily human life. Thus, it has been a long-standing goal in Computer Vision to develop methods aiming at analysing human search intent and preferences. As the target of the search only exists in the mind of the person, search intent prediction remains challenging for machine perception. In this thesis, we focus on advancing techniques for search target and preference prediction from implicit human cues. First, we propose a search target inference algorithm from human fixation data recorded during visual search. In contrast to previous work that has focused on individual instances as a search target in a closed world, we propose the first approach to predict the search target in open-world settings by learning the compatibility between observed fixations and potential search targets. Second, we further broaden the scope of search target prediction to categorical classes, such as object categories and attributes. However, state-of-the-art models for categorical recognition, in general, require large amounts of training data, which is prohibitive for gaze data. To address this challenge, we propose a novel Gaze Pooling Layer that integrates gaze information into CNN-based architectures as an attention mechanism – incorporating both spatial and temporal aspects of human gaze behaviour. Third, we go one step further and investigate the feasibility of combining our gaze embedding approach with the power of generative image models to visually decode, i.e. create a visual representation of, the search target. Fourth, for the first time, we studied the effect of body shape on people's preferences of outfits. We propose a novel and robust multi-photo approach to estimate the body shapes of each user and build a conditional model of clothing categories given body-shape. We demonstrate that in real-world data, clothing categories and body-shapes are correlated.
We show that our approach estimates a realistic-looking body shape that captures a user’s weight group and body shape type, even from a single image of a clothed person. However, an accurate depiction of the naked body is considered highly private and therefore might not be consented to by most people. First, we studied the perception of such technology via a user study. Then, in the last part of this thesis, we ask if the automatic extraction of such information can be effectively evaded. In summary, this thesis addresses several different tasks that aim to enable the vision system to analyse human search intent and preferences in real-world scenarios. In particular, the thesis proposes several novel ideas and models in visual search target prediction from human fixation data, studies for the first time the correlation between shape and clothing categories, opening a new direction in clothing recommendation systems, and introduces a new topic in privacy and computer vision, aimed at preventing automatic 3D shape extraction from images.
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/27625

2017

Conference paper

H. Sattar, A. Bulling, and M. Fritz

“Predicting the Category and Attributes of Visual Search Targets Using Deep Gaze Pooling,” in 2017 IEEE International Conference on Computer Vision Workshops (MBCC @ICCV 2017), Venice, Italy, 2017.

Abstract

Previous work focused on predicting visual search targets from human
fixations but, in the real world, a specific target is often not known, e.g.
when searching for a present for a friend. In this work we instead study the
problem of predicting the mental picture, i.e. only an abstract idea instead of
a specific target. This task is significantly more challenging given that
mental pictures of the same target category can vary widely depending on
personal biases, and given that characteristic target attributes can often not
be verbalised explicitly. We instead propose to use gaze information as
implicit information on users' mental picture and present a novel gaze pooling
layer to seamlessly integrate semantic and localized fixation information into
a deep image representation. We show that we can robustly predict both the
mental picture's category as well as attributes on a novel dataset containing
fixation data of 14 users searching for targets on a subset of the DeepFashion
dataset. Our results have important implications for future search interfaces
and suggest deep gaze pooling as a general-purpose approach for gaze-supported
computer vision systems.

BibTeX

@inproceedings{sattar17iccvw,
TITLE = {Predicting the Category and Attributes of Visual Search Targets Using Deep Gaze Pooling},
AUTHOR = {Sattar, Hosnieh and Bulling, Andreas and Fritz, Mario},
LANGUAGE = {eng},
ISBN = {978-1-5386-1034-3},
DOI = {10.1109/ICCVW.2017.322},
PUBLISHER = {IEEE},
YEAR = {2017},
DATE = {2017},
ABSTRACT = {Previous work focused on predicting visual search targets from human fixations but, in the real world, a specific target is often not known, e.g. when searching for a present for a friend. In this work we instead study the problem of predicting the mental picture, i.e. only an abstract idea instead of a specific target. This task is significantly more challenging given that mental pictures of the same target category can vary widely depending on personal biases, and given that characteristic target attributes can often not be verbalised explicitly. We instead propose to use gaze information as implicit information on users' mental picture and present a novel gaze pooling layer to seamlessly integrate semantic and localized fixation information into a deep image representation. We show that we can robustly predict both the mental picture's category as well as attributes on a novel dataset containing fixation data of 14 users searching for targets on a subset of the DeepFashion dataset. Our results have important implications for future search interfaces and suggest deep gaze pooling as a general-purpose approach for gaze-supported computer vision systems.},
BOOKTITLE = {2017 IEEE International Conference on Computer Vision Workshops (MBCC @ICCV 2017)},
PAGES = {2740--2748},
ADDRESS = {Venice, Italy},
}

Endnote

%0 Conference Proceedings
%A Sattar, Hosnieh
%A Bulling, Andreas
%A Fritz, Mario
%+ Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
%T Predicting the Category and Attributes of Visual Search Targets Using Deep Gaze Pooling
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-1094-8
%R 10.1109/ICCVW.2017.322
%D 2017
%B Mutual Benefits of Cognitive and Computer Vision Workshop at International Conference on Computer Vision
%Z date of event: 2017-10-29 - 2017-10-29
%C Venice, Italy
%X   Previous work focused on predicting visual search targets from human
fixations but, in the real world, a specific target is often not known, e.g.
when searching for a present for a friend. In this work we instead study the
problem of predicting the mental picture, i.e. only an abstract idea instead of
a specific target. This task is significantly more challenging given that
mental pictures of the same target category can vary widely depending on
personal biases, and given that characteristic target attributes can often not
be verbalised explicitly. We instead propose to use gaze information as
implicit information on users' mental picture and present a novel gaze pooling
layer to seamlessly integrate semantic and localized fixation information into
a deep image representation. We show that we can robustly predict both the
mental picture's category as well as attributes on a novel dataset containing
fixation data of 14 users searching for targets on a subset of the DeepFashion
dataset. Our results have important implications for future search interfaces
and suggest deep gaze pooling as a general-purpose approach for gaze-supported
computer vision systems.

%K Quantitative Biology, Neurons and Cognition, q-bio.NC,Computer Science, Computer Vision and Pattern Recognition, cs.CV
%B 2017 IEEE International Conference on Computer Vision Workshops
%P 2740 - 2748
%I IEEE
%@ 978-1-5386-1034-3

Paper

H. Sattar, M. Fritz, and A. Bulling

“Visual Decoding of Targets During Visual Search From Human Eye Fixations,” 2017. [Online]. Available: http://arxiv.org/abs/1706.05993.

Abstract

What does human gaze reveal about a user's intents and to what extent can
these intents be inferred or even visualized? Gaze was proposed as an implicit
source of information to predict the target of visual search and, more
recently, to predict the object class and attributes of the search target. In
this work, we go one step further and investigate the feasibility of combining
recent advances in encoding human gaze information using deep convolutional
neural networks with the power of generative image models to visually decode,
i.e. create a visual representation of, the search target. Such visual decoding
is challenging for two reasons: 1) the search target only resides in the user's
mind as a subjective visual pattern, and can most often not even be described
verbally by the person, and 2) it is, as of yet, unclear if gaze fixations
contain sufficient information for this task at all. We show, for the first
time, that visual representations of search targets can indeed be decoded only
from human gaze fixations. We propose to first encode fixations into a semantic
representation and then decode this representation into an image. We evaluate
our method on a recent gaze dataset of 14 participants searching for clothing
in image collages and validate the model's predictions using two human studies.
Our results show that 62% (chance level = 10%) of the time users were able to
select the categories of the decoded image correctly. In our second study we show
the importance of a local gaze encoding for decoding visual search targets of
user

BibTeX

@online{DBLP:journals/corr/SattarFB17,
TITLE = {Visual Decoding of Targets During Visual Search From Human Eye Fixations},
AUTHOR = {Sattar, Hosnieh and Fritz, Mario and Bulling, Andreas},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1706.05993},
EPRINT = {1706.05993},
EPRINTTYPE = {arXiv},
YEAR = {2017},
ABSTRACT = {What does human gaze reveal about a user's intents and to what extent can these intents be inferred or even visualized? Gaze was proposed as an implicit source of information to predict the target of visual search and, more recently, to predict the object class and attributes of the search target. In this work, we go one step further and investigate the feasibility of combining recent advances in encoding human gaze information using deep convolutional neural networks with the power of generative image models to visually decode, i.e. create a visual representation of, the search target. Such visual decoding is challenging for two reasons: 1) the search target only resides in the user's mind as a subjective visual pattern, and can most often not even be described verbally by the person, and 2) it is, as of yet, unclear if gaze fixations contain sufficient information for this task at all. We show, for the first time, that visual representations of search targets can indeed be decoded only from human gaze fixations. We propose to first encode fixations into a semantic representation and then decode this representation into an image. We evaluate our method on a recent gaze dataset of 14 participants searching for clothing in image collages and validate the model's predictions using two human studies. Our results show that 62% (chance level = 10%) of the time users were able to select the categories of the decoded image correctly. In our second study we show the importance of a local gaze encoding for decoding visual search targets of user},
}

Endnote

%0 Report
%A Sattar, Hosnieh
%A Fritz, Mario
%A Bulling, Andreas
%+ Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
%T Visual Decoding of Targets During Visual Search From Human Eye Fixations
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-8B50-3
%U http://arxiv.org/abs/1706.05993
%D 2017
%X   What does human gaze reveal about a user's intents and to what extent can
these intents be inferred or even visualized? Gaze was proposed as an implicit
source of information to predict the target of visual search and, more
recently, to predict the object class and attributes of the search target. In
this work, we go one step further and investigate the feasibility of combining
recent advances in encoding human gaze information using deep convolutional
neural networks with the power of generative image models to visually decode,
i.e. create a visual representation of, the search target. Such visual decoding
is challenging for two reasons: 1) the search target only resides in the user's
mind as a subjective visual pattern, and can most often not even be described
verbally by the person, and 2) it is, as of yet, unclear if gaze fixations
contain sufficient information for this task at all. We show, for the first
time, that visual representations of search targets can indeed be decoded only
from human gaze fixations. We propose to first encode fixations into a semantic
representation and then decode this representation into an image. We evaluate
our method on a recent gaze dataset of 14 participants searching for clothing
in image collages and validate the model's predictions using two human studies.
Our results show that 62% (chance level = 10%) of the time users were able to
select the categories of the decoded image correctly. In our second study we show
the importance of a local gaze encoding for decoding visual search targets of
user

%K Computer Science, Computer Vision and Pattern Recognition, cs.CV,Computer Science, Human-Computer Interaction, cs.HC

2015

Conference paper

H. Sattar, S. Müller, M. Fritz, and A. Bulling

“Prediction of Search Targets from Fixations in Open-world Settings,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), Boston, MA, USA, 2015.

@inproceedings{Sattar15cvpr,
TITLE = {Prediction of Search Targets from Fixations in Open-world Settings},
AUTHOR = {Sattar, Hosnieh and M{\"u}ller, Sabine and Fritz, Mario and Bulling, Andreas},
LANGUAGE = {eng},
DOI = {10.1109/CVPR.2015.7298700},
PUBLISHER = {IEEE Computer Society},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015)},
PAGES = {981--990},
ADDRESS = {Boston, MA, USA},
}

Endnote

%0 Conference Proceedings
%A Sattar, Hosnieh
%A Müller, Sabine
%A Fritz, Mario
%A Bulling, Andreas
%+ Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
%T Prediction of Search Targets from Fixations in Open-world Settings
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0025-01C3-4
%R 10.1109/CVPR.2015.7298700
%D 2015
%B IEEE Conference on Computer Vision and Pattern Recognition
%Z date of event: 2015-06-08 - 2015-06-10
%C Boston, MA, USA
%B IEEE Conference on Computer Vision and Pattern Recognition
%P 981 - 990
%I IEEE Computer Society