Wenbin Li (PhD Student)

Personal Information

Research Interests

Robotics
Activity Modeling
Material Recognition
Machine Learning

Education

2013-present: PhD student at Max Planck Institute for Informatics and Saarland University, Germany
2010-present: Graduate student at Graduate School for Computer Science, Saarland University, Germany
2010-2013: M.Sc. in Computer Science, Saarland University, Germany
2006-2010: B.Sc. in Science and Technology of Intelligence, Beijing University of Posts and Telecommunications, China

For more information, please visit my personal homepage.

Publications

2019

Paper

W. Li, A. Leonardis, J. Bohg, and M. Fritz

“Learning Manipulation under Physics Constraints with Visual Perception,” 2019. [Online]. Available: http://arxiv.org/abs/1904.09860.

Abstract

Understanding physical phenomena is a key competence that enables humans and
animals to act and interact under uncertain perception in previously unseen
environments containing novel objects and their configurations. In this work,
we consider the problem of autonomous block stacking and explore solutions to
learning manipulation under physics constraints with visual perception inherent
to the task. Inspired by the intuitive physics in humans, we first present an
end-to-end learning-based approach to predict stability directly from
appearance, contrasting a more traditional model-based approach with explicit
3D representations and physical simulation. We study the model's behavior
together with an accompanied human subject test. It is then integrated into a
real-world robotic system to guide the placement of a single wood block into
the scene without collapsing existing tower structure. To further automate the
process of consecutive blocks stacking, we present an alternative approach
where the model learns the physics constraint through the interaction with the
environment, bypassing the dedicated physics learning as in the former part of
this work. In particular, we are interested in the type of tasks that require
the agent to reach a given goal state that may be different for every new
trial. Thereby we propose a deep reinforcement learning framework that learns
policies for stacking tasks which are parametrized by a target structure.

BibTeX

@online{Li_arXiv1904.09860,
TITLE = {Learning Manipulation under Physics Constraints with Visual Perception},
AUTHOR = {Li, Wenbin and Leonardis, Ale{\v s} and Bohg, Jeannette and Fritz, Mario},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1904.09860},
EPRINT = {1904.09860},
EPRINTTYPE = {arXiv},
YEAR = {2019},
ABSTRACT = {Understanding physical phenomena is a key competence that enables humans and animals to act and interact under uncertain perception in previously unseen environments containing novel objects and their configurations. In this work, we consider the problem of autonomous block stacking and explore solutions to learning manipulation under physics constraints with visual perception inherent to the task. Inspired by the intuitive physics in humans, we first present an end-to-end learning-based approach to predict stability directly from appearance, contrasting a more traditional model-based approach with explicit 3D representations and physical simulation. We study the model's behavior together with an accompanied human subject test. It is then integrated into a real-world robotic system to guide the placement of a single wood block into the scene without collapsing existing tower structure. To further automate the process of consecutive blocks stacking, we present an alternative approach where the model learns the physics constraint through the interaction with the environment, bypassing the dedicated physics learning as in the former part of this work. In particular, we are interested in the type of tasks that require the agent to reach a given goal state that may be different for every new trial. Thereby we propose a deep reinforcement learning framework that learns policies for stacking tasks which are parametrized by a target structure.},
}

Endnote

%0 Report
%A Li, Wenbin
%A Leonardis, Aleš
%A Bohg, Jeannette
%A Fritz, Mario
%+ Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
%T Learning Manipulation under Physics Constraints with Visual Perception
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-EC5C-D
%U http://arxiv.org/abs/1904.09860
%D 2019
%X Understanding physical phenomena is a key competence that enables humans and animals to act and interact under uncertain perception in previously unseen environments containing novel objects and their configurations. In this work, we consider the problem of autonomous block stacking and explore solutions to learning manipulation under physics constraints with visual perception inherent to the task. Inspired by the intuitive physics in humans, we first present an end-to-end learning-based approach to predict stability directly from appearance, contrasting a more traditional model-based approach with explicit 3D representations and physical simulation. We study the model's behavior together with an accompanied human subject test. It is then integrated into a real-world robotic system to guide the placement of a single wood block into the scene without collapsing existing tower structure. To further automate the process of consecutive blocks stacking, we present an alternative approach where the model learns the physics constraint through the interaction with the environment, bypassing the dedicated physics learning as in the former part of this work. In particular, we are interested in the type of tasks that require the agent to reach a given goal state that may be different for every new trial. Thereby we propose a deep reinforcement learning framework that learns policies for stacking tasks which are parametrized by a target structure.
%K Computer Science, Robotics, cs.RO

2018

Conference paper

M. Wagner, H. Basevi, R. Shetty, W. Li, M. Malinowski, M. Fritz, and A. Leonardis

“Answering Visual What-If Questions: From Actions to Predicted Scene Descriptions,” in Computer Vision - ECCV 2018 Workshops, Munich, Germany, 2019.

@inproceedings{wagner18eccvw,
TITLE = {Answering Visual What-If Questions: {F}rom Actions to Predicted Scene Descriptions},
AUTHOR = {Wagner, Misha and Basevi, Hector and Shetty, Rakshith and Li, Wenbin and Malinowski, Mateusz and Fritz, Mario and Leonardis, Ale{\v s}},
LANGUAGE = {eng},
ISBN = {978-3-030-11008-6},
DOI = {10.1007/978-3-030-11009-3_32},
PUBLISHER = {Springer},
YEAR = {2018},
DATE = {2019},
BOOKTITLE = {Computer Vision -- ECCV 2018 Workshops},
EDITOR = {Leal-Taix{\'e}, Laura and Roth, Stefan},
PAGES = {521--537},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {11129},
ADDRESS = {Munich, Germany},
}

Endnote

%0 Conference Proceedings
%A Wagner, Misha
%A Basevi, Hector
%A Shetty, Rakshith
%A Li, Wenbin
%A Malinowski, Mateusz
%A Fritz, Mario
%A Leonardis, Aleš
%+ External Organizations
External Organizations
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
External Organizations
%T Answering Visual What-If Questions: From Actions to Predicted Scene Descriptions
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-B962-F
%R 10.1007/978-3-030-11009-3_32
%D 2019
%B Workshop on Visual Learning and Embodied Agents in Simulation Environments
%Z date of event: 2018-09-09 - 2018-09-09
%C Munich, Germany
%B Computer Vision - ECCV 2018 Workshops
%E Leal-Taixé, Laura; Roth, Stefan
%P 521 - 537
%I Springer
%@ 978-3-030-11008-6
%B Lecture Notes in Computer Science
%N 11129

Thesis

D2IMPR-CS

W. Li

“From Perception over Anticipation to Manipulation,” Universität des Saarlandes, Saarbrücken, 2018.

Abstract

From autonomous driving cars to surgical robots, robotic system has enjoyed significant growth over the past decade. With the rapid development in robotics alongside the evolution in the related fields, such as computer vision and machine learning, integrating perception, anticipation and manipulation is key to the success of future robotic system. In this thesis, we explore different ways of such integration to extend the capabilities of a robotic system to take on more challenging real world tasks. On anticipation and perception, we address the recognition of ongoing activity from videos. In particular we focus on long-duration and complex activities and hence propose a new challenging dataset to facilitate the work. We introduce hierarchical labels over the activity classes and investigate the temporal accuracy-specificity trade-offs. We propose a new method based on recurrent neural networks that learns to predict over this hierarchy and realize accuracy specificity trade-offs. Our method outperforms several baselines on this new challenge. On manipulation with perception, we propose an efficient framework for programming a robot to use human tools. We first present a novel and compact model for using tools described by a tip model. Then we explore a strategy of utilizing a dual-gripper approach for manipulating tools – motivated by the absence of dexterous hands on widely available general purpose robots. Afterwards, we embed the tool use learning into a hierarchical architecture and evaluate it on a Baxter research robot. Finally, combining perception, anticipation and manipulation, we focus on a block stacking task. First we explore how to guide robot to place a single block into the scene without collapsing the existing structure. We introduce a mechanism to predict physical stability directly from visual input and evaluate it first on a synthetic data and then on real-world block stacking. 
Further, we introduce the target stacking task where the agent stacks blocks to reproduce a tower shown in an image. To do so, we create a synthetic block stacking environment with physics simulation in which the agent can learn block stacking end-to-end through trial and error, bypassing to explicitly model the corresponding physics knowledge. We propose a goal-parametrized GDQN model to plan with respect to the specific goal. We validate the model on both a navigation task in a classic gridworld environment and the block stacking task.

BibTeX

@phdthesis{Wenbinphd2018,
TITLE = {From Perception over Anticipation to Manipulation},
AUTHOR = {Li, Wenbin},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-ds-271561},
DOI = {10.22028/D291-27156},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2018},
DATE = {2018},
ABSTRACT = {From autonomous driving cars to surgical robots, robotic system has enjoyed significant growth over the past decade. With the rapid development in robotics alongside the evolution in the related fields, such as computer vision and machine learning, integrating perception, anticipation and manipulation is key to the success of future robotic system. In this thesis, we explore different ways of such integration to extend the capabilities of a robotic system to take on more challenging real world tasks. On anticipation and perception, we address the recognition of ongoing activity from videos. In particular we focus on long-duration and complex activities and hence propose a new challenging dataset to facilitate the work. We introduce hierarchical labels over the activity classes and investigate the temporal accuracy-specificity trade-offs. We propose a new method based on recurrent neural networks that learns to predict over this hierarchy and realize accuracy specificity trade-offs. Our method outperforms several baselines on this new challenge. On manipulation with perception, we propose an efficient framework for programming a robot to use human tools. We first present a novel and compact model for using tools described by a tip model. Then we explore a strategy of utilizing a dual-gripper approach for manipulating tools -- motivated by the absence of dexterous hands on widely available general purpose robots. Afterwards, we embed the tool use learning into a hierarchical architecture and evaluate it on a Baxter research robot. Finally, combining perception, anticipation and manipulation, we focus on a block stacking task. First we explore how to guide robot to place a single block into the scene without collapsing the existing structure. We introduce a mechanism to predict physical stability directly from visual input and evaluate it first on a synthetic data and then on real-world block stacking. 
Further, we introduce the target stacking task where the agent stacks blocks to reproduce a tower shown in an image. To do so, we create a synthetic block stacking environment with physics simulation in which the agent can learn block stacking end-to-end through trial and error, bypassing to explicitly model the corresponding physics knowledge. We propose a goal-parametrized GDQN model to plan with respect to the specific goal. We validate the model on both a navigation task in a classic gridworld environment and the block stacking task.},
}

Endnote

%0 Thesis
%A Li, Wenbin
%Y Fritz, Mario
%A referee: Leonardis, Aleš
%A referee: Slusallek, Philipp
%+ Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T From Perception over Anticipation to Manipulation
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-4193-F
%R 10.22028/D291-27156
%U urn:nbn:de:bsz:291-scidok-ds-271561
%F OTHER: hdl:20.500.11880/27026
%I Universität des Saarlandes
%C Saarbrücken
%D 2018
%P 165 p.
%V phd
%9 phd
%X From autonomous driving cars to surgical robots, robotic system has enjoyed significant growth over the past decade. With the rapid development in robotics alongside the evolution in the related fields, such as computer vision and machine learning, integrating perception, anticipation and manipulation is key to the success of future robotic system. In this thesis, we explore different ways of such integration to extend the capabilities of a robotic system to take on more challenging real world tasks. On anticipation and perception, we address the recognition of ongoing activity from videos. In particular we focus on long-duration and complex activities and hence propose a new challenging dataset to facilitate the work. We introduce hierarchical labels over the activity classes and investigate the temporal accuracy-specificity trade-offs. We propose a new method based on recurrent neural networks that learns to predict over this hierarchy and realize accuracy specificity trade-offs. Our method outperforms several baselines on this new challenge. On manipulation with perception, we propose an efficient framework for programming a robot to use human tools. We first present a novel and compact model for using tools described by a tip model. Then we explore a strategy of utilizing a dual-gripper approach for manipulating tools – motivated by the absence of dexterous hands on widely available general purpose robots. Afterwards, we embed the tool use learning into a hierarchical architecture and evaluate it on a Baxter research robot. Finally, combining perception, anticipation and manipulation, we focus on a block stacking task. First we explore how to guide robot to place a single block into the scene without collapsing the existing structure. We introduce a mechanism to predict physical stability directly from visual input and evaluate it first on a synthetic data and then on real-world block stacking. 
Further, we introduce the target stacking task where the agent stacks blocks to reproduce a tower shown in an image. To do so, we create a synthetic block stacking environment with physics simulation in which the agent can learn block stacking end-to-end through trial and error, bypassing to explicitly model the corresponding physics knowledge. We propose a goal-parametrized GDQN model to plan with respect to the specific goal. We validate the model on both a navigation task in a classic gridworld environment and the block stacking task.
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/27026

2017

Conference paper

W. Li, A. Leonardis, and M. Fritz

“Visual Stability Prediction and Its Application to Manipulation,” in AAAI 2017 Spring Symposia 05, Interactive Multisensory Object Perception for Embodied Agents, Palo Alto, CA, 2017.

@inproceedings{li16aaai,
TITLE = {Visual Stability Prediction and Its Application to Manipulation},
AUTHOR = {Li, Wenbin and Leonardis, Ale{\v s} and Fritz, Mario},
LANGUAGE = {eng},
ISBN = {978-1-57735-754-4},
PUBLISHER = {AAAI Press},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {AAAI 2017 Spring Symposia 05, Interactive Multisensory Object Perception for Embodied Agents},
SERIES = {Technical Report},
VOLUME = {SS-17},
ADDRESS = {Palo Alto, CA},
}

Endnote

%0 Conference Proceedings
%A Li, Wenbin
%A Leonardis, Aleš
%A Fritz, Mario
%+ Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
External Organizations
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
%T Visual Stability Prediction and Its Application to Manipulation
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-972C-B
%D 2017
%B AAAI Spring Symposia 05, Interactive Multisensory Object Perception for Embodied Agents
%Z date of event: 2017-03-27 - 2017-03-29
%C Palo Alto, CA
%B AAAI 2017 Spring Symposia 05, Interactive Multisensory Object Perception for Embodied Agents
%I AAAI Press
%@ 978-1-57735-754-4
%B Technical Report
%N SS-17
%U http://www.cs.utexas.edu/~jsinapov/AAAI-SSS-2017/paper/Li_AAAI_SSS_2017.pdf

Conference paper

W. Li, A. Leonardis, and M. Fritz

“Visual Stability Prediction for Robotic Manipulation,” in IEEE International Conference on Robotics and Automation (ICRA 2017), Singapore, 2017.

@inproceedings{li17icra,
TITLE = {Visual Stability Prediction for Robotic Manipulation},
AUTHOR = {Li, Wenbin and Leonardis, Ale{\v s} and Fritz, Mario},
LANGUAGE = {eng},
ISBN = {978-1-5090-4634-8},
DOI = {10.1109/ICRA.2017.7989304},
PUBLISHER = {IEEE},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {IEEE International Conference on Robotics and Automation (ICRA 2017)},
PAGES = {2606--2613},
ADDRESS = {Singapore},
}

Endnote

%0 Conference Proceedings
%A Li, Wenbin
%A Leonardis, Aleš
%A Fritz, Mario
%+ Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
External Organizations
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
%T Visual Stability Prediction for Robotic Manipulation
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-4A3A-1
%R 10.1109/ICRA.2017.7989304
%D 2017
%B IEEE International Conference on Robotics and Automation
%Z date of event: 2017-05-29 - 2017-06-03
%C Singapore
%B IEEE International Conference on Robotics and Automation
%P 2606 - 2613
%I IEEE
%@ 978-1-5090-4634-8

Paper

W. Li, J. Bohg, and M. Fritz

“Acquiring Target Stacking Skills by Goal-Parameterized Deep Reinforcement Learning,” 2017. [Online]. Available: http://arxiv.org/abs/1711.00267.

Abstract

Understanding physical phenomena is a key component of human intelligence and
enables physical interaction with previously unseen environments. In this
paper, we study how an artificial agent can autonomously acquire this intuition
through interaction with the environment. We created a synthetic block stacking
environment with physics simulation in which the agent can learn a policy
end-to-end through trial and error. Thereby, we bypass to explicitly model
physical knowledge within the policy. We are specifically interested in tasks
that require the agent to reach a given goal state that may be different for
every new trial. To this end, we propose a deep reinforcement learning
framework that learns policies which are parametrized by a goal. We validated
the model on a toy example navigating in a grid world with different target
positions and in a block stacking task with different target structures of the
final tower. In contrast to prior work, our policies show better generalization
across different goals.

BibTeX

@online{Li1711.00267,
TITLE = {Acquiring Target Stacking Skills by Goal-Parameterized Deep Reinforcement Learning},
AUTHOR = {Li, Wenbin and Bohg, Jeannette and Fritz, Mario},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1711.00267},
EPRINT = {1711.00267},
EPRINTTYPE = {arXiv},
YEAR = {2017},
ABSTRACT = {Understanding physical phenomena is a key component of human intelligence and enables physical interaction with previously unseen environments. In this paper, we study how an artificial agent can autonomously acquire this intuition through interaction with the environment. We created a synthetic block stacking environment with physics simulation in which the agent can learn a policy end-to-end through trial and error. Thereby, we bypass to explicitly model physical knowledge within the policy. We are specifically interested in tasks that require the agent to reach a given goal state that may be different for every new trial. To this end, we propose a deep reinforcement learning framework that learns policies which are parametrized by a goal. We validated the model on a toy example navigating in a grid world with different target positions and in a block stacking task with different target structures of the final tower. In contrast to prior work, our policies show better generalization across different goals.},
}

Endnote

%0 Report
%A Li, Wenbin
%A Bohg, Jeannette
%A Fritz, Mario
%+ Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
External Organizations
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
%T Acquiring Target Stacking Skills by Goal-Parameterized Deep Reinforcement Learning
%G eng
%U http://hdl.handle.net/21.11116/0000-0000-4345-7
%U http://arxiv.org/abs/1711.00267
%D 2017
%X   Understanding physical phenomena is a key component of human intelligence and
enables physical interaction with previously unseen environments. In this
paper, we study how an artificial agent can autonomously acquire this intuition
through interaction with the environment. We created a synthetic block stacking
environment with physics simulation in which the agent can learn a policy
end-to-end through trial and error. Thereby, we bypass to explicitly model
physical knowledge within the policy. We are specifically interested in tasks
that require the agent to reach a given goal state that may be different for
every new trial. To this end, we propose a deep reinforcement learning
framework that learns policies which are parametrized by a goal. We validated
the model on a toy example navigating in a grid world with different target
positions and in a block stacking task with different target structures of the
final tower. In contrast to prior work, our policies show better generalization
across different goals.

%K Computer Science, Robotics, cs.RO, Computer Science, Artificial Intelligence, cs.AI, Computer Science, Computer Vision and Pattern Recognition, cs.CV, Computer Science, Learning, cs.LG

2016

Conference paper

W. Li and M. Fritz

“Recognition of Ongoing Complex Activities by Sequence Prediction Over a Hierarchical Label Space,” in 2016 IEEE Winter Conference on Applications of Computer Vision (WACV 2016), Lake Placid, NY, USA, 2016.

@inproceedings{li16wacv,
TITLE = {Recognition of Ongoing Complex Activities by Sequence Prediction Over a Hierarchical Label Space},
AUTHOR = {Li, Wenbin and Fritz, Mario},
LANGUAGE = {eng},
ISBN = {978-1-5090-0641-0},
DOI = {10.1109/WACV.2016.7477586},
PUBLISHER = {IEEE},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {2016 IEEE Winter Conference on Applications of Computer Vision (WACV 2016)},
ADDRESS = {Lake Placid, NY, USA},
}

Endnote

%0 Conference Proceedings
%A Li, Wenbin
%A Fritz, Mario
%+ Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
%T Recognition of Ongoing Complex Activities by Sequence Prediction Over a Hierarchical Label Space
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-D2F7-D
%R 10.1109/WACV.2016.7477586
%D 2016
%B 2016 IEEE Winter Conference on Applications of Computer Vision
%Z date of event: 2016-03-07 - 2016-03-09
%C Lake Placid, NY, USA
%B 2016 IEEE Winter Conference on Applications of Computer Vision
%I IEEE
%@ 978-1-5090-0641-0

Paper

W. Li, S. Azimi, A. Leonardis, and M. Fritz

“To Fall Or Not To Fall: A Visual Approach to Physical Stability Prediction,” 2016. [Online]. Available: http://arxiv.org/abs/1604.00066.

Abstract

Understanding physical phenomena is a key competence that enables humans and
animals to act and interact under uncertain perception in previously unseen
environments containing novel object and their configurations. Developmental
psychology has shown that such skills are acquired by infants from observations
at a very early stage.

In this paper, we contrast a more traditional approach of taking a
model-based route with explicit 3D representations and physical simulation by
an end-to-end approach that directly predicts stability and related quantities
from appearance. We ask the question if and to what extent and quality such a
skill can directly be acquired in a data-driven way bypassing the need for an
explicit simulation.

We present a learning-based approach based on simulated data that predicts
stability of towers comprised of wooden blocks under different conditions and
quantities related to the potential fall of the towers. The evaluation is
carried out on synthetic data and compared to human judgments on the same
stimuli.

BibTeX

@online{Li_arXiv2016,
TITLE = {To Fall Or Not To Fall: {A} Visual Approach to Physical Stability Prediction},
AUTHOR = {Li, Wenbin and Azimi, Seyedmajid and Leonardis, Ale{\v s} and Fritz, Mario},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1604.00066},
EPRINT = {1604.00066},
EPRINTTYPE = {arXiv},
YEAR = {2016},
ABSTRACT = {Understanding physical phenomena is a key competence that enables humans and animals to act and interact under uncertain perception in previously unseen environments containing novel object and their configurations. Developmental psychology has shown that such skills are acquired by infants from observations at a very early stage. In this paper, we contrast a more traditional approach of taking a model-based route with explicit 3D representations and physical simulation by an end-to-end approach that directly predicts stability and related quantities from appearance. We ask the question if and to what extent and quality such a skill can directly be acquired in a data-driven way bypassing the need for an explicit simulation. We present a learning-based approach based on simulated data that predicts stability of towers comprised of wooden blocks under different conditions and quantities related to the potential fall of the towers. The evaluation is carried out on synthetic data and compared to human judgments on the same stimuli.},
}

Endnote

%0 Report
%A Li, Wenbin
%A Azimi, Seyedmajid
%A Leonardis, Aleš
%A Fritz, Mario
%+ Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
External Organizations
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
%T To Fall Or Not To Fall: A Visual Approach to Physical Stability Prediction
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-0634-6
%U http://arxiv.org/abs/1604.00066
%D 2016
%X   Understanding physical phenomena is a key competence that enables humans and
animals to act and interact under uncertain perception in previously unseen
environments containing novel object and their configurations. Developmental
psychology has shown that such skills are acquired by infants from observations
at a very early stage.
  In this paper, we contrast a more traditional approach of taking a
model-based route with explicit 3D representations and physical simulation by
an end-to-end approach that directly predicts stability and related quantities
from appearance. We ask the question if and to what extent and quality such a
skill can directly be acquired in a data-driven way bypassing the need for an
explicit simulation.
  We present a learning-based approach based on simulated data that predicts
stability of towers comprised of wooden blocks under different conditions and
quantities related to the potential fall of the towers. The evaluation is
carried out on synthetic data and compared to human judgments on the same
stimuli.

%K Computer Science, Computer Vision and Pattern Recognition, cs.CV, Computer Science, Artificial Intelligence, cs.AI, Computer Science, Robotics, cs.RO

2015

Conference paper

W. Li and M. Fritz

“Teaching Robots the Use of Human Tools from Demonstration with Non-dexterous End-effectors,” in 2015 IEEE-RAS International Conference on Humanoid Robots (HUMANOIDS 2015), Seoul, South Korea, 2015.

@inproceedings{li15humanoids,
TITLE = {Teaching Robots the Use of Human Tools from Demonstration with Non-dexterous End-effectors},
AUTHOR = {Li, Wenbin and Fritz, Mario},
LANGUAGE = {eng},
ISBN = {978-1-4799-6885-5},
DOI = {10.1109/HUMANOIDS.2015.7363586},
PUBLISHER = {IEEE},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {2015 IEEE-RAS International Conference on Humanoid Robots (HUMANOIDS 2015)},
PAGES = {547--553},
ADDRESS = {Seoul, South Korea},
}

Endnote

%0 Conference Proceedings
%A Li, Wenbin
%A Fritz, Mario
%+ Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
%T Teaching Robots the Use of Human Tools from Demonstration with Non-dexterous End-effectors
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-6083-8
%R 10.1109/HUMANOIDS.2015.7363586
%D 2015
%B 15th International Conference on Humanoid Robots
%Z date of event: 2015-11-03 - 2015-11-05
%C Seoul, South Korea
%B 2015 IEEE-RAS International Conference on Humanoid Robots
%P 547 - 553
%I IEEE
%@ 978-1-4799-6885-5

2014

Conference paper

W. Li

“Learning Multi-scale Representations for Material Classification,” in Pattern Recognition (GCPR 2014), Münster, Germany, 2014.

@inproceedings{LiGCPR2014,
TITLE = {Learning Multi-scale Representations for Material Classification},
AUTHOR = {Li, Wenbin},
LANGUAGE = {eng},
ISBN = {978-3-319-11751-5},
DOI = {10.1007/978-3-319-11752-2_65},
PUBLISHER = {Springer},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {Pattern Recognition (GCPR 2014)},
EDITOR = {Jiang, Xiaoyi and Hornegger, Joachim and Koch, Reinhard},
PAGES = {757--764},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {8753},
ADDRESS = {M{\"u}nster, Germany},
}

Endnote

%0 Conference Proceedings
%A Li, Wenbin
%+ Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
%T Learning Multi-scale Representations for Material Classification
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-47C9-B
%R 10.1007/978-3-319-11752-2_65
%D 2014
%B 6th German Conference on Pattern Recognition
%Z date of event: 2014-09-02 - 2014-09-05
%C Münster, Germany
%B Pattern Recognition
%E Jiang, Xiaoyi; Hornegger, Joachim; Koch, Reinhard
%P 757 - 764
%I Springer
%@ 978-3-319-11751-5
%B Lecture Notes in Computer Science
%N 8753

Paper

W. Li and M. Fritz

“Learning Multi-scale Representations for Material Classification,” 2014. [Online]. Available: http://arxiv.org/abs/1408.2938.

Abstract

The recent progress in sparse coding and deep learning has made unsupervised
feature learning methods a strong competitor to hand-crafted descriptors. In
computer vision, success stories of learned features have been predominantly
reported for object recognition tasks. In this paper, we investigate if and how
feature learning can be used for material recognition. We propose two
strategies to incorporate scale information into the learning procedure
resulting in a novel multi-scale coding procedure. Our results show that our
learned features for material recognition outperform hand-crafted descriptors
on the FMD and the KTH-TIPS2 material classification benchmarks.

BibTeX

@online{li14multiscale,
TITLE = {Learning Multi-scale Representations for Material Classification},
AUTHOR = {Li, Wenbin and Fritz, Mario},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1408.2938},
EPRINT = {1408.2938},
EPRINTTYPE = {arXiv},
YEAR = {2014},
ABSTRACT = {The recent progress in sparse coding and deep learning has made unsupervised feature learning methods a strong competitor to hand-crafted descriptors. In computer vision, success stories of learned features have been predominantly reported for object recognition tasks. In this paper, we investigate if and how feature learning can be used for material recognition. We propose two strategies to incorporate scale information into the learning procedure resulting in a novel multi-scale coding procedure. Our results show that our learned features for material recognition outperform hand-crafted descriptors on the FMD and the KTH-TIPS2 material classification benchmarks.},
}

Endnote

%0 Report
%A Li, Wenbin
%A Fritz, Mario
%+ Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
%T Learning Multi-scale Representations for Material Classification
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-3527-8
%U http://arxiv.org/abs/1408.2938
%D 2014
%8 13.08.2014
%X The recent progress in sparse coding and deep learning has made unsupervised
feature learning methods a strong competitor to hand-crafted descriptors. In
computer vision, success stories of learned features have been predominantly
reported for object recognition tasks. In this paper, we investigate if and how
feature learning can be used for material recognition. We propose two
strategies to incorporate scale information into the learning procedure
resulting in a novel multi-scale coding procedure. Our results show that our
learned features for material recognition outperform hand-crafted descriptors
on the FMD and the KTH-TIPS2 material classification benchmarks.

%K Computer Science, Computer Vision and Pattern Recognition, cs.CV, Computer Science, Learning, cs.LG, Computer Science, Neural and Evolutionary Computing, cs.NE

2012

Conference paper

W. Li and M. Fritz

“Recognizing Materials from Virtual Examples,” in Computer Vision - ECCV 2012, Florence, Italy, 2012.

@inproceedings{li12eccv,
TITLE = {Recognizing Materials from Virtual Examples},
AUTHOR = {Li, Wenbin and Fritz, Mario},
LANGUAGE = {eng},
ISSN = {0302-9743},
ISBN = {978-3-642-33765-9; 978-3-642-33764-2},
DOI = {10.1007/978-3-642-33765-9_25},
LOCALID = {Local-ID: 0C3C506B688381C6C1257AC70040C334-li12eccv},
PUBLISHER = {Springer},
YEAR = {2012},
DATE = {2012},
BOOKTITLE = {Computer Vision -- ECCV 2012},
EDITOR = {Fitzgibbon, Andrew and Lazebnik, Svetlana and Perona, Pietro and Sato, Yoichi and Schmid, Cordelia},
PAGES = {345--358},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {7575},
ADDRESS = {Florence, Italy},
}

Endnote

%0 Conference Proceedings
%A Li, Wenbin
%A Fritz, Mario
%+ Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
%T Recognizing Materials from Virtual Examples
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0013-F794-D
%R 10.1007/978-3-642-33765-9_25
%F OTHER: Local-ID: 0C3C506B688381C6C1257AC70040C334-li12eccv
%D 2012
%B European Conference on Computer Vision
%Z date of event: 2012-10-07 - 2012-10-13
%C Florence, Italy
%B Computer Vision - ECCV 2012
%E Fitzgibbon, Andrew; Lazebnik, Svetlana; Perona, Pietro; Sato, Yoichi; Schmid, Cordelia
%P 345 - 358
%I Springer
%@ 978-3-642-33765-9 978-3-642-33764-2
%B Lecture Notes in Computer Science
%N 7575