People Detection and Pose Estimation in Real-World Scenes

Spotlight: Released Friday, 20 January 2012

Mykhaylo Andriluka

Multi Person Tracking

Finding and following people is a key technology for many applications, such as robotics, automotive safety, human-computer interaction, and the indexing of images and videos from the web or surveillance cameras. At the same time, it remains one of the most challenging problems in computer vision, particularly in realistic scenes.

We developed a new approach for detecting people and estimating their poses in complex street scenes with multiple people and dynamic backgrounds. Our approach does not require multiple synchronized video streams; it can operate on images from a moving, monocular, uncalibrated camera. Several examples of people detections and estimated body configurations are shown below. The main challenges our approach addresses are frequent full and partial occlusions of people, cluttered and dynamically changing backgrounds, and ambiguities in recovering 3D body poses from monocular data.

Examples of people detection and pose estimation obtained with our approach

2D Human Pose Estimation

Several key components contribute to the success of the approach. The first is a novel, generic procedure for people detection and 2D pose estimation that is based on the pictorial structures model and also makes it possible to estimate the viewpoints of people from single monocular images.

Our approach is based on discriminatively learned local appearance models of human body parts and a kinematic model of the human body. The appearance of each body part is represented by a set of local image descriptors. We employ a boosting classifier, trained on a dataset of annotated human poses, to learn which of these local features are informative for the presence of the body part at a given image location. Interpreting the output of each classifier as a local likelihood, we infer the optimal configuration of the body parts using belief propagation.
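The inference step can be illustrated with a small sketch. It assumes a tree-structured part model with a handful of discrete candidate locations per part and uses the max-product variant of belief propagation in the log domain; the function name `max_product_tree`, the array layout, and the toy numbers are illustrative assumptions, not the implementation used in this work.

```python
import numpy as np

def max_product_tree(unary, pairwise, parent):
    """Best joint configuration of parts on a tree (max-product, log domain).

    Assumed (hypothetical) layout:
      unary[i]    : (K_i,) log-likelihoods of part i at each candidate location
      pairwise[i] : (K_parent, K_i) log-compatibility of part i with its parent
      parent[i]   : index of the parent part, -1 for the root (part 0);
                    parts are ordered so that parent[i] < i
    """
    n = len(unary)
    belief = [np.asarray(u, dtype=float).copy() for u in unary]
    best_child = [None] * n

    # Upward pass: each part sends a message to its parent, leaves first.
    for i in range(n - 1, 0, -1):
        scores = np.asarray(pairwise[i], dtype=float) + belief[i][None, :]
        best_child[i] = scores.argmax(axis=1)   # best location of i per parent location
        belief[parent[i]] += scores.max(axis=1)

    # Downward pass: fix the root, then read off each child's best location.
    best = [0] * n
    best[0] = int(belief[0].argmax())
    for i in range(1, n):
        best[i] = int(best_child[i][best[parent[i]]])
    return best, float(belief[0].max())

# Toy model: torso (root) with head and arm attached, two candidates each.
unary = [[0.0, -1.0], [-0.5, 0.0], [0.0, -2.0]]
pairwise = [None, [[0.0, -3.0], [-3.0, 0.0]], [[0.0, -1.0], [-1.0, 0.0]]]
parent = [-1, 0, 0]
config, score = max_product_tree(unary, pairwise, parent)
```

Because the kinematic model is a tree, the two passes visit each part once, so the cost grows linearly with the number of parts rather than exponentially with the number of joint configurations.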

3D Human Pose Estimation and Multi-Person Tracking

The second key component of our approach enables people tracking and 3D pose estimation. In contrast to prior work, we accumulate evidence in multiple stages, reducing the ambiguity of 3D pose estimation at every step.
In particular, we propose a novel multi-stage inference procedure for 3D pose estimation. Our approach goes beyond prior work in this area, in which pose likelihoods are often based on simple image features such as silhouettes and edge maps, and which often assumes conditional independence of evidence in adjacent frames of the sequence. In contrast to these approaches, our 3D pose likelihood is formulated in terms of estimates of the 2D body configurations and viewpoints obtained with a strong discriminative appearance model. In addition, we refine these estimates by tracking them over time, which allows us to detect occlusion events and to group hypotheses corresponding to the same person. We demonstrate that the combination of these estimates significantly constrains 3D pose estimation and allows us to prune many of the local minima that otherwise hinder successful optimization.
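As a rough illustration of how tracking can group hypotheses and expose occlusion events, the sketch below links per-frame detections into tracks with a simple greedy nearest-neighbour rule. This is a deliberately simplified stand-in and not the tracking procedure of this work; `Track`, `link_detections`, `missing_frames`, and the thresholds `max_dist` and `max_gap` are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Track:
    # Maps frame index -> (x, y) image position of the hypothesis in that frame.
    positions: dict = field(default_factory=dict)

def link_detections(frames, max_dist=30.0, max_gap=1):
    """Greedily group per-frame detections into tracks.

    frames   : list over time; each entry is a list of (x, y) detections
    max_dist : largest allowed jump (pixels) between linked detections (assumed)
    max_gap  : number of consecutive missed frames a track may bridge (assumed)
    """
    tracks = []
    for t, detections in enumerate(frames):
        unused = list(detections)
        for track in tracks:
            last_t = max(track.positions)
            if not unused or t - last_t > max_gap + 1:
                continue
            lx, ly = track.positions[last_t]
            nearest = min(unused, key=lambda d: (d[0] - lx) ** 2 + (d[1] - ly) ** 2)
            if (nearest[0] - lx) ** 2 + (nearest[1] - ly) ** 2 <= max_dist ** 2:
                track.positions[t] = nearest
                unused.remove(nearest)
        # Detections that match no existing track start new tracks.
        tracks.extend(Track({t: d}) for d in unused)
    return tracks

def missing_frames(track):
    """Frames inside a track's lifetime with no detection (possible occlusions)."""
    ts = sorted(track.positions)
    return [t for t in range(ts[0], ts[-1] + 1) if t not in track.positions]

# Two people; the second one is not detected in frame 1.
frames = [[(10, 10), (100, 50)], [(12, 11)], [(14, 12), (104, 52)]]
tracks = link_detections(frames)
gaps = [missing_frames(tr) for tr in tracks]
```

In this toy sequence the second person reappears within the allowed gap, so the hypotheses in frames 0 and 2 are grouped into one track and frame 1 is flagged as a candidate occlusion.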

About the author:

Dr. Mykhaylo Andriluka studied mathematics and computer science at the I.I. Mechnikov National University in Odessa, Ukraine, and at the TU Darmstadt, Germany. He graduated in 2010 with a Ph.D. in computer science from the TU Darmstadt. His doctoral work in the area of computer vision has resulted in several highly cited publications. The approach to human pose estimation proposed in this work has been widely used as a foundation for further research in this area. He joined the department of Computer Vision and Multimodal Computing of the Max Planck Institute for Informatics as a postdoctoral researcher in 2011. Prior to joining MPI, he was a visiting researcher at the Disney Research Lab in Pittsburgh.


andriluk (at)