People Detection, Pose Estimation and Tracking


How Far are We from Solving Pedestrian Detection?

We investigate the gap between current state-of-the-art methods and the "perfect single frame detector". We enable our analysis by creating a human baseline and by manually clustering the recurrent errors of a top detector. Our results characterize both localization and background-versus-foreground errors. To address localization errors, we study the impact of training annotation noise on detector performance, and show that we can improve even with a small portion of sanitized training data. To address background/foreground discrimination, we study convnets and discuss which factors affect their performance. Beyond this in-depth analysis, we report top performance on the Caltech dataset and provide a new sanitized set of training and test annotations.


Person Recognition in Personal Photo Collections

We propose a person recognition system built on convnets. By combining information from different body regions, we are able to recognise people in personal photo albums in the presence of occluded faces and appearance changes. The method is simple and is built on open source and open data, yet it improves on state-of-the-art results on a large dataset of social media photos (PIPA). Our new evaluation protocols further enable an in-depth study of the problem.


Taking a Deeper Look at Pedestrians

We propose a powerful 2D human pose estimation model which includes an efficient image-conditioned model incorporating higher-order part dependencies, strong appearance representations and a flexible body model. Our approach significantly outperforms the state of the art on the "LSP" and "Image Parse" datasets.


Poselet Conditioned Pictorial Structures

We propose a powerful 2D human pose estimation model which includes an efficient image-conditioned model incorporating higher-order part dependencies, strong appearance representations and a flexible body model. Our approach significantly outperforms the state of the art on the "LSP" and "Image Parse" datasets.


Multi-view Pictorial Structures for 3D Human Pose Estimation

We propose a multi-view pictorial structures model that builds on recent advances in 2D pose estimation and incorporates evidence across multiple viewpoints to allow for robust 3D pose estimation. We evaluate our multi-view pictorial structures approach on the HumanEva-I and MPII Cooking datasets. In comparison to related work on 3D pose estimation, our approach achieves similar or better results while operating on single frames only and not relying on activity-specific motion models or tracking. Notably, our approach outperforms the state of the art for activities with more complex motions.


Learning People Detectors for Tracking in Crowded Scenes

In this project we consider the problem of detection and tracking of multiple people in crowded street scenes.


Articulated People Detection and Pose Estimation

In this project we develop a new technique for extending an existing training set that allows explicit control of pose and shape variations. For this we build on recent advances in computer graphics to generate samples with realistic appearance and background while modifying body shape and pose. We validate the effectiveness of our approach on the tasks of articulated human detection and articulated pose estimation.


Leveraging 3D Body Model for Training Data Generation

This work leverages a 3D human shape model from computer graphics to ease training data generation. We investigate different data generation methods that allow direct manipulation of the training data distribution, and evaluate them on the task of people detection.


People Detection and Tracking

In this project we focus on the development of approaches to detection, tracking and pose estimation of people in complex real-world scenes.


Multi-Cue Onboard Pedestrian Detection

In this work we investigated various features and classifiers for onboard pedestrian detection. The best results were often achieved when static image features were combined with motion information derived from an optical flow field.