Rodrigo Benenson & Jan Hosang

Ten Years of Pedestrian Detection, What Have We Learned?

Object detection is one of the core problems in computer vision. Pedestrian detection is a canonical case of object detection. Due to its multiple applications in car safety, surveillance, and robotics, pedestrian detection is an active area of research, with 1000+ papers published in the last decade. Pedestrian detection also benefits from well-established benchmarks for evaluating and comparing methods.

In the last decade, numerous techniques have been developed for pedestrian detection. Using different image features, cues, and classifier types, detection performance has been steadily improving. Building larger systems that integrate more ingredients has become a reliable recipe to advance quality (“more is more”).

Our research has focused on identifying the core ingredients of high-quality pedestrian detection. By improving these core ingredients, we have consistently obtained top performance on established benchmarks (“less is more”).

Our top-performing detectors are grounded in decade-old ideas (oriented gradient features, filter banks, and boosted decision trees) that we have carefully instantiated and adapted to our newest insights. Contrary to previous work, we show that top performance on pedestrian detection can be reached by using a single rigid template (no components, no deformable parts) applied as a sliding window (no geometry prior, scene prior, or bottom-up cues). Because of its simplicity, our approach is suitable for high-speed implementations. Our current results on pedestrian detection outperform both deep convolutional neural networks and methods that use more complex features (such as local binary patterns and covariance) by a large margin.
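To make the idea of a single rigid template applied as a sliding window concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not our actual detector: the feature map is a plain 2-D list standing in for real feature channels, the boosted classifier is reduced to depth-1 trees (decision stumps), and all names (`Stump`, `score_window`, `detect`) are hypothetical.

```python
from typing import List, Tuple

# Hypothetical weak learner (an assumption for illustration): a decision stump
# (dy, dx, tau, alpha) that reads one feature at a fixed offset inside the
# window and votes +alpha if it exceeds tau, -alpha otherwise.
Stump = Tuple[int, int, float, float]


def score_window(features: List[List[float]], top: int, left: int,
                 stumps: List[Stump]) -> float:
    """Sum the weighted stump votes for the window anchored at (top, left)."""
    score = 0.0
    for dy, dx, tau, alpha in stumps:
        vote = 1.0 if features[top + dy][left + dx] > tau else -1.0
        score += alpha * vote
    return score


def detect(features: List[List[float]], window_h: int, window_w: int,
           stumps: List[Stump], threshold: float = 0.0,
           stride: int = 1) -> List[Tuple[int, int, float]]:
    """Slide one rigid template over the feature map.

    The same stumps, with the same offsets, are evaluated at every position:
    no parts, no components, no scene prior.
    """
    height, width = len(features), len(features[0])
    detections = []
    for top in range(0, height - window_h + 1, stride):
        for left in range(0, width - window_w + 1, stride):
            s = score_window(features, top, left, stumps)
            if s >= threshold:
                detections.append((top, left, s))
    return detections


# Toy data: a 4x4 feature map with a bright 2x2 blob, and a 2x2 template
# made of two stumps that look at the window's diagonal.
toy_features = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
toy_stumps = [(0, 0, 0.5, 1.0), (1, 1, 0.5, 1.0)]
toy_detections = detect(toy_features, 2, 2, toy_stumps, threshold=1.5)
```

A real system would additionally scan over image scales and apply non-maximum suppression; both are omitted here to keep the sketch short.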

In the last ten years, pedestrian detection has shown great progress.

In recent years, our work has repeatedly advanced the top performance reached on standard benchmarks. In the last two years alone, we have improved the state of the art with a 30× reduction in errors (false positives per image at 80% recall on the Caltech pedestrian dataset). The techniques we developed for pedestrian detection have also shown top performance for face and traffic sign detection. For face detection, our results match the best performance reported on all four major benchmark datasets.
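The error measure quoted above, false positives per image at a fixed recall, can be illustrated with a short Python sketch. This is a simplified reading of the metric, not the benchmark's official evaluation code: it assumes each detection has already been matched against the ground truth and labelled as a true or false positive, and the function name and arguments are hypothetical.

```python
from typing import List, Optional, Tuple


def fppi_at_recall(detections: List[Tuple[float, bool]],
                   num_ground_truth: int,
                   num_images: int,
                   target_recall: float = 0.8) -> Optional[float]:
    """False positives per image at the first threshold reaching target recall.

    detections: (score, is_true_positive) pairs pooled over all images.
    Returns None if the detector never reaches the target recall.
    """
    # Lower the score threshold one detection at a time, most confident first.
    ordered = sorted(detections, key=lambda d: d[0], reverse=True)
    true_pos = 0
    false_pos = 0
    for _score, is_tp in ordered:
        if is_tp:
            true_pos += 1
        else:
            false_pos += 1
        if true_pos / num_ground_truth >= target_recall:
            return false_pos / num_images
    return None
```

Under this definition, a 30× reduction means that, at the operating point where 80% of the annotated pedestrians are found, the detector raises one thirtieth as many false alarms per image as before.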

Although significant progress has been achieved in the last decade, we are still far from reaching human performance on this task, or from reaching the desired super-human quality for automatic operations. We expect that a shift towards a more global scene understanding will help reduce the mistakes of current methods in the future.

Rodrigo Benenson

DEPT. 2 Computer Vision and Multimodal Computing
Phone +49 681 9325-2105

Jan Hosang

DEPT. 2 Computer Vision and Multimodal Computing
Phone +49 681 9325-2123