D2
Computer Vision and Machine Learning

Jan Hosang (PhD Student)

Personal Information

Research Interests

  • Computer Vision (object recognition and localization)
  • Machine Learning (deep learning)

 

Research Projects

 

For more and more recent information, please visit my personal homepage (or Google Scholar, GitHub, arXiv).

 

Publications

2018
Towards Reaching Human Performance in Pedestrian Detection
S. Zhang, R. Benenson, M. Omran, J. Hosang and B. Schiele
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 40, Number 4, 2018
Abstract
Encouraged by the recent progress in pedestrian detection, we investigate the gap between current state-of-the-art methods and the “perfect single frame detector”. We enable our analysis by creating a human baseline for pedestrian detection (over the Caltech pedestrian dataset). After manually clustering the frequent errors of a top detector, we characterise both localisation and background- versus-foreground errors. To address localisation errors we study the impact of training annotation noise on the detector performance, and show that we can improve results even with a small portion of sanitised training data. To address background/foreground discrimination, we study convnets for pedestrian detection, and discuss which factors affect their performance. Other than our in-depth analysis, we report top performance on the Caltech pedestrian dataset, and provide a new sanitised set of training and test annotations.
2017
Learning Non-maximum Suppression
J. Hosang, R. Benenson and B. Schiele
30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017
Simple Does It: Weakly Supervised Instance and Semantic Segmentation
A. Khoreva, R. Benenson, J. Hosang, M. Hein and B. Schiele
30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017
Analysis and Improvement of the Visual Object Detection Pipeline
J. Hosang
PhD Thesis, Universität des Saarlandes, 2017
Abstract
Visual object detection has seen substantial improvements during the last years due to the possibilities enabled by deep learning. While research on image classification provides continuous progress on how to learn image representations and classifiers jointly, object detection research focuses on identifying how to properly use deep learning technology to effectively localise objects. In this thesis, we analyse and improve different aspects of the commonly used detection pipeline. We analyse ten years of research on pedestrian detection and find that improvement of feature representations was the driving factor. Motivated by this finding, we adapt an end-to-end learned detector architecture from general object detection to pedestrian detection. Our deep network outperforms all previous neural networks for pedestrian detection by a large margin, even without using additional training data. After substantial improvements on pedestrian detection in recent years, we investigate the gap between human performance and state-of-the-art pedestrian detectors. We find that pedestrian detectors still have a long way to go before they reach human performance, and we diagnose failure modes of several top performing detectors, giving direction to future research. As a side-effect we publish new, better localised annotations for the Caltech pedestrian benchmark. We analyse detection proposals as a preprocessing step for object detectors. We establish different metrics and compare a wide range of methods according to these metrics. By examining the relationship between localisation of proposals and final object detection performance, we define and experimentally verify a metric that can be used as a proxy for detector performance. Furthermore, we address a structural weakness of virtually all object detection pipelines: non-maximum suppression. We analyse why it is necessary and what the shortcomings of the most common approach are. To address these problems, we present work to overcome these shortcomings and to replace typical non-maximum suppression with a learnable alternative. The introduced paradigm paves the way to true end-to-end learning of object detectors without any post-processing. In summary, this thesis provides analyses of recent pedestrian detectors and detection proposals, improves pedestrian detection by employing deep neural networks, and presents a viable alternative to traditional non-maximum suppression.
2016
How Far are We from Solving Pedestrian Detection?
S. Zhang, R. Benenson, M. Omran, J. Hosang and B. Schiele
29th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), 2016
What Makes for Effective Detection Proposals?
J. Hosang, R. Benenson, P. Dollár and B. Schiele
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 38, Number 4, 2016
A Convnet for Non-maximum Suppression
J. Hosang, R. Benenson and B. Schiele
Pattern Recognition (GCPR 2016), 2016
Abstract
Non-maximum suppression (NMS) is used in virtually all state-of-the-art object detection pipelines. While essential object detection ingredients such as features, classifiers, and proposal methods have been extensively researched surprisingly little work has aimed to systematically address NMS. The de-facto standard for NMS is based on greedy clustering with a fixed distance threshold, which forces to trade-off recall versus precision. We propose a convnet designed to perform NMS of a given set of detections. We report experiments on a synthetic setup, and results on crowded pedestrian detection scenes. Our approach overcomes the intrinsic limitations of greedy NMS, obtaining better recall and precision.
2015
Taking a Deeper Look at Pedestrians
J. Hosang, M. Omran, R. Benenson and B. Schiele
IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), 2015
GyroPen: Gyroscopes for Pen-Input with Mobile Phones
T. Deselaers, D. Keysers, J. Hosang and H. Rowley
IEEE Transactions on Human-Machine Systems, Volume 45, Number 2, 2015
What Makes for Effective Detection Proposals?
J. Hosang, R. Benenson, P. Dollár and B. Schiele
Technical Report, 2015
(arXiv: 1502.05082)
Abstract
Current top performing object detectors employ detection proposals to guide the search for objects, thereby avoiding exhaustive sliding window search across images. Despite the popularity and widespread use of detection proposals, it is unclear which trade-offs are made when using them during object detection. We provide an in-depth analysis of twelve proposal methods along with four baselines regarding proposal repeatability, ground truth annotation recall on PASCAL and ImageNet, and impact on DPM and R-CNN detection performance. Our analysis shows that for object detection improving proposal localisation accuracy is as important as improving recall. We introduce a novel metric, the average recall (AR), which rewards both high recall and good localisation and correlates surprisingly well with detector performance. Our findings show common strengths and weaknesses of existing methods, and provide insights and metrics for selecting and tuning proposal methods.
2014
Ten Years of Pedestrian Detection, What Have We Learned?
R. Benenson, M. Omran, J. Hosang and B. Schiele
Computer Vision - ECCV 2014 Workshops (ECCV 2014 Workshop CVRSUAD), 2014
How Good are Detection Proposals, really?
J. Hosang, R. Benenson and B. Schiele
Proceedings of the British Machine Vision Conference (BMVC 2014), 2014
Abstract
Current top performing Pascal VOC object detectors employ detection proposals to guide the search for objects thereby avoiding exhaustive sliding window search across images. Despite the popularity of detection proposals, it is unclear which trade‐offs are made when using them during object detection. We provide an in depth analysis of ten object proposal methods along with four baselines regarding ground truth annotation recall (on Pascal VOC 2007 and ImageNet 2013), repeatability, and impact on DPM detector performance. Our findings show common weaknesses of existing methods, and provide insights to choose the most adequate method for different settings.