Learning Non-Maximum Suppression (NMS)

Jan Hosang, Rodrigo Benenson, and Bernt Schiele

Abstract: Object detectors have hugely profited from moving towards an end-to-end learning paradigm: merging proposals, features, and the classifier into one neural network improved results two-fold on general object detection. One indispensable component is non-maximum suppression (NMS), a post-processing algorithm responsible for merging all detections that belong to the same object. The de facto standard NMS algorithm is still fully hand-crafted, suspiciously simple, and, being based on greedy clustering with a fixed distance threshold, forces a trade-off between recall and precision. We propose a new network architecture designed to perform NMS using only boxes and their scores. We report experiments for person detection on PETS and for general object categories on the COCO dataset. Our approach shows promise, providing improved localization and occlusion handling.
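
For reference, below is a minimal sketch of the conventional greedy NMS baseline the abstract refers to, not the learned NMS network proposed in the paper. It assumes axis-aligned boxes given as [x1, y1, x2, y2] and uses IoU as the overlap measure; the function name greedy_nms and the default threshold of 0.5 are illustrative choices, not taken from the paper's code.

import numpy as np

def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it too much, repeat.

    boxes:  (N, 4) array of [x1, y1, x2, y2]
    scores: (N,) array of detection scores
    Returns the indices of the kept boxes.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]  # process highest-scoring boxes first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # intersection of the current top box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        # IoU = intersection / union
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + areas - inter)
        # suppress boxes whose overlap with the kept box reaches the fixed threshold
        order = rest[iou < iou_threshold]
    return keep

if __name__ == "__main__":
    boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]])
    scores = np.array([0.9, 0.8, 0.7])
    print(greedy_nms(boxes, scores))  # [0, 2]: the second box is suppressed as a duplicate of the first

The fixed threshold is exactly the recall/precision trade-off mentioned in the abstract: a low threshold suppresses overlapping true positives in crowded scenes (hurting recall), while a high threshold lets duplicate detections of the same object survive (hurting precision).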

Paper on arXiv

Citation:

@INPROCEEDINGS{Hosang2017cvpr,
  author = {J. Hosang and R. Benenson and B. Schiele},
  title = {Learning non-maximum suppression},
  booktitle = {CVPR},
  year = {2017}
}

Data & Code

Code

COCO dataset

PETS dataset

You can find the original dataset at the PETS 2009 benchmark, with some annotations from Anton Milan and Siyu Tang.