Monocular Scene Understanding from Moving Platforms

Monocular 3D Scene Modeling and Inference: Understanding Multi-Object Traffic Scenes [1]

Christian Wojek, Stefan Roth, Konrad Schindler, and Bernt Schiele
in European Conference on Computer Vision (ECCV 2010), Part IV, pp. 467-481, September 5-11, 2010, Heraklion, Crete, Greece



Scene understanding has (again) become a focus of computer vision research, leveraging advances in detection, context modeling, and tracking. In this paper, we present a novel probabilistic 3D scene model that encompasses multi-class object detection, object tracking, scene labeling, and 3D geometric relations. This integrated 3D model is able to represent complex interactions like inter-object occlusion, physical exclusion between objects, and geometric context. Inference allows us to recover 3D scene context and perform 3D multi-object tracking from a mobile observer, for objects of multiple categories, using only monocular video as input. In particular, we show that a joint scene tracklet model for the evidence collected over multiple frames substantially improves performance. The approach is evaluated on two different types of challenging onboard sequences. We first show a substantial improvement over the state of the art in 3D multi-people tracking. Moreover, a similar performance gain is achieved for multi-class 3D tracking of cars and trucks on a new, challenging dataset.



Below you will find the data used in our publication. If you have any questions, please contact Christian Wojek by email.


MPIVehicleScenes (test data)            MPIVehicleScenes.tar.gz (564 MB)

Training data for object detectors      detector_training.tar.gz (769 MB)

Training data for scene labeling        scene_labeling_training.tar.gz (39 MB)



Data annotations (also used in our CVPR 2011 extension "Monocular 3D Scene Understanding with Explicit Occlusion Reasoning")


ETH-Loewenplatz                         eth-loewenplatz-extended.idl.gz

ETH-Linthescher                         eth-linthescher-extended.idl.gz

ETH-PedCross2                           eth-pedcross2-extended.idl.gz


For further information or questions, please contact Christian Wojek or Bernt Schiele.


[1] Monocular 3D Scene Modeling and Inference: Understanding Multi-Object Traffic Scenes, C. Wojek, S. Roth, K. Schindler, and B. Schiele, European Conference on Computer Vision (ECCV), September 2010. Best Paper Award Honorable Mention by IGD.