Learning Video Object Segmentation from Static Images

Anna Khoreva*, Federico Perazzi*, Rodrigo Benenson, Bernt Schiele and Alexander Sorkine-Hornung


Inspired by recent advances of deep learning in instance segmentation and object tracking, we formulate the video object segmentation problem as guided instance segmentation. Our model proceeds on a per-frame basis, guided by the output of the previous frame towards the object of interest in the next frame. We demonstrate that highly accurate object segmentation in videos can be achieved with a convnet trained on static images only. The key ingredient of our approach is a combination of offline and online learning strategies: the former produces a refined mask from the previous frame's estimate, and the latter captures the appearance of the specific object instance. Our method can handle different types of input annotations (bounding boxes and segments) and can incorporate multiple annotated frames, making the system suitable for diverse applications. We obtain competitive results on three different datasets, independent of the type of input annotation.



For each new video frame, the network is guided towards the object of interest by feeding in the previous frame's mask estimate. We therefore refer to our approach as guided instance segmentation.
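The per-frame guidance described above can be sketched as a simple propagation loop. This is only an illustrative outline, not the released implementation: `propagate_masks` and `refine` are hypothetical names, and `refine` stands in for the trained network, which is assumed to take the current frame together with the previous frame's mask estimate and return a mask for the current frame.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")
Mask = TypeVar("Mask")


def propagate_masks(
    frames: Sequence[Frame],
    first_mask: Mask,
    refine: Callable[[Frame, Mask], Mask],
) -> List[Mask]:
    """Propagate a mask through a video, one frame at a time.

    `refine` is a hypothetical stand-in for the trained network: it receives
    the current frame and the previous frame's mask estimate, and returns
    the mask estimate for the current frame.
    """
    masks = [first_mask]  # the annotation of the first frame is given
    for frame in frames[1:]:
        # Guide the model with the most recent mask estimate.
        masks.append(refine(frame, masks[-1]))
    return masks
```

Because each estimate depends only on the current frame and the preceding mask, the loop can process a video as a stream, without access to future frames.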


Evaluation on DAVIS

Overall results of region similarity (J), contour accuracy (F) and temporal (in-)stability (T) for each of the tested algorithms. For rows with an upward-pointing arrow, higher numbers are better (e.g., mean), and vice versa for rows with downward-pointing arrows (e.g., decay, instability). Our approach MSK (MaskTrack+Flow+CRF) achieves a score of 80.3 mIoU.
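For reference, region similarity (J) is the intersection-over-union between an estimated mask and the ground-truth mask, and mIoU is its mean. The snippet below is a minimal pure-Python sketch of the per-frame measure, not the official DAVIS evaluation code; the function name `region_similarity` and the choice to return 1.0 when both masks are empty are assumptions made for this illustration.

```python
from typing import Sequence


def region_similarity(
    pred: Sequence[Sequence[int]],
    gt: Sequence[Sequence[int]],
) -> float:
    """Intersection-over-union of two binary masks of equal shape.

    Masks are given as rows of 0/1 (or False/True) values.
    """
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            intersection += int(bool(p) and bool(g))
            union += int(bool(p) or bool(g))
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


pred = [[1, 1, 0],
        [0, 1, 0]]
gt = [[1, 0, 0],
      [0, 1, 1]]
print(region_similarity(pred, gt))  # 2 overlapping pixels out of 4 in the union
```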

Qualitative results

Our algorithm is robust to challenging situations such as occlusions, fast motion, multiple instances of the same semantic class, object shape deformation, camera view change and motion blur.






Pre-Computed Masks

DAVIS: [MaskTrack]   [MaskTrack+Flow]   [MaskTrack+Flow+CRF]


For further information or data, please contact Anna Khoreva <khoreva at mpi-inf.mpg.de>.


[Khoreva et al., 2016] Learning Video Object Segmentation from Static Images, A. Khoreva, F. Perazzi, R. Benenson, B. Schiele and A. Sorkine-Hornung, arXiv preprint arXiv:1612.02646, 2016.

title={Learning Video Object Segmentation from Static Images},
author={A. Khoreva and F. Perazzi and R. Benenson and B. Schiele and A. Sorkine-Hornung},
journal={arXiv preprint arXiv:1612.02646},