MPII Movie Description dataset

To foster the research on automatic video description we propose a new MPII Movie Description dataset [1], featuring movie snippets aligned to scripts and DVS (Descriptive video service). DVS is a linguistic description that allows visually impaired people to follow a movie. We benchmark state-of-the-art computer vision algorithms to recognize scenes, human activities, and participating objects and achieve encouraging results in video description on this new challenging dataset. Our most recent results on the dataset can be found in [2].

Request access to the MPII Movie Description dataset

New! Large Scale Movie Description Challenge (LSMDC)

See our more recent dataset for the "Large Scale Movie Description and Understanding Challenge" here. When using this data, please cite [3].

New! Dataset for Grounded and Co-Referenced Characters

See our new Dataset for Grounded and Co-Referenced Characters, introduced in our CVPR 2017 work "Generating Descriptions with Grounded and Co-Referenced People" [4]. To access the data, please, use the username/password obtained here.

References

[1]  A Dataset for Movie Description, Anna Rohrbach, Marcus Rohrbach, Niket Tandon, Bernt Schiele, CVPR 2015

@inproceedings{rohrbach15cvpr,
 title={A Dataset for Movie Description},
 author={Rohrbach, Anna and Rohrbach, Marcus and Tandon, Niket and Schiele, Bernt},
 booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
 year={2015}}

[2]  The Long-Short Story of Movie Description, Anna Rohrbach, Marcus Rohrbach, Bernt Schiele, GCPR 2015

@inproceedings{rohrbach15gcpr,
 title={The Long-Short Story of Movie Description},
 author={Rohrbach, Anna and Rohrbach, Marcus and Schiele, Bernt},
 booktitle={German Conference on Pattern Recognition (GCPR)},
 year={2015}}

[3]  Movie Description, Anna Rohrbach, Atousa Torabi, Marcus Rohrbach, Niket Tandon, Chris Pal, Hugo Larochelle, Aaron Courville, Bernt Schiele, IJCV 2017

@article{lsmdc,
title={Movie Description},
author = {Rohrbach, Anna and Torabi, Atousa and Rohrbach, Marcus and Tandon, Niket and Pal, Chris and Larochelle, Hugo and Courville, Aaron and Schiele, Bernt},
journal={International Journal of Computer Vision},
year = {2017},
url = {http://resources.mpi-inf.mpg.de/publications/D1/2016/2310198.pdf}}

[4]  Generating Descriptions with Grounded and Co-Referenced People, A. Rohrbach, M. Rohrbach, S. Tang, S. J. Oh and B. Schiele, CVPR 2017

@inproceedings{RohrbachCVPR2017,
TITLE = {Generating Descriptions with Grounded and Co-Referenced People},
AUTHOR = {Rohrbach, Anna and Rohrbach, Marcus and Tang, Siyu and Oh, Seong Joon and Schiele, Bernt},
YEAR = {2017},
BOOKTITLE = {30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017)},
}