Existing approaches to automatic video description focus on generating single sentences at a single level of detail. We address both of these limitations: we produce coherent multi-sentence descriptions of complex videos featuring cooking activities, at a variable level of detail. To understand the difference between detailed and short descriptions, we collect and analyze a video description corpus with three levels of detail.
This site hosts the TACoS Multi-Level corpus presented in Coherent Multi-Sentence Video Description with Variable Level of Detail.
Please contact us if you have questions.
The data is only to be used for scientific purposes and must not be republished other than by the Max Planck Institute for Informatics. Scientific use includes processing the data and showing it in publications and presentations. When using it, please cite the publication listed below.
- Corpus (9.3 MB)
Help & Contact
Feel free to subscribe to our mailing list to get updates (firstname.lastname@example.org).
Related datasets of our group
- Saarbrücken Corpus of Textually Annotated Cooking Scenes (short: TACoS): textual descriptions for MPII Cooking Composite Activities at a single level of detail. An earlier work, with only a single level of descriptions, but additional sentence-similarity annotations.
- MPII Multi-Kinect Dataset: multi-view Kinect object classification, recorded in the same kitchen.
- 15/01/18: release of version 1.0
Coherent Multi-Sentence Video Description with Variable Level of Detail, A. Rohrbach, M. Rohrbach, W. Qiu, A. Friedrich, M. Pinkal and B. Schiele, German Conference on Pattern Recognition (GCPR), (2014)
 Grounding Action Descriptions in Videos, M. Regneri, M. Rohrbach, D. Wetzel, S. Thater, B. Schiele and M. Pinkal, Transactions of the Association for Computational Linguistics (TACL), Volume 1, p.25-36, (2013)
 Translating Video Content to Natural Language Descriptions, M. Rohrbach, W. Qiu, I. Titov, S. Thater, M. Pinkal and B. Schiele, IEEE International Conference on Computer Vision (ICCV), December, (2013)