Existing approaches to automatic video description focus on generating single sentences at a single level of detail. We address both of these limitations: we produce coherent multi-sentence descriptions of complex videos featuring cooking activities, at a variable level of detail. To understand the difference between detailed and short descriptions, we collect and analyze a video description corpus with three levels of detail.


This site hosts the TACoS Multi-Level corpus presented in "Coherent Multi-Sentence Video Description with Variable Level of Detail" [1].

The data may be used only for scientific purposes and must not be republished by anyone other than the Max Planck Institute for Informatics. Scientific use includes processing the data and showing it in publications and presentations. When using the data, please cite [1].

Change log

  • 15/01/18: release of version 1.0


