Advanced Video Processing

Christian Theobalt

Image processing methods are important tools during the post-processing of photographs. Many standard tools for image processing provide algorithms to fulfil typical post-processing tasks, such as standard filters for noise removal or contrast enhancement.

Post-processing of videos is a much more challenging task. Videos are not merely temporal sequences of individual images: modifications made during post-processing must not only be spatially consistent, that is, consistent within a single frame, but also temporally consistent, that is, consistent across many subsequent frames. Many commercially available video editing tools, however, do treat videos as mere sequences of individual images. Post-processing operations with such tools are consequently limited to the application of simple filters over a temporal window, as the sketch below illustrates.
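
To make this concrete, here is a minimal sketch (in Python with NumPy, both assumptions of this illustration, not part of any particular tool) of such a temporal-window filter: a per-pixel median over a few neighbouring frames. It denoises static content well, but because it treats the video as a stack of independent images, any scene motion produces ghosting, which is precisely the limitation discussed above.

    import numpy as np

    def temporal_median_filter(frames, window=5):
        """Denoise a video with a per-pixel median over a sliding temporal
        window. `frames` is a float array of shape (T, H, W) or (T, H, W, C)."""
        half = window // 2
        out = np.empty_like(frames)
        for t in range(frames.shape[0]):
            lo, hi = max(0, t - half), min(frames.shape[0], t + half + 1)
            # Treats the frames as independent images: fine for static
            # scenes, but moving objects are blurred or ghosted.
            out[t] = np.median(frames[lo:hi], axis=0)
        return out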

Many post-processing tasks in professional video and movie productions are much more complex: entire scene elements, such as particular actors or supporting crew, may need to be removed, manipulated, or repositioned in a shot. No existing video filtering approach comes even close to solving such tasks automatically. As a consequence, such edits are usually carried out by time-consuming manual editing of individual pixels, which can easily take several weeks, even for short videos.

Further, today’s personal and online video collections are extremely large. Browsing such collections is a challenge, as current approaches often rely on unreliable user-provided text annotations. Ideally, one would want more semantic ways of exploration that exploit space-time and content relations between videos, such as the fact that videos were taken in the same environment, show a similar scene, were taken at a similar time, or deal with a similar topic. The precondition for this is the ability to extract such relations automatically, which is a highly challenging and unresolved problem. We investigate the algorithmic foundations of both of these problem areas.

Context-based exploration of video databases

Context-based relations between videos enable particularly interesting new modes of exploring video databases. Context relations capture, for instance, whether videos were recorded in the same area or the same town, or even partly show the exact same location. We therefore develop new methods to extract such context relations automatically from video collections. One of our approaches computes a graph, a so-called Videoscape, whose nodes are portals, i.e., specific locations that videos can show, and whose edges are videos: a video is linked to a portal node if it ever shows the location represented by that portal.
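
As a rough illustration of this data structure (a simplification, not the actual system; the clip and portal names and the `video_portals` input are hypothetical), the sketch below builds such a graph from per-video place-recognition results: every video that shows two portals contributes an edge between them.

    from collections import defaultdict
    from itertools import combinations

    # Hypothetical output of a place-recognition step (not shown here):
    # for each video, the set of portals it shows at some point.
    video_portals = {
        "clip_a.mp4": {"brandenburg_gate", "reichstag"},
        "clip_b.mp4": {"reichstag", "museum_island"},
        "clip_c.mp4": {"museum_island"},
    }

    def build_videoscape(video_portals):
        """Nodes are portals; each video showing two portals adds an
        edge between them, labelled with that video. (Videos showing
        only one portal are omitted in this simplified version.)"""
        edges = defaultdict(list)  # (portal_u, portal_v) -> list of videos
        for video, portals in video_portals.items():
            for u, v in combinations(sorted(portals), 2):
                edges[(u, v)].append(video)
        return edges

    edges = build_videoscape(video_portals)
    # edges[("brandenburg_gate", "reichstag")] == ["clip_a.mp4"]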

The automatic computation of this graph is a complex task, and we have developed new machine vision and machine learning algorithms for this purpose. The Videoscape can be explored interactively; for instance, one can take a tour through a city by watching videos recorded in that city. Once a frame showing a portal comes up, one can transition into another video, based on a 3D reconstruction of the portal location from the video frames. We have also researched how videos of a location can be overlaid with a panoramic context of that location (see figure). This enables entirely new ways of visualizing the space-time relations between videos filmed there.
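
Planning such a tour then amounts to a path query on the graph. The following sketch (reusing the hypothetical `edges` structure from above; again an illustration, not the actual system) uses breadth-first search to find which clips to watch to get from one portal to another.

    from collections import defaultdict, deque

    def plan_tour(edges, start, goal):
        """Return a list of (video, next_portal) hops from `start` to
        `goal`, or None if no chain of videos connects the two portals."""
        adj = defaultdict(list)
        for (u, v), videos in edges.items():
            adj[u].append((v, videos[0]))
            adj[v].append((u, videos[0]))
        queue, seen = deque([(start, [])]), {start}
        while queue:
            portal, path = queue.popleft()
            if portal == goal:
                return path
            for nxt, video in adj[portal]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [(video, nxt)]))
        return None

    # plan_tour(edges, "brandenburg_gate", "museum_island")
    # -> [("clip_a.mp4", "reichstag"), ("clip_b.mp4", "museum_island")]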

New video editing approaches

We developed new algorithms to compute dense correspondences between time series of images, even for images exhibiting strong noise or taken under starkly varying camera settings. These methods are the foundation for advanced new video editing operations, such as automatic inpainting of dynamic backgrounds. They also enable the automatic reconstruction of high dynamic range (HDR) images from a stack of images taken with different exposure times, even if there was strong motion in the scene between exposures.
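
As a sketch of the HDR part only, assuming a linear camera response and images that have already been aligned by such a dense-correspondence step (neither the alignment nor our actual algorithm is shown here), an exposure stack can be merged into a radiance map with a simple confidence-weighted average:

    import numpy as np

    def merge_hdr(images, exposure_times):
        """Merge aligned LDR images (float arrays in [0, 1]) taken with
        the given exposure times (in seconds) into one radiance map."""
        num = np.zeros_like(images[0], dtype=np.float64)
        den = np.zeros_like(images[0], dtype=np.float64)
        for img, t in zip(images, exposure_times):
            # Hat-shaped weight: trust well-exposed pixels, distrust
            # pixels near under- or over-exposure.
            w = 1.0 - np.abs(2.0 * img - 1.0)
            num += w * (img / t)  # per-image radiance estimate
            den += w
        return num / np.maximum(den, 1e-8)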

New machine learning algorithms we have developed also enable us to remove compression artifacts from images and videos, and to algorithmically increase their frame resolution.
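
To indicate the flavour of such learned restoration, here is a minimal SRCNN-style network in PyTorch (an assumption of this sketch; the architecture shown is a generic textbook example, not our actual method): the low-resolution frame is first upsampled conventionally, and the network then restores high-frequency detail.

    import torch
    import torch.nn as nn

    class SRLikeNet(nn.Module):
        """Minimal super-resolution refiner: the input is a frame already
        upsampled (e.g. bicubically) to the target resolution."""
        def __init__(self, channels=3):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(channels, 64, kernel_size=9, padding=4),
                nn.ReLU(inplace=True),
                nn.Conv2d(64, 32, kernel_size=5, padding=2),
                nn.ReLU(inplace=True),
                nn.Conv2d(32, channels, kernel_size=5, padding=2),
            )

        def forward(self, x):
            return self.body(x)

    # frame = torch.rand(1, 3, 256, 256)  # upsampled low-res frame
    # restored = SRLikeNet()(frame)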

About the author:

Christian Theobalt is a Professor of Computer Science and the head of the research group "Graphics, Vision, & Video" at the Max Planck Institute for Informatics. From 2007 until 2009 he was a Visiting Assistant Professor in the Department of Computer Science at Stanford University. He received his MSc degree in Artificial Intelligence from the University of Edinburgh, his Diplom (MS) degree in Computer Science from Saarland University, and his PhD in Computer Science from MPI Informatik.

Most of his research deals with algorithmic problems that lie on the boundary between the fields of Computer Vision and Computer Graphics, such as dynamic 3D scene reconstruction and marker-less motion capture, computer animation, appearance and reflectance modeling, machine learning for graphics and vision, new sensors for 3D acquisition, advanced video processing, as well as image- and physically-based rendering.

For his work, he received several awards including the Otto Hahn Medal of the Max Planck Society in 2007, the EUROGRAPHICS Young Researcher Award in 2009, and the German Pattern Recognition Award 2012. He is also a Principal Investigator and a member of the Steering Committee of the Intel Visual Computing Institute in Saarbrücken.


contact: theobalt (at) mpi-inf.mpg.de
www.mpi-inf.mpg.de/resources/perfcap/