Phase I
"3D Video and Vision-based Graphics"
Research Mission
The research group "3D Video and Vision-based Graphics" investigates problems that live on the boundary
between the fields Computer Graphics and Computer Vision. One major line of research in Computer Vision
aims at developing methods for acquiring dynamic scenes with video cameras and estimating model descriptions
of the scenes from the recorded data.
These model descriptions typically comprise models of shape, models of motion or models of physical material
properties. The main goal of Computer Graphics, on the other hand, has been to display such model descriptions
realistically. In our work, we investigate the problems of acquisition, reconstruction and display of dynamic
scenes in conjunction and develop novel algorithmic concepts for each of these questions. In particular we
develop methods for dynamic shape and appearance reconstruction, motion estimation, animation of complex deformable
models, and real-time rendering and relighting. Ultimately, it is our goal to generate realistic renderings
of image- or video-captured dynamic scenes from arbitrary virtual camera views.
One very young line of research that aims at putting this video-based rendering paradigm into practice is 3D or
Free-Viewpoint Video. Here, we will extend our previous work on model-based free-viewpoint video of human actors
and develop novel algorithms that enable us to process more general scenes.
"Methods for Large-Scale Physical Modeling and Animation"
Research Mission
General Reduced-Order Models
Dimensionality reduction is a powerful technique that makes very high-dimensional simulations tractable. As an instance of data-driven simulation, creating a reduced-order model requires examples that have to be generated using traditional simulation techniques. Furthermore, the reduction process is computationally expensive, imposing severe limits on the size of reduced models. Each reduced model can only be used with the exact boundary conditions that were specified while the model was created.
In this project, we aim to relax these restrictions. By computing reduced models in a modular fashion, more complex simulations can be set up without repeating the expensive model reduction step. Combining several of these tiles requires a coupling mechanism that has to be fast and flexible, while guaranteeing important physical invariants and providing useful error bounds.
Measuring Fluid Flow
Fluid flows are complex and hard to measure. One of the most commonly used techniques injects probes into the fluid whose positions are then tracked over time. The recorded trajectories can then be used to reconstruct the (time-dependent) flow field.
In this project, traditional flow reconstruction methods are augmented using knowledge about the physical processes. Since we can compute the time-dependent behavior of a fluid given an initial condition, measurements at one timestep can be validated by subsequent measurements. This mechanism makes it possible to use far fewer sample points to reconstruct time-varying flow fields.
To validate this method, we design and build an experimental setup that can be used to measure small-scale fluid flows using optical tracking of small objects, similar to traditional particle image velocimetry approaches.
Handling Mobility in Sensor Networks
Most algorithms for routing and data management are designed for static or slowly changing networks. While they adapt well to gradual changes in network topology, and mechanisms exist that help recover from temporary failures, mobility of some nodes poses significant challenges. This is especially true for sensor networks, whose nodes are typically small, low-power devices, placing severe constraints on admissible computation and power consumption.
This project aims to gracefully handle node mobility in networks, in particular sensor networks. Exploiting typical user behaviour can help implement proactive handoff procedures to prevent link failures. Intermediate goals include a robust and efficient streaming mechanism, as well as reliable predictive routing.
Phase II
"General Appearance Acquisition"
I completed Phase I in Stanford from October 1, 2003 to March 31, 2006.
Research Mission
One central problem in computer graphics is synthesizing realistic images that
are indistinguishable from real photographs. The basic theory behind rendering
such images has been known for a while and has been turned into a broad range of
rendering algorithms ranging from slow but physically accurate frameworks to
hardware-accelerated, real-time applications that make a lot of
simplifications. One fundamental building block to these algorithms is the
simulation of the interaction between incident illumination and the reflective
properties of the scene. The limiting factor in photo-realistic image synthesis
today is not the rendering per se but rather modeling the input to the
algorithms. The realism of the outcome depends largely on the quality of the
scene description passed to the rendering algorithm. Accurate input is required
for geometry, illumination and reflective properties. An efficient way to
obtain realistic models is through measurement of scene attributes from
real-world objects by inverse rendering. The attributes are estimated from real
photographs by inverting the rendering process. The digitization of real-world
objects is of increasing importance not only to image synthesis applications,
such as film production or computer games, but also to a number of other
applications, such as e-commerce, education, digital libraries, cultural
heritage, and so forth. In the context of cultural heritage, for example, the
captured 3D models can serve to digitally preserve an artifact, to document and
guide the restoration process, and to present art to a wide audience via the
Internet.
One focus of this research group is on developing photographic techniques for
measuring the scene's reflection properties. Here, the goal is to capture the
appearance of the entire scene including all local and global illumination
effects such as highlights, shadows, interreflections, or caustics, such that
the scene can be reconstructed and relit later in a virtual environment
producing photorealistic images. The envisioned techniques should be general
enough to cope with arbitrary materials, with scenes with high depth complexity
such as trees, and with scenes in arbitrary environments, i.e. outside a
measurement laboratory.
A second thread of research followed by this group is computational photography,
with the goal of developing optical systems augmented by computational procedures
by jointly designing the capturing apparatus, i.e., the optical layout of active
or passive devices such as cameras, projectors, beam-splitters, etc., together
with the capturing algorithm and appropriate post-processing. Applications of such
combined systems range from increasing image quality, e.g., by removing image noise
or camera shake, through emphasizing or extracting scene features such as edges or
silhouettes by optical means, to 3D volume reconstruction from
images. We plan to devise computational photography techniques for advanced
optical microscopy, large scale scene acquisition, and even astronomical imaging.
"Visual Exploration of Space-Time Pattern in Multi-Dimensional and Heterogeneous Data Spaces"
Research Mission
In many application scenarios data is collected and referenced by its geo-spatial location at a certain point
in time (time-stamp). The analysis of such data sets is an important task, since for decision makers, analysts
or emergency response teams it is often essential to rapidly extract relevant information from the flood of data.
In emergency management, for example, GPS navigation plays an important role in providing effective coordination
for various organizations to enable more efficient emergency assistance. To this end, the GPS information from
selected vehicles is collected, which results in large and complex databases. A research challenge on the one hand
is to find efficient methods to handle such massive information flows and on the other hand to provide visualizations
that fuse this information together. The aim is to allow analysts to identify space and time patterns and to
provide efficient visual awareness and coordinated help in emergency cases.
In the described examples the collected data typically results in large and heterogeneous data sets with
geo-spatial information, time stamps, pictures, text and other information. The visual analysis of these
massive volumes of data is a challenge for existing visualization techniques. Space-time-patterns can be seen
as a series of multivariate profiles. The difficulty is to provide effective visual awareness in multi-dimensional
heterogeneous data spaces that represent the available resources or products as well as different kinds of
events and alerts. Effective visual awareness is based on the perception of objects in an environment with a
volume of space and time, the comprehension of their meaning, and the projection of their status in the near future.
In our research we study the following questions about space-time patterns:
- How to adequately analyze changing events in time over a geo-spatial context?
- How to enable intelligent situational awareness?
- How to automatically include heterogeneous data from multiple sources (numerical data, text, images,
GPS data, network logs) into the visual data exploration process?
"Integrative Scientific Computing"
Phase I completed in Stanford from August 2005 to August 2007 under the title: "Adaptive PDE Solvers in Graphics
Hardware with Applications to Physical Simulation, Computer Graphics and Computer Vision"
Research Mission
Our research focuses on significant improvements of performance and accuracy in scientific computing
through a global optimization across the entire spectrum of continuous modeling, numerical analysis,
algorithm design, software implementation and hardware acceleration.
The concatenation of individually optimal solutions on each of these layers often performs poorly due
to conflicting requirements at the interfaces. Consequently, the integration of individually suboptimal
but inter-coordinated solutions from all layers can be far superior. Even when the application complexity
prevents a global optimization, the integrative consideration of several layers already proves to be beneficial.
Chosen application areas of particular interest in this context are the solution of partial differential
equations and real-time image processing.
"Image-based 3D Scene Analysis"
Research Mission
The research group develops new tools and algorithms for 3D scene
analysis from image sequences or video. This comprises
estimations of camera motion of one or more cameras in a given scene; of
static scene geometry and motion and shape of moving objects; and of
scene illumination. Such tools and algorithms can be used for the
generation of visual effects in movie and TV production. Specialized
versions of these algorithms can also be applied in the field of 3D
scene analysis for autonomous aerial, ground, and water vehicles as well
as industrial, domestic, and medical robots.
The aim is to develop tools for 3D scene analysis that work fully
automatically. Nevertheless, in case of error, the user should be in
control of the estimation process and must be able to guide the
algorithm to the desired solution by simple and intuitive interactive
techniques. Another approach to deal with errors is to automatically
extract high level information from the data generated by the
algorithms. By feeding back this information into the estimation process
it is possible to guide the algorithms to the correct solution.
The research results are published in international journals and
conference proceedings. The group particularly focuses on tools and
algorithms that have the potential to be turned into mass market
applications and are therefore attractive for commercial partners.
Associated Research Groups
"Geometry-Guided Design and Analysis of Wireless Sensor Networks"
I completed Phase I in Stanford from August 1, 2004 to July 31, 2006.
Research Mission
Mobile applications, such as cellular phone service, digital imaging, and portable computing are driving the need for
high-bandwidth wireless communication networks. Wireless communication introduces new (spatial) constraints on the
design and operation of communication networks. For example, the power required to transmit information via radio waves
is heavily dependent on the Euclidean distance between the sender and the receiver(s). Important challenges range
from very basic tasks, involved in building the infrastructure between the network nodes, to high level tasks, involved
in routing signals. An example of a low-level task is the assignment of frequencies or time slots such that no interference
between nearby stations occurs. An example of a high level task is routing messages from node A to node B using at
most k intermediate stations and minimizing the amount of energy required to transmit messages. Since wireless nodes in
a network are typically battery-powered devices, the efficient power management of these units becomes a major issue
when designing network topologies and protocols.
We are exploring a new approach to wireless networking by combining computational geometry with algorithm theory to
develop new and interesting protocols and network designs. For example, we recently created a new data structure that,
for a given wireless network topology, answers k-hop queries for energy-efficient paths in constant time. We plan to
extend our approach to other communication networks, such as those involved in broadcasting.
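The routing task described above (reach node B from node A through at most k intermediate stations while spending as little energy as possible) can be illustrated with a small baseline. This is only a sketch under stated assumptions: the cost of one transmission is taken to be the Euclidean distance raised to a hypothetical exponent `alpha`, and the function name `min_energy_k_hop` is invented for this example. It is a plain hop-limited dynamic program that runs in O(k n^2) time per query, not the constant-time data structure mentioned in the text.

```python
import math

def min_energy_k_hop(points, src, dst, k, alpha=2.0):
    """Cheapest total energy from src to dst using at most k intermediate stations.

    Assumption for illustration: one transmission over distance d costs d ** alpha.
    """
    n = len(points)
    cost = [[math.dist(p, q) ** alpha for q in points] for p in points]
    # best[v] = cheapest energy to reach v with the transmissions allowed so far
    best = [math.inf] * n
    best[src] = 0.0
    # k intermediate stations means at most k + 1 transmissions
    for _ in range(k + 1):
        nxt = best[:]
        for u in range(n):
            if best[u] == math.inf:
                continue
            for v in range(n):
                if u != v:
                    nxt[v] = min(nxt[v], best[u] + cost[u][v])
        best = nxt
    return best[dst]

# Three stations on a line, one unit apart
stations = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
direct = min_energy_k_hop(stations, 0, 2, k=0)   # one long transmission
relayed = min_energy_k_hop(stations, 0, 2, k=1)  # relay through the middle station
```

With a quadratic cost, relaying through the middle station halves the energy of the direct transmission, which is why the number of allowed hops matters for energy-efficient routing.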
"Topological Methods for Vector Field Processing"
Research Mission
During the last decade, Scientific Visualization has grown into an active area of research focusing on a variety of different
applications. Among the data classes considered in visualization, flow data play an outstanding role. Flow data, obtained both from
simulation and measurement processes, usually come as 2D or 3D vector fields. Currently, a multitude of different visualization
techniques for flow data is available. Among them, topological methods have become a standard tool, because they promise to
visualize even complex flow structures by only a limited number of graphical primitives. After their introduction as visualization
tools by Helman and Hesselink, a considerable amount of research has been done in the field. The main idea of topological methods is to
segment the flow field into regions of different flow behavior.
This is done by extracting critical points and separatrices starting from the saddle points. These separatrices are certain
stream lines for 2D vector fields and stream surfaces in the 3D case. Although topological methods are well-established for flow
visualization, there are still a number of challenges and open problems to be solved. The research of our group focuses on two
main directions: the treatment of the topology of time-dependent flow fields, and the application of topological methods to
further problems, features and data classes.
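As a small illustration of the critical-point step mentioned above, a first-order critical point of a 2D vector field can be classified from the Jacobian of the field at that point, and at a saddle the eigenvectors of the Jacobian give the directions in which separatrices are started. The sketch below is a textbook classification, not the group's software; the function names and the tolerance `tol` are assumptions made for this example.

```python
import numpy as np

def classify_critical_point(jacobian, tol=1e-12):
    """Classify a first-order critical point of a 2D vector field from its Jacobian."""
    J = np.asarray(jacobian, dtype=float)
    det = np.linalg.det(J)
    tr = np.trace(J)
    if abs(det) < tol:
        return "degenerate"          # higher-order point, not handled here
    if det < 0:
        return "saddle"              # real eigenvalues of opposite sign
    if abs(tr) < tol:
        return "center"              # purely imaginary eigenvalues
    kind = "focus" if tr * tr - 4.0 * det < 0 else "node"
    return ("repelling " if tr > 0 else "attracting ") + kind

def separatrix_directions(jacobian):
    """Eigenvalues and eigenvectors at a saddle; the eigenvector columns are the
    directions along which separatrix integration is started."""
    vals, vecs = np.linalg.eig(np.asarray(jacobian, dtype=float))
    return vals, vecs

# v(x, y) = (x, -y) has a saddle at the origin
J_saddle = np.array([[1.0, 0.0], [0.0, -1.0]])
kind = classify_critical_point(J_saddle)
vals, vecs = separatrix_directions(J_saddle)
```

Stream lines integrated forward along the eigenvector with positive eigenvalue, and backward along the one with negative eigenvalue, form the separatrices that bound regions of different flow behavior.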
For time-dependent vector fields, two kinds of characteristic curves exist: stream lines and path lines. Since topological
methods aim at the segmentation into areas of different flow behavior, two kinds of topologies can be distinguished for
time-dependent vector fields: a stream line oriented topology, and a path line oriented topology. For a stream line oriented
topology, topological features of steady vector fields have to be tracked over time. Doing so, certain bifurcations may occur and
have to be extracted. While local bifurcations (like Hopf bifurcations and fold bifurcations) are well-known for
visualization purposes, our group is working on methods for extracting global bifurcations like saddle connections, closed
stream lines and cyclic fold bifurcations. Although most topological methods focus on a stream line oriented topology,
there is a demand for segmenting and understanding the behavior of path lines. The group is working on path line oriented approaches
for 2D and 3D time-dependent vector fields.
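The difference between the two kinds of characteristic curves can be made concrete with a minimal sketch. A stream line is integrated with time frozen at one instant, whereas a path line follows a particle while time advances. The unsteady field v(x, y, t) = (1, t) and all function names below are assumptions chosen purely for illustration; the integrator is a standard fourth-order Runge-Kutta step.

```python
import numpy as np

def velocity(p, t):
    # Hypothetical unsteady field used only for this example: v(x, y, t) = (1, t)
    return np.array([1.0, t])

def rk4_step(f, p, t, h):
    """One classical Runge-Kutta step for dp/dt = f(p, t)."""
    k1 = f(p, t)
    k2 = f(p + 0.5 * h * k1, t + 0.5 * h)
    k3 = f(p + 0.5 * h * k2, t + 0.5 * h)
    k4 = f(p + h * k3, t + h)
    return p + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def stream_line_end(seed, t0, length, steps):
    """Integrate with time frozen at t0 (curve parameter is not physical time)."""
    p = np.array(seed, dtype=float)
    h = length / steps
    frozen = lambda q, _s: velocity(q, t0)
    for _ in range(steps):
        p = rk4_step(frozen, p, 0.0, h)
    return p

def path_line_end(seed, t0, duration, steps):
    """Follow a particle released at time t0 while time advances."""
    p = np.array(seed, dtype=float)
    h = duration / steps
    t = t0
    for _ in range(steps):
        p = rk4_step(velocity, p, t, h)
        t += h
    return p

stream_end = stream_line_end((0.0, 0.0), t0=0.0, length=2.0, steps=20)
path_end = path_line_end((0.0, 0.0), t0=0.0, duration=2.0, steps=20)
```

In this field the stream line at t = 0 is a straight line along the x axis, while the path line bends upward along y = x^2 / 2, so the two curves through the same seed point end at different places.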
Topological methods can be used not only for visualization purposes but also in two other ways. First, they can also be used
to construct, compress, compare and simplify vector fields. Among them, our group works on topology based simplification techniques
for 3D vector fields. Second, topological concepts can be applied to other data classes (like volume data or tensor data) and other
features of vector fields. In particular, we are working on applying topological methods to extract, segment and classify
vortex core lines. Also, topological methods can be applied to vector fields which do not come from a flow simulation
environment. In particular, we are working on extracting characteristic 2D and 3D vector fields from surfaces and applying a
topological segmentation to them for shape and surface analysis.
Previous Research Groups
"Distributed Media Systems"
Research Mission
Nowadays, several applications require the transmission of multimedia content to clients, possibly scattered all
over the world. While multimedia data delivery presents very stringent requirements in terms of bandwidth,
delay and jitter, today's networks often fail to provide the necessary "quality of service." Congestion and
network failures could cause severe degradation in the quality perceived by the final users. In particular,
the delivery of video content over an error-prone network poses some important challenges due to the predictive
structure of the compressed signal. In this case, an error affecting one image is usually propagated over
several subsequent pictures due to temporal prediction. The activity of this group is focused on the development
of novel techniques for the transmission of compressed video to a large number of users either over the Internet
or a wireless local area network.
Group communication from one source to many destinations is often involved in the transmission of video
over the Internet. In this scenario, the classical client-server architecture fails to scale with the number of
clients attached to the system, mostly because of the fixed amount of outgoing bandwidth supported by the server.
In this case, a peer-to-peer (P2P) network can be exploited to increase the performance. In fact, P2P networks
provide a "zero-cost" and extremely flexible infrastructure for the delivery of multimedia content, since the
amount of bandwidth available to each client allows the relay of the data to other peers in a distributed fashion.
Obviously, the dynamics of P2P networks, the frequency of joins and leaves, and the heterogeneity of connection
technologies still require a lot of research to develop stable solutions. Improvements may be introduced in
the communication protocol, designing algorithms that ensure the connectivity without an excessive increase
in terms of control overhead.
Another interesting scenario is the transmission over a wireless network. In this case, the end-to-end delay
can be low enough to allow network feedback to be exploited. Information from lower levels in the network protocol
stack can be used to dynamically adapt working parameters for both the source and the network coder. For instance,
a network-aware rate control can be used to react to network congestion. These techniques are referred to as
"Cross-Layer" design, since they exploit the exchange of information among different layers in the network
protocol stack. Finally, some novel solutions can also be introduced in the source coder to provide more "robust"
representations of the signal. Techniques such as Forward Error Correction or Multiple Description Coding
increase the resilience of the signal, providing a graceful degradation of the perceived video quality with
worsening channel conditions.
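The idea behind Forward Error Correction can be shown with its simplest instance: one parity packet, formed as the bytewise XOR of a group of data packets, lets the receiver rebuild any single lost packet without retransmission. This is a sketch only; practical video streaming systems typically use stronger codes, and the function names and the equal-length-packet assumption are choices made for this example.

```python
from functools import reduce

def xor_bytes(a, b):
    """Bytewise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))

def add_parity(packets):
    """Append one parity packet (XOR of all data packets; equal lengths assumed)."""
    return list(packets) + [reduce(xor_bytes, packets)]

def recover(received):
    """Restore at most one lost packet (marked None) and return the data packets."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("a single parity packet can repair only one loss")
    received = list(received)
    if missing:
        present = [p for p in received if p is not None]
        # XOR of everything that arrived equals the one packet that did not
        received[missing[0]] = reduce(xor_bytes, present)
    return received[:-1]  # drop the parity packet

data = [b"abcd", b"efgh", b"ijkl"]
sent = add_parity(data)
sent[1] = None            # simulate the loss of the second packet
restored = recover(sent)
```

The cost of this protection is one extra packet per group, which is the bandwidth versus resilience trade-off the paragraph above refers to.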
"Learning-Based Modeling of Objects"
General Perspective
As technology in Computer Graphics becomes more and more powerful, a tremendous increase in the complexity of
rendered scenes and a high demand for realism have posed new challenges for modeling and rendering. It has become
essential to replace as much as possible of artists' and designers' manual work by automated algorithms, allowing
them to create scenes and objects on a higher, more abstract level.
The overall approach of the research group is to use learning-based methods in Computer Graphics in order to capture
the typical properties of classes of objects, such as human faces. This involves three main steps that are addressed
by our projects: (1) data collection, (2) statistical data analysis, and (3) methods for application of the
class-specific information in Computer Graphics and Vision. The close relationship between Graphics and Vision
is reflected in our previous work: We combine methods from both fields, and our results can be used for face
modeling [3] and animation [1], but also for face recognition [4]. Building on the technology that we developed
in previous years, we plan to exploit new sources of data, such as time-sequences of 3D scans, and explore new
techniques for data analysis.
Our long-term vision for Computer Graphics is a technology that captures existing objects, scenes and events automatically,
and converts them into a mathematical representation that allows users to manipulate and interact with the scene on a
high level of abstraction. To achieve this, the measured data have to be converted into a representation that reflects
the mental representation in high-level stages of the human visual system. The user interface has to provide some of the
cognitive concepts that are meaningful to users, such as object identity, material properties, scene parameters and motion
patterns.
We have addressed this problem in previous work by separating the identity of a person from the scene parameters of an
image [4], and by manipulating meaningful attributes of faces such as gender or body weight, while keeping the persons'
identities unchanged [3]. On the way to implementing the long-term vision, a variety of interesting problems for
Computer Vision and Machine Learning can be defined, ranging from low-level preprocessing to object recognition.
Our approach to Computer Graphics is example based, unlike the state-of-the-art methods of manual design of objects,
material properties and motions applied in the production of movies, and unlike physical simulations presented in research.
Physical simulations of phenomena, such as mechanical deformations of faces during speech or the interaction of light
with matter, involve assumptions about the internal structure and the physical properties of objects. Simulations of
reasonably complex phenomena require a large number of parameters that are difficult to measure. Any simplifications
on this level are likely to produce unrealistic results.
Therefore, even though physical simulations are based on general physical laws, they cannot completely avoid empirical
measurements. In contrast, our inductive approach is entirely data-driven and fully automated. Measuring and modeling
only quantities that are directly perceivable, such as the deformation of faces or the radiance of reflected light,
our method directly maximizes the realism in reproducing the measurements and generalizing to new viewing conditions.
The statistical methods represent inherent physical laws and common properties of objects in an implicit way. For
example, the symmetry of human faces is captured by a high correlation between structures on the left and right
side of the faces in the database.
Statistical learning has become a very active field of research in the last decade, introducing important methods
such as Neural Networks and Support Vector Machines that learn general properties of data from examples by induction.
The general properties learned from data can, for example, be an estimate of a functional relationship (regression),
or the probability density of examples in an appropriate representation (parameter estimation).
We have addressed the latter problem in previous work in terms of a Linear Object Class [3]: Elements of a class of objects,
such as faces, cars or teeth, are converted into a vector space representation, and their probability density in this space
is estimated with a Principal Component Analysis. Within the linear span of examples and the region with a high estimated
probability, all vectors describe admissible elements of the object class. We have used regression to learn the difference
between male and female faces, and other attributes of faces from examples.
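The vector space idea described above can be sketched in a few lines. Each example is a row vector, a Principal Component Analysis gives a mean, principal directions and per-direction standard deviations, and new elements are formed inside the linear span of the examples, with a Gaussian density indicating how plausible they are. This is a generic PCA sketch, not the morphable model pipeline of [3]; the random stand-in data, the function names and the Gaussian density assumption are made up for this illustration.

```python
import numpy as np

def fit_linear_object_class(examples):
    """PCA of example vectors: returns mean, principal directions (rows), std devs."""
    X = np.asarray(examples, dtype=float)          # shape (m examples, d dimensions)
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    std = s / np.sqrt(len(X) - 1)                  # std dev along each direction
    return mean, Vt, std

def synthesize(mean, components, std, coeffs):
    """New class element from coefficients given in units of standard deviations."""
    c = np.asarray(coeffs, dtype=float)
    k = len(c)
    return mean + (c * std[:k]) @ components[:k]

def log_density(mean, components, std, x, k):
    """Gaussian log-density of x, up to an additive constant, in the first k directions."""
    c = components[:k] @ (np.asarray(x, dtype=float) - mean) / std[:k]
    return -0.5 * float(c @ c)

# Stand-in data: 50 random 6-dimensional "objects" with decreasing variance per axis
rng = np.random.default_rng(0)
examples = rng.normal(size=(50, 6)) * np.array([3.0, 2.0, 1.0, 0.5, 0.2, 0.1])
mean, components, std = fit_linear_object_class(examples)
new_object = synthesize(mean, components, std, [1.0, -0.5])
```

Coefficients close to zero give elements near the class average with high estimated probability, while large coefficients leave the region in which the examples support the model.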
Finally, we have used the concept of Bayesian estimators for image analysis in model fitting and face recognition.
The powerful techniques of statistical learning provide a promising basis for future research on object representation,
image analysis and synthesis.
References
[1] V. Blanz, C. Basso, T. Poggio, and T. Vetter. Reanimating faces in images and video. In P. Brunet and D. Fellner,
editors, Computer Graphics Forum, Vol. 22, No. 3 EUROGRAPHICS 2003, pages 641-650, Granada, Spain, 2003.
[2] V. Blanz, B. Schölkopf, H. Bülthoff, C. Burges, V. Vapnik, and T. Vetter. Comparison of view-based object recognition
algorithms using realistic 3D models. In C. von der Malsburg, W. von Seelen, J.C. Vorbrüggen, and B. Sendhoff, editors,
Artificial Neural Networks - ICANN96, pages 251-256, Springer, Lecture Notes in Computer Science 1112, 1996.
[3] V. Blanz and T. Vetter. A morphable model for the synthesis of 3D faces. In Computer Graphics Proc. SIGGRAPH'99,
pages 187-194, Los Angeles, 1999.
[4] V. Blanz and T. Vetter. Face recognition based on fitting a 3D morphable model. IEEE Trans. on Pattern Analysis
and Machine Intell., 25(9):1063-1074, 2003.
"Visual Sensor Networks"
Research Mission
Visual information plays a prominent role in our daily lives. This is not surprising as the human being is ocular-centric. We aim
to use our visual sense most efficiently and we make use of visual information to communicate our messages. Images and video have
changed the way we see the world. An image is capable of capturing the impression of a moment. Video adds another dimension
capturing the constant change of the expression of our world. But still, images and video let us perceive the world as
"one-dimensional". Being capable of binocular vision, the human being benefits from more than one view of the world. Multi-view
imagery adds another dimension capable of capturing the constant change from various perspectives.
The research on Visual Sensor Networks investigates distributed visual communication with emphasis on both source coding and
transmission over networks. In particular, this research project considers visual communication of natural dynamic 3D scenes.
Spatially distributed video sensors capture a dynamic 3D scene from multiple view-points. The video sensors encode their signal
and transmit their data via the network to the central decoder which shall be able to reconstruct the dynamic 3D scene. The
sensor network shall exploit the correlation among the many observations of the scene. Also, communication among the visual
sensors shall enhance the efficiency of the sensor network. This project will address interesting problems like how to sample
the dynamic 3D scene efficiently as well as what messages have to be exchanged among the video sensors to maximize their efficiency.
Apart from communication tasks, Visual Sensor Networks may be helpful for other applications: As the views of the cameras overlap,
multi-view image sequence data may be used to track objects in 3D space or to estimate the motion field of voxels. Centralized
algorithms for these problems are known, but due to the large data volume generated by dense camera arrays, such algorithms
may not be feasible. To conclude, the signal that is desired at the fusion center of the Visual Sensor Network will also shape
its design. Efficient reconstruction for driving a holographic display with all camera signals will impose different constraints
than rendering a single novel view.
General Perspective
Learning of Geometry: Given samples obtained from a shape we want to learn some of its geometric
and topological characteristics. A popular example that fits in this framework is surface reconstruction: to obtain
a digital model of some solid one samples its surface, e.g., by using a laser range scanner, and applies a
reconstruction algorithm to the sample that outputs a continuous model of the surface. In general, shape sampling
is a very powerful paradigm that allows not only computing digital models of shapes but also simplifying these
models, computing fingerprints for efficient retrieval in shape databases, and segmenting and partially matching shapes.
The latter application is particularly interesting in biological applications where a macromolecule is modeled as
a union of balls that represent the molecule's atoms.
We are interested in data structures and algorithms for sample-based geometry processing that, provided some sampling
condition holds, give results with provable guarantees. A major challenge is posed by kinetic data sets, i.e.,
samples of shapes that move or deform over time. Motion is fundamental in many disciplines that model the physical
world, such as robotics, computer graphics, or computational biology.
Geometry of Learning: Some powerful machine learning techniques like support vector machines and
spectral clustering are essentially geometric techniques. Support vector machines are mainly used in a supervised
setting, i.e., the data come with labels and a classifier is computed from the labeled data. The classifier should
generalize to data whose labels are not known in advance. Clustering is an example of unsupervised learning. Here the
goal is to attach labels to the given data, i.e., the labels are not part of the input as is the case in the
supervised setting. Instead, pairwise similarity/dissimilarity values are given. The goal is to assign labels such
that data with the same label are similar and data with different labels are dissimilar.
Geometry enters in support vector machines via the so-called kernel trick that embeds the original data into some
Hilbert space. In Hilbert space the data can be processed by exploiting its geometry, e.g., by the fact that we
can measure distances and angles.
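A short sketch shows why distances and angles are available without ever constructing the embedding: both can be written purely in terms of kernel evaluations, since the kernel plays the role of the inner product in the Hilbert space. The two kernels and the parameter `gamma` below are standard textbook choices picked for illustration, and the function names are assumptions of this example.

```python
import math

def linear_kernel(x, y):
    """Ordinary dot product; the embedding is the identity."""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; the embedding is implicit and infinite-dimensional."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

def feature_distance(k, x, y):
    """Distance between the images of x and y, using kernel values only."""
    sq = k(x, x) - 2.0 * k(x, y) + k(y, y)
    return math.sqrt(max(sq, 0.0))   # clamp tiny negative rounding errors

def feature_cosine(k, x, y):
    """Cosine of the angle between the images of x and y."""
    return k(x, y) / math.sqrt(k(x, x) * k(y, y))

d_linear = feature_distance(linear_kernel, (0.0, 0.0), (3.0, 4.0))
cos_linear = feature_cosine(linear_kernel, (1.0, 0.0), (0.0, 1.0))
d_rbf = feature_distance(rbf_kernel, (0.0, 0.0), (3.0, 4.0))
```

With the linear kernel the formulas reduce to ordinary Euclidean geometry, which serves as a sanity check; with the RBF kernel every point has unit norm in the Hilbert space, so all distances are bounded by the square root of 2.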
In spectral clustering the original data are embedded into some Euclidean space. The embedding is derived from the
spectral properties of the pairwise similarity matrix, i.e., its eigenvalues and eigenspaces. Right now the embedding
is not really well understood but in many applications it allows standard geometric clustering algorithms like
the k-means algorithm to perform well on the embedded data.
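The pipeline described above, embedding via the spectrum of the similarity matrix followed by k-means, can be sketched as follows. This is one common normalized variant, not necessarily the one the group studies; the Gaussian similarity, the deterministic farthest-point initialization and the function names are assumptions made to keep the example small and reproducible.

```python
import numpy as np

def spectral_embedding(S, k):
    """Embed n items into R^k using the top k eigenvectors of D^(-1/2) S D^(-1/2)."""
    S = np.asarray(S, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    M = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)            # eigenvalues in ascending order
    top = vecs[:, -k:]                     # eigenvectors of the k largest eigenvalues
    return top / np.linalg.norm(top, axis=1, keepdims=True)

def k_means(points, k, iters=50):
    """Plain k-means with a deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(k - 1):
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        sq = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

# Six items in two well-separated groups; similarity is Gaussian in their distance
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
S = np.exp(-(x[:, None] - x[None, :]) ** 2)
labels = k_means(spectral_embedding(S, k=2), k=2)
```

In the embedded space the items of each group collapse to nearly the same point, which is why a simple geometric method such as k-means separates them easily.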
We hope to get a better understanding of the success but also of the limitations of these techniques by looking
at the geometric and topological aspects of the corresponding embeddings.
"3D Animation Processing"
General Perspective
A 3D animation represents the geometry of a moving or deformable 3D object such as an agile animal or a liquid in motion.
Previously 3D animations have mostly been synthesized automatically through physically based simulations and procedural
modeling or handmade by a skilled designer time-step by time-step.
In recent years significant advances have been made in all kinds of 3D acquisition technology: computer tomography, magnetic
resonance imaging, ultrasonic imaging and structured light or multi-stereo based 3D video scanning. This will create demand for a
wide range of different processing tools for acquired 3D animations in the near future. The work of research group 3 is devoted
to the generalization of processing techniques for static geometry to the dynamic case. Theoretical and practical issues will
be investigated and the basic principles for the efficient representation and processing of 3D animations will be deduced.
An underlying technical problem of 3D animation processing is the tremendous size of the acquired or calculated raw data.
On the other hand, the inertia inherent to our physical world forces a large amount of redundancy into dynamic geometry. The
most elementary processing task will therefore be the elimination of this redundancy from the representation. In the first stage of
this project, compact representations of the raw animation data will be developed that allow the fast data access necessary
for further processing.
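To illustrate how inertia translates into redundancy, here is a minimal sketch of predictive coding for vertex trajectories: each frame is predicted by constant-velocity extrapolation from the two previously decoded frames, and only the quantized prediction error is stored. This is a generic textbook scheme chosen for illustration, not the representation developed in the project; the function names and the quantization `step` are assumptions.

```python
def predict(decoded, n):
    """Constant-velocity prediction of the next frame from already decoded frames."""
    if not decoded:
        return [0.0] * n
    if len(decoded) == 1:
        return list(decoded[-1])
    return [2.0 * a - b for a, b in zip(decoded[-1], decoded[-2])]

def encode(frames, step=0.001):
    """Turn frames (flat lists of coordinates) into integer residuals.

    Prediction uses the decoder's reconstruction, so quantization errors
    do not accumulate over time.
    """
    decoded, residuals = [], []
    for frame in frames:
        pred = predict(decoded, len(frame))
        q = [round((x - p) / step) for x, p in zip(frame, pred)]
        residuals.append(q)
        decoded.append([p + r * step for p, r in zip(pred, q)])
    return residuals

def decode(residuals, step=0.001):
    """Reconstruct the frames from the integer residuals."""
    decoded = []
    for q in residuals:
        pred = predict(decoded, len(q))
        decoded.append([p + r * step for p, r in zip(pred, q)])
    return decoded
```

For smoothly moving geometry most residuals are zero or close to zero, so a subsequent entropy coder (not shown) can store them in very few bits while the reconstruction error stays within half a quantization step.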
Two further fundamental issues play an important role in 3D animation processing. The first is the possibility of changes in
topology, which arise whenever a deformable model splits apart or grows into itself. The geometrical description of the object
becomes singular at these critical points in time; a surface representation, for example, becomes non-manifold. This calls for
more general data structures with flexible incremental update operations. The second issue concerns changes in the
parameterization. The efficiency of most processing algorithms builds on well-behaved parameterizations, i.e. mappings to 2D with
low distortion or surface tessellations with nicely shaped elements. The surface of a deformable model can drastically shrink
or expand over time, so a well-behaved parameterization must itself be dynamic. In the second stage of the project, efficient
data structures for animated 3D geometry will be designed that allow for changes in topology and parameterization.
Equipped with efficient representations, the third stage of the project will be devoted to generalizing the most important
processing tasks to the dynamic case: reconstruction of dynamic surfaces from animated point clouds, segmentation of 3D image
time series, simplification and multi-resolution modeling in the space-time domain, and further processing tasks such as
animation repair, animation smoothing and animation editing.
|
"Graphics - Optics - Vision" (Affiliated Independent Research Group)
|
Research Mission
My research interests focus on developing methods for realistic image generation. From its onset, the field of computer graphics
rendering has experienced a continuous increase not only in scientific recognition but also in economic relevance. Computer
graphics-based special effects and entirely computer-animated feature films have become established business segments of
the movie industry, and computer games today constitute a multi-billion-dollar business. In recent years, however, progress
in rendering algorithms and graphics hardware has led to the insight that, despite faster and more complex rendering
calculations, the modeling techniques traditionally employed in computer graphics often fundamentally limit attainable
realism. I therefore concentrate on investigating suitable algorithms to import realism recorded from the natural world into computer
graphics. Such image/video-based modeling and rendering techniques employ conventionally taken photographs or video recordings
of the real world to achieve photo-realistic rendering results.
The goal of my work is to combine the versatility and freedom of computer-based image synthesis methods with the natural
realism and ease of acquisition of a camcorder recording, making use of today's PC graphics card capabilities and consumer-market
imaging technology.
During my postdoctoral time as a Research Associate at Stanford University, I participated in the Stanford Immersive Television
Project and worked on the acquisition, analysis and rendering of dynamic light fields. A real-time rendering algorithm exploiting
conventional PC graphics hardware capabilities enabled viewing the dynamic scene interactively. For optimal rendering results,
a novel algorithm was developed to robustly estimate dense depth maps from the MPEG-compressed video streams.
At the Max Planck Institute for Computer Science, I currently focus on developing efficient analysis and rendering methods
for sparse recording setups with only a handful of cameras. In our studio, we can record a stage area with eight synchronized
video cameras distributed all around the stage. In my group, we have developed methods for online visual hull reconstruction
and enhanced quality real-time rendering. Recently, we have succeeded in using a generic geometry model to automatically
analyze the complex movements of a human dancer, enabling photo-realistic, interactive rendering of the dancer from any
viewpoint.
|
"Multimedia Retrieval"
|
Summary
Modern information society is experiencing an explosion of
digital content, comprising text, audio, video and graphics.
The challenge is to organize, understand, and search
multimodal information in a robust, efficient and intelligent manner.
One challenge arises from the fact that multimedia objects, even though
they are similar from a structural or semantic viewpoint, often reveal
significant spatial or temporal differences. This makes content-based
multimedia retrieval a challenging research field with many unsolved
problems.
In my habilitation project conducted at Bonn University,
we studied fundamental algorithms and concepts for the
analysis, classification, indexing, and retrieval of time-dependent
data streams by means of two different types
of multimedia data: waveform-based music data and human motion data.
In the music domain, we developed techniques for automatic music
alignment, synchronization, and matching. The common
goal of these tasks is to automatically link several
types of music representations, thus coordinating the multiple
information sources related to a given musical work.
In the motion domain, we introduced a general and unified
framework for motion analysis, retrieval, and classification
using binary features to represent poses.
By handling spatio-temporal motion deformations already at
the feature level, we were able to adopt efficient indexing
methods that allow flexible and efficient content-based
retrieval in large motion capture data sets.
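A minimal sketch of how binary pose features can absorb temporal deformations: each pose is mapped to a tuple of booleans, and runs of identical tuples are collapsed, so a slow and a fast performance of the same movement yield the same feature sequence. The two features, the joint names, and the pose format (a dictionary of 3D joint positions with y pointing up and z pointing forward) are hypothetical choices made for illustration only.

```python
def pose_features(pose):
    """Map a pose (joint name -> (x, y, z)) to a tuple of binary features."""
    return (
        pose["left_foot"][2] > pose["right_foot"][2],  # left foot in front of right foot
        pose["right_hand"][1] > pose["head"][1],       # right hand raised above the head
    )

def feature_sequence(motion):
    """Feature tuples of a motion with consecutive duplicates collapsed.

    Collapsing runs removes the dependence on how long each qualitative
    state lasts, i.e. on the speed of the performance.
    """
    seq = []
    for pose in motion:
        f = pose_features(pose)
        if not seq or seq[-1] != f:
            seq.append(f)
    return seq

def contains(document_seq, query_seq):
    """True if the query feature sequence occurs contiguously in the document."""
    n, m = len(document_seq), len(query_seq)
    return any(document_seq[i:i + m] == query_seq for i in range(n - m + 1))
```

Because the collapsed sequences consist of discrete symbols, they can be stored in standard index structures such as inverted lists; `contains` only shows the matching criterion, not an efficient index.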
|
"Marker-less Motion Capture"
|
Summary
Motion capturing (MoCap) comprises techniques for recording and analyzing human movements in image sequences.
In biomechanical settings, it aims at analyzing captured data to quantify the movement of body segments,
e.g. for clinical studies, or to help athletes understand and improve their performance. It has also grown
increasingly important as a source of motion data for computer animation. Well-known and commercially available
marker-based tracking systems exist, e.g. those provided by Motion Analysis, Vicon or Simi. The use of markers
comes with intrinsic problems, e.g. incorrect tracking of markers, tracking failures, the need for special
laboratory environments and lighting conditions, and the fact that people do not feel comfortable with markers
attached to the body. This often leads to unnatural motion patterns. Moreover, marker-based systems are designed to
track the motion of the markers themselves, and thus it must be assumed that the recorded motion of the markers
is identical to the motion of the underlying human segments. Since human segments are not truly rigid, this
assumption may cause problems, especially in highly dynamic movements typically seen in sporting activities.
For these reasons, marker-less tracking is an important field of research that requires knowledge in biomechanics,
computer vision and computer graphics.
The research group deals with different aspects of marker-less motion capture, e.g. pose estimation,
image segmentation, surface modeling, surface morphing and texture-driven tracking. The foundations and experience
from my PhD thesis and postdoc in New Zealand are crucial for our ongoing research.
Current publications and the ongoing research are updated regularly.
|
"Statistical Geometry Processing"
formerly: "High Level Editing and Semi-Automatic Modeling Tools for Highly Complex Scenes"
|
Research Goals and Directions
The major goal of computer graphics is the creation of photo-realistic synthetic models of reality. Synthetic depictions of
seemingly real entities are useful in many applications, ranging from electronic prototypes of engineering
parts to surreal virtual sets in fantasy movies. Today, computer graphics has already reached an impressive state of the art:
For example, it is commonplace that substantial parts of major feature films consist of synthetic renderings.
However, the major bottleneck in today's content production pipeline is still the effort of the human designer: In order to
create believable, photo-realistic models, three-dimensional data sets of enormous complexity have to be built, which requires
much manual labor by highly skilled experts. Thus, modeling cost (rather than the cost of computational resources) is the main
barrier that often still prevents the application of computer graphics techniques.
The objective of this research group is the development of high-level modeling techniques that permit creation and editing of
complex three-dimensional models at a high level of abstraction. The goal is to provide tools to the modeler that operate closer
to the semantic domain than traditional low-level modeling techniques (which rather operate in the geometric domain). To be able
to support a more abstract approach to modeling, the software must to some limited extent "understand" the structure of the models.
The idea is to employ techniques from statistical data analysis and machine learning to "learn" the structure of aspects of example
models in order to instantiate them again later during editing.
This process is performed in three steps: First, a formal statistical model of the aspect to be analyzed has to be set up. This
could describe, e.g., the correlation of local geometric features or a low-dimensional parameterization of the space of
overall shape. Next, example data (e.g. previous models that do not fully meet the demands of the modeler, or a data set from a
3D scanner) is analyzed. The analysis yields an estimate of the probability distribution of the statistical model parameters.
This knowledge then facilitates the instantiation of new models (or parts of models) with a likely (i.e. believable) structure.
By applying this analysis to different aspects of the model (ranging from local geometric texture to overall shape), different tools
can be devised that support editing of various aspects of the model.
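The three steps can be illustrated with a deliberately small sketch. It assumes that each example model is summarized by a short parameter vector and that the parameters follow independent Gaussian distributions; both are simplifying assumptions made here for illustration (in particular, correlations between parameters are ignored), and the function names are hypothetical.

```python
import random
import statistics

def fit_model(examples):
    """Steps 1 and 2: estimate a (mean, standard deviation) pair per parameter.

    `examples` is a list of equally long parameter vectors, one per example model.
    """
    columns = list(zip(*examples))
    return [(statistics.mean(c), statistics.stdev(c)) for c in columns]

def instantiate(model, rng):
    """Step 3: draw a new, statistically plausible parameter vector."""
    return [rng.gauss(mu, sigma) for mu, sigma in model]

def plausibility(model, params):
    """Sum of squared deviations in units of standard deviation (lower is more plausible)."""
    return sum(((p - mu) / sigma) ** 2 for p, (mu, sigma) in zip(params, model))
```

An editing tool could call `instantiate` to propose new variants and use `plausibility` to flag user edits that stray far from the example data.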
The primary goal of the described research is to create more productive modeling tools for editing scenes with a photo-realistic
level of detail. A good high-level modeling tool must provide an algorithmic formalism for "understanding" model aspects.
Therefore, as a side effect, the resulting statistical models could also offer some insight into the structure of
real-world artifacts and their human perception.