On 13 January 2026, Mallikarjun BR successfully defended his thesis with the title “Monocular face reconstruction and editing using priors learned from 2D data”. He was a doctoral candidate at the MPI for Informatics and Saarland University. The thesis was supervised by Prof. Dr. Christian Theobalt, Scientific Director of the Department of Visual Computing and Artificial Intelligence. The doctoral degree was awarded by Saarland University.
Abstract of the thesis:
Digital facial models equipped with semantic editing capabilities play a pivotal role across domains such as film, gaming, telepresence, and social media. Conventionally, digital modeling has involved representing both geometric and appearance properties, with the ability to semantically edit expressions and appearance in response to changes in scene illumination and alterations of facial parts. Achieving this level of fidelity has traditionally required costly setups such as multi-view and light-stage rigs, limiting accessibility due to physical and financial constraints.
Consequently, methods that require just a single monocular image offer substantial practical advantages, albeit facing the challenge of being under-constrained. To address this challenge, methods often rely on prior models, such as 3D Morphable Models (3DMMs), constructed from collections of 3D scans. However, acquiring large-scale 3D scans poses its own challenges, which limits the quality of the resulting prior model.
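As background, a 3DMM typically expresses a face shape as a mean shape plus a linear combination of basis shapes learned from scans; the following is a minimal sketch of this standard formulation, with generic symbols that are illustrative and not taken from the thesis:

% Standard linear 3DMM formulation (background illustration only):
% a shape S(\alpha) is the mean shape \bar{S} plus a weighted sum of basis shapes s_i,
% where the coefficients \alpha_i are fitted to an input image during reconstruction.
S(\alpha) = \bar{S} + \sum_{i=1}^{n} \alpha_i \, s_i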
In this thesis, a novel approach is proposed to learn a 3DMM directly from extensive unstructured video and image datasets. Furthermore, existing methods typically approximate skin as a diffuse surface and thus fail to capture photo-realistic appearance, particularly under complex illumination involving diffuse and specular reflection, subsurface scattering, self-shadows, and inter-reflections. To address this limitation, a new neural representation is proposed to estimate these intricate illumination effects.

Additionally, when modeling facial appearance, it is crucial to account for non-facial regions such as hair and the neck. This thesis introduces a method that leverages a pre-trained 2D Generative Adversarial Network (GAN) to synthesize novel views and illumination, ensuring these regions are modeled comprehensively. Facial structures also encompass various semantic parts such as hair, eyes, and eyebrows. Existing methods often overlook certain parts or use a unified representation, hindering part-specific editing tasks. To overcome this, a compositional generative model is proposed that treats each part as a distinct entity.

Efficient and photorealistic models are essential for widespread adoption. Thus, this thesis proposes an efficient 3D generative model capable of real-time sampling and rendering. Moreover, this model provides dense 3D correspondences between samples, enhancing its utility for downstream applications. Lastly, the thesis provides an outlook on future research directions for each of the sub-problems addressed herein.
