D6
Visual Computing and Artificial Intelligence


The Visual Computing and Artificial Intelligence department investigates challenging research questions at the intersection of Computer Graphics, Computer Vision and Machine Learning. Before becoming the director of this department, Christian Theobalt headed the Graphics, Vision & Video research group. Please follow the link here to the research areas of the former group, which will be continued and expanded in the new department. More information will follow soon.

Publications

2021
Ali, S.A., Kahraman, K., Theobalt, C., Stricker, D., and Golyanik, V. 2021. Fast Gravitational Approach for Rigid Point Set Registration with Ordinary Differential Equations. IEEE Access 9.
Export
BibTeX
@article{Ali2021, TITLE = {Fast Gravitational Approach for Rigid Point Set Registration with Ordinary Differential Equations}, AUTHOR = {Ali, Sk Aziz and Kahraman, Kerem and Theobalt, Christian and Stricker, Didier and Golyanik, Vladislav}, LANGUAGE = {eng}, ISSN = {2169-3536}, DOI = {10.1109/ACCESS.2021.3084505}, PUBLISHER = {IEEE}, ADDRESS = {Piscataway, NJ}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, JOURNAL = {IEEE Access}, VOLUME = {9}, PAGES = {79060--79079}, }
Endnote
%0 Journal Article %A Ali, Sk Aziz %A Kahraman, Kerem %A Theobalt, Christian %A Stricker, Didier %A Golyanik, Vladislav %+ External Organizations External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society %T Fast Gravitational Approach for Rigid Point Set Registration with Ordinary Differential Equations : %G eng %U http://hdl.handle.net/21.11116/0000-0008-F07E-C %R 10.1109/ACCESS.2021.3084505 %7 2021 %D 2021 %J IEEE Access %V 9 %& 79060 %P 79060 - 79079 %I IEEE %C Piscataway, NJ %@ false
Birdal, T., Golyanik, V., Theobalt, C., and Guibas, L. 2021. Quantum Permutation Synchronization. https://arxiv.org/abs/2101.07755.
(arXiv: 2101.07755)
Abstract
We present QuantumSync, the first quantum algorithm for solving a synchronization problem in the context of computer vision. In particular, we focus on permutation synchronization which involves solving a non-convex optimization problem in discrete variables. We start by formulating synchronization into a quadratic unconstrained binary optimization problem (QUBO). While such formulation respects the binary nature of the problem, ensuring that the result is a set of permutations requires extra care. Hence, we: (i) show how to insert permutation constraints into a QUBO problem and (ii) solve the constrained QUBO problem on the current generation of the adiabatic quantum computers D-Wave. Thanks to the quantum annealing, we guarantee global optimality with high probability while sampling the energy landscape to yield confidence estimates. Our proof-of-concepts realization on the adiabatic D-Wave computer demonstrates that quantum machines offer a promising way to solve the prevalent yet difficult synchronization problems.
Export
BibTeX
@online{Birdal_2101.07755, TITLE = {Quantum Permutation Synchronization}, AUTHOR = {Birdal, Tolga and Golyanik, Vladislav and Theobalt, Christian and Guibas, Leonidas}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2101.07755}, EPRINT = {2101.07755}, EPRINTTYPE = {arXiv}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, ABSTRACT = {We present QuantumSync, the first quantum algorithm for solving a synchronization problem in the context of computer vision. In particular, we focus on permutation synchronization which involves solving a non-convex optimization problem in discrete variables. We start by formulating synchronization into a quadratic unconstrained binary optimization problem (QUBO). While such formulation respects the binary nature of the problem, ensuring that the result is a set of permutations requires extra care. Hence, we: (i) show how to insert permutation constraints into a QUBO problem and (ii) solve the constrained QUBO problem on the current generation of the adiabatic quantum computers D-Wave. Thanks to the quantum annealing, we guarantee global optimality with high probability while sampling the energy landscape to yield confidence estimates. Our proof-of-concepts realization on the adiabatic D-Wave computer demonstrates that quantum machines offer a promising way to solve the prevalent yet difficult synchronization problems.}, }
Endnote
%0 Report %A Birdal, Tolga %A Golyanik, Vladislav %A Theobalt, Christian %A Guibas, Leonidas %+ External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations %T Quantum Permutation Synchronization : %G eng %U http://hdl.handle.net/21.11116/0000-0007-E895-B %U https://arxiv.org/abs/2101.07755 %D 2021 %X We present QuantumSync, the first quantum algorithm for solving a synchronization problem in the context of computer vision. In particular, we focus on permutation synchronization which involves solving a non-convex optimization problem in discrete variables. We start by formulating synchronization into a quadratic unconstrained binary optimization problem (QUBO). While such formulation respects the binary nature of the problem, ensuring that the result is a set of permutations requires extra care. Hence, we: (i) show how to insert permutation constraints into a QUBO problem and (ii) solve the constrained QUBO problem on the current generation of the adiabatic quantum computers D-Wave. Thanks to the quantum annealing, we guarantee global optimality with high probability while sampling the energy landscape to yield confidence estimates. Our proof-of-concepts realization on the adiabatic D-Wave computer demonstrates that quantum machines offer a promising way to solve the prevalent yet difficult synchronization problems. %K Quantum Physics, quant-ph,Computer Science, Computer Vision and Pattern Recognition, cs.CV,cs.ET,Computer Science, Learning, cs.LG,Computer Science, Robotics, cs.RO
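As an illustrative aside to the QuantumSync abstract above, the following sketch shows how permutation synchronization can be written as a quadratic objective over binary variables with explicit one-hot (row/column) permutation penalties. It is a toy, purely classical stand-in: a brute-force search replaces the D-Wave annealer, the instance size, gauge fixing (P_0 = I) and penalty weight are arbitrary choices, and it is not the authors' formulation in detail.

```python
import itertools
import numpy as np

# Toy permutation synchronization: recover absolute permutations P_1, P_2
# (P_0 is fixed to the identity as a gauge) from relative observations
# P_ij = P_i P_j^T for n = 2 points.  The binary variables are the entries
# of P_1 and P_2, so the energy below is quadratic in them, i.e. a QUBO.
n = 2
P0 = np.eye(n)
P1_gt = np.array([[0.0, 1.0], [1.0, 0.0]])   # hypothetical ground truth
P2_gt = np.eye(n)
observations = {(0, 1): P0 @ P1_gt.T,
                (0, 2): P0 @ P2_gt.T,
                (1, 2): P1_gt @ P2_gt.T}

lam = 4.0   # weight of the one-hot permutation penalties

def energy(x):
    """QUBO energy of a binary assignment x holding the entries of P_1, P_2."""
    P = {0: P0, 1: x[:n * n].reshape(n, n), 2: x[n * n:].reshape(n, n)}
    e = 0.0
    for (i, j), Pij in observations.items():
        e -= np.trace(P[i].T @ Pij @ P[j])               # consistency (bilinear)
    for k in (1, 2):                                     # permutation constraints
        e += lam * np.sum((P[k].sum(axis=0) - 1) ** 2)   # each column sums to 1
        e += lam * np.sum((P[k].sum(axis=1) - 1) ** 2)   # each row sums to 1
    return e

# Brute force over all 2^(2*n*n) assignments stands in for the quantum annealer.
best = min((np.array(bits, dtype=float)
            for bits in itertools.product([0, 1], repeat=2 * n * n)), key=energy)
print("recovered P_1:\n", best[:n * n].reshape(n, n))
print("recovered P_2:\n", best[n * n:].reshape(n, n))
```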
Birdal, T., Golyanik, V., Theobalt, C., and Guibas, L. Quantum Permutation Synchronization. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), IEEE.
(Accepted/in press)
Export
BibTeX
@inproceedings{Birdal_CVPR2021b, TITLE = {Quantum Permutation Synchronization}, AUTHOR = {Birdal, Tolga and Golyanik, Vladislav and Theobalt, Christian and Guibas, Leonidas}, LANGUAGE = {eng}, PUBLISHER = {IEEE}, YEAR = {2021}, PUBLREMARK = {Accepted}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021)}, PAGES = {13122--13133}, ADDRESS = {Virtual Conference}, }
Endnote
%0 Conference Proceedings %A Birdal, Tolga %A Golyanik, Vladislav %A Theobalt, Christian %A Guibas, Leonidas %+ External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations %T Quantum Permutation Synchronization : %G eng %U http://hdl.handle.net/21.11116/0000-0008-8933-4 %D 2021 %B 34th IEEE Conference on Computer Vision and Pattern Recognition %Z date of event: 2021-06-19 - 2021-06-25 %C Virtual Conference %B IEEE/CVF Conference on Computer Vision and Pattern Recognition %P 13122 - 13133 %I IEEE %U https://gvv.mpi-inf.mpg.de/projects/QUANTUMSYNC/
Chu, M., Thuerey, N., Seidel, H.-P., Theobalt, C., and Zayer, R. 2021. Learning Meaningful Controls for Fluids. ACM Transactions on Graphics (Proc. ACM SIGGRAPH 2021) 40, 4.
Export
BibTeX
@article{Chu2021, TITLE = {Learning Meaningful Controls for Fluids}, AUTHOR = {Chu, Mengyu and Thuerey, Nils and Seidel, Hans-Peter and Theobalt, Christian and Zayer, Rhaleb}, LANGUAGE = {eng}, ISSN = {0730-0301}, DOI = {10.1145/3450626.3459845}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, JOURNAL = {ACM Transactions on Graphics (Proc. ACM SIGGRAPH)}, VOLUME = {40}, NUMBER = {4}, PAGES = {1--13}, EID = {100}, BOOKTITLE = {Proceedings of ACM SIGGRAPH 2021}, }
Endnote
%0 Journal Article %A Chu, Mengyu %A Thuerey, Nils %A Seidel, Hans-Peter %A Theobalt, Christian %A Zayer, Rhaleb %+ Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations Computer Graphics, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Computer Graphics, MPI for Informatics, Max Planck Society %T Learning Meaningful Controls for Fluids : %G eng %U http://hdl.handle.net/21.11116/0000-0009-4B91-F %R 10.1145/3450626.3459845 %7 2021 %D 2021 %J ACM Transactions on Graphics %V 40 %N 4 %& 1 %P 1 - 13 %Z sequence number: 100 %I ACM %C New York, NY %@ false %B Proceedings of ACM SIGGRAPH 2021 %O ACM SIGGRAPH 2021
Dabral, R., Shimada, S., Jain, A., Theobalt, C., and Golyanik, V. 2021. Gravity-Aware Monocular 3D Human-Object Reconstruction. https://arxiv.org/abs/2108.08844.
(arXiv: 2108.08844)
Abstract
This paper proposes GraviCap, i.e., a new approach for joint markerless 3D human motion capture and object trajectory estimation from monocular RGB videos. We focus on scenes with objects partially observed during a free flight. In contrast to existing monocular methods, we can recover scale, object trajectories as well as human bone lengths in meters and the ground plane's orientation, thanks to the awareness of the gravity constraining object motions. Our objective function is parametrised by the object's initial velocity and position, gravity direction and focal length, and jointly optimised for one or several free flight episodes. The proposed human-object interaction constraints ensure geometric consistency of the 3D reconstructions and improved physical plausibility of human poses compared to the unconstrained case. We evaluate GraviCap on a new dataset with ground-truth annotations for persons and different objects undergoing free flights. In the experiments, our approach achieves state-of-the-art accuracy in 3D human motion capture on various metrics. We urge the reader to watch our supplementary video. Both the source code and the dataset are released; see http://4dqv.mpi-inf.mpg.de/GraviCap/.
Export
BibTeX
@online{Dabral_arXiv2108.08844, TITLE = {Gravity-Aware Monocular {3D} Human-Object Reconstruction}, AUTHOR = {Dabral, Rishabh and Shimada, Soshi and Jain, Arjun and Theobalt, Christian and Golyanik, Vladislav}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2108.08844}, EPRINT = {2108.08844}, EPRINTTYPE = {arXiv}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, ABSTRACT = {This paper proposes GraviCap, i.e., a new approach for joint markerless 3D human motion capture and object trajectory estimation from monocular RGB videos. We focus on scenes with objects partially observed during a free flight. In contrast to existing monocular methods, we can recover scale, object trajectories as well as human bone lengths in meters and the ground plane's orientation, thanks to the awareness of the gravity constraining object motions. Our objective function is parametrised by the object's initial velocity and position, gravity direction and focal length, and jointly optimised for one or several free flight episodes. The proposed human-object interaction constraints ensure geometric consistency of the 3D reconstructions and improved physical plausibility of human poses compared to the unconstrained case. We evaluate GraviCap on a new dataset with ground-truth annotations for persons and different objects undergoing free flights. In the experiments, our approach achieves state-of-the-art accuracy in 3D human motion capture on various metrics. We urge the reader to watch our supplementary video. Both the source code and the dataset are released; see http://4dqv.mpi-inf.mpg.de/GraviCap/.}, }
Endnote
%0 Report %A Dabral, Rishabh %A Shimada, Soshi %A Jain, Arjun %A Theobalt, Christian %A Golyanik, Vladislav %+ Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society %T Gravity-Aware Monocular 3D Human-Object Reconstruction : %G eng %U http://hdl.handle.net/21.11116/0000-0009-4D2C-1 %U https://arxiv.org/abs/2108.08844 %D 2021 %X This paper proposes GraviCap, i.e., a new approach for joint markerless 3D human motion capture and object trajectory estimation from monocular RGB videos. We focus on scenes with objects partially observed during a free flight. In contrast to existing monocular methods, we can recover scale, object trajectories as well as human bone lengths in meters and the ground plane's orientation, thanks to the awareness of the gravity constraining object motions. Our objective function is parametrised by the object's initial velocity and position, gravity direction and focal length, and jointly optimised for one or several free flight episodes. The proposed human-object interaction constraints ensure geometric consistency of the 3D reconstructions and improved physical plausibility of human poses compared to the unconstrained case. We evaluate GraviCap on a new dataset with ground-truth annotations for persons and different objects undergoing free flights. In the experiments, our approach achieves state-of-the-art accuracy in 3D human motion capture on various metrics. We urge the reader to watch our supplementary video. Both the source code and the dataset are released; see http://4dqv.mpi-inf.mpg.de/GraviCap/. %K Computer Science, Computer Vision and Pattern Recognition, cs.CV %U http://4dqv.mpi-inf.mpg.de/GraviCap/
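The gravity prior described in the GraviCap abstract reduces, during free flight, to the ballistic model p(t) = p0 + v0·t + ½·g·t². The sketch below assumes, hypothetically, that noisy 3D object positions and timestamps are already available and recovers the initial position and velocity by linear least squares, since the model is linear in (p0, v0) once gravity is fixed; the paper's actual objective additionally estimates scale, gravity direction and focal length from monocular 2D input.

```python
import numpy as np

g = np.array([0.0, -9.81, 0.0])            # assumed gravity (m/s^2), y is up

def simulate(p0, v0, t):
    """Ballistic free-flight positions p(t) = p0 + v0 t + 0.5 g t^2."""
    return p0 + np.outer(t, v0) + 0.5 * np.outer(t ** 2, g)

def fit_free_flight(t, p_obs):
    """Least-squares fit of initial position/velocity to noisy 3D positions.

    With gravity fixed, the model is linear in (p0, v0), so each coordinate
    is an ordinary linear regression with design matrix [1, t].
    """
    A = np.stack([np.ones_like(t), t], axis=1)          # (T, 2)
    b = p_obs - 0.5 * np.outer(t ** 2, g)               # remove the gravity term
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)      # rows: p0, v0
    return coeffs[0], coeffs[1]

# Hypothetical example: a thrown ball observed for half a second at 50 fps.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 0.5, 25)
p_true = simulate(np.array([0.0, 1.5, 2.0]), np.array([1.0, 3.0, 0.5]), t)
p_obs = p_true + 0.01 * rng.standard_normal(p_true.shape)   # 1 cm noise
p0_est, v0_est = fit_free_flight(t, p_obs)
print("estimated p0:", p0_est, "estimated v0:", v0_est)
```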
Dib, A., Thebault, C., Ahn, J., Gosselin, P.-H., Theobalt, C., and Chevallier, L. 2021. Towards High Fidelity Monocular Face Reconstruction with Rich Reflectance using Self-supervised Learning and Ray Tracing. https://arxiv.org/abs/2103.15432.
(arXiv: 2103.15432)
Abstract
Robust face reconstruction from monocular image in general lighting conditions is challenging. Methods combining deep neural network encoders with differentiable rendering have opened up the path for very fast monocular reconstruction of geometry, lighting and reflectance. They can also be trained in self-supervised manner for increased robustness and better generalization. However, their differentiable rasterization based image formation models, as well as underlying scene parameterization, limit them to Lambertian face reflectance and to poor shape details. More recently, ray tracing was introduced for monocular face reconstruction within a classic optimization-based framework and enables state-of-the art results. However optimization-based approaches are inherently slow and lack robustness. In this paper, we build our work on the aforementioned approaches and propose a new method that greatly improves reconstruction quality and robustness in general scenes. We achieve this by combining a CNN encoder with a differentiable ray tracer, which enables us to base the reconstruction on much more advanced personalized diffuse and specular albedos, a more sophisticated illumination model and a plausible representation of self-shadows. This enables to take a big leap forward in reconstruction quality of shape, appearance and lighting even in scenes with difficult illumination. With consistent face attributes reconstruction, our method leads to practical applications such as relighting and self-shadows removal. Compared to state-of-the-art methods, our results show improved accuracy and validity of the approach.
Export
BibTeX
@online{Dib_arXiv2103.15432, TITLE = {Towards High Fidelity Monocular Face Reconstruction with Rich Reflectance using Self-supervised Learning and Ray Tracing}, AUTHOR = {Dib, Abdallah and Thebault, Cedric and Ahn, Junghyun and Gosselin, Philippe-Henri and Theobalt, Christian and Chevallier, Louis}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2103.15432}, EPRINT = {2103.15432}, EPRINTTYPE = {arXiv}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Robust face reconstruction from monocular image in general lighting conditions is challenging. Methods combining deep neural network encoders with differentiable rendering have opened up the path for very fast monocular reconstruction of geometry, lighting and reflectance. They can also be trained in self-supervised manner for increased robustness and better generalization. However, their differentiable rasterization based image formation models, as well as underlying scene parameterization, limit them to Lambertian face reflectance and to poor shape details. More recently, ray tracing was introduced for monocular face reconstruction within a classic optimization-based framework and enables state-of-the art results. However optimization-based approaches are inherently slow and lack robustness. In this paper, we build our work on the aforementioned approaches and propose a new method that greatly improves reconstruction quality and robustness in general scenes. We achieve this by combining a CNN encoder with a differentiable ray tracer, which enables us to base the reconstruction on much more advanced personalized diffuse and specular albedos, a more sophisticated illumination model and a plausible representation of self-shadows. This enables to take a big leap forward in reconstruction quality of shape, appearance and lighting even in scenes with difficult illumination. With consistent face attributes reconstruction, our method leads to practical applications such as relighting and self-shadows removal. Compared to state-of-the-art methods, our results show improved accuracy and validity of the approach.}, }
Endnote
%0 Report %A Dib, Abdallah %A Thebault, Cedric %A Ahn, Junghyun %A Gosselin, Philippe-Henri %A Theobalt, Christian %A Chevallier, Louis %+ External Organizations External Organizations External Organizations External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations %T Towards High Fidelity Monocular Face Reconstruction with Rich Reflectance using Self-supervised Learning and Ray Tracing : %G eng %U http://hdl.handle.net/21.11116/0000-0009-5339-A %U https://arxiv.org/abs/2103.15432 %D 2021 %X Robust face reconstruction from monocular image in general lighting conditions is challenging. Methods combining deep neural network encoders with differentiable rendering have opened up the path for very fast monocular reconstruction of geometry, lighting and reflectance. They can also be trained in self-supervised manner for increased robustness and better generalization. However, their differentiable rasterization based image formation models, as well as underlying scene parameterization, limit them to Lambertian face reflectance and to poor shape details. More recently, ray tracing was introduced for monocular face reconstruction within a classic optimization-based framework and enables state-of-the art results. However optimization-based approaches are inherently slow and lack robustness. In this paper, we build our work on the aforementioned approaches and propose a new method that greatly improves reconstruction quality and robustness in general scenes. We achieve this by combining a CNN encoder with a differentiable ray tracer, which enables us to base the reconstruction on much more advanced personalized diffuse and specular albedos, a more sophisticated illumination model and a plausible representation of self-shadows. This enables to take a big leap forward in reconstruction quality of shape, appearance and lighting even in scenes with difficult illumination. With consistent face attributes reconstruction, our method leads to practical applications such as relighting and self-shadows removal. Compared to state-of-the-art methods, our results show improved accuracy and validity of the approach. %K Computer Science, Computer Vision and Pattern Recognition, cs.CV,Computer Science, Learning, cs.LG
Fox, G., Liu, W., Kim, H., Seidel, H.-P., Elgharib, M., and Theobalt, C. 2021a. VideoForensicsHQ: Detecting High-quality Manipulated Face Videos. IEEE International Conference on Multimedia and Expo (ICME 2021), IEEE.
Export
BibTeX
@inproceedings{Fox_ICME2021, TITLE = {{Video\-Foren\-sics\-HQ}: {D}etecting High-quality Manipulated Face Videos}, AUTHOR = {Fox, Gereon and Liu, Wentao and Kim, Hyeongwoo and Seidel, Hans-Peter and Elgharib, Mohamed and Theobalt, Christian}, LANGUAGE = {eng}, ISBN = {978-1-6654-3864-3}, DOI = {10.1109/ICME51207.2021.9428101}, PUBLISHER = {IEEE}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {IEEE International Conference on Multimedia and Expo (ICME 2021)}, ADDRESS = {Shenzhen, China (Virtual)}, }
Endnote
%0 Conference Proceedings %A Fox, Gereon %A Liu, Wentao %A Kim, Hyeongwoo %A Seidel, Hans-Peter %A Elgharib, Mohamed %A Theobalt, Christian %+ Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Computer Graphics, MPI for Informatics, Max Planck Society Computer Graphics, MPI for Informatics, Max Planck Society Computer Graphics, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society %T VideoForensicsHQ: Detecting High-quality Manipulated Face Videos : %G eng %U http://hdl.handle.net/21.11116/0000-0008-88DF-4 %R 10.1109/ICME51207.2021.9428101 %D 2021 %B 22nd IEEE International Conference on Multimedia and Expo %Z date of event: 2021-07-05 - 2021-07-07 %C Shenzhen, China (Virtual) %B IEEE International Conference on Multimedia and Expo %I IEEE %@ 978-1-6654-3864-3 %U http://gvv.mpi-inf.mpg.de/projects/VForensicsHQ/
Fox, G., Tewari, A., Elgharib, M., and Theobalt, C. 2021b. StyleVideoGAN: A Temporal Generative Model using a Pretrained StyleGAN. https://arxiv.org/abs/2107.07224.
(arXiv: 2107.07224)
Abstract
Generative adversarial models (GANs) continue to produce advances in terms of the visual quality of still images, as well as the learning of temporal correlations. However, few works manage to combine these two interesting capabilities for the synthesis of video content: Most methods require an extensive training dataset in order to learn temporal correlations, while being rather limited in the resolution and visual quality of their output frames. In this paper, we present a novel approach to the video synthesis problem that helps to greatly improve visual quality and drastically reduce the amount of training data and resources necessary for generating video content. Our formulation separates the spatial domain, in which individual frames are synthesized, from the temporal domain, in which motion is generated. For the spatial domain we make use of a pre-trained StyleGAN network, the latent space of which allows control over the appearance of the objects it was trained for. The expressive power of this model allows us to embed our training videos in the StyleGAN latent space. Our temporal architecture is then trained not on sequences of RGB frames, but on sequences of StyleGAN latent codes. The advantageous properties of the StyleGAN space simplify the discovery of temporal correlations. We demonstrate that it suffices to train our temporal architecture on only 10 minutes of footage of 1 subject for about 6 hours. After training, our model can not only generate new portrait videos for the training subject, but also for any random subject which can be embedded in the StyleGAN space.
Export
BibTeX
@online{Fox_arXiv2107.07224, TITLE = {{StyleVideoGAN}: A Temporal Generative Model using a Pretrained {StyleGAN}}, AUTHOR = {Fox, Gereon and Tewari, Ayush and Elgharib, Mohamed and Theobalt, Christian}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2107.07224}, EPRINT = {2107.07224}, EPRINTTYPE = {arXiv}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Generative adversarial models (GANs) continue to produce advances in terms of the visual quality of still images, as well as the learning of temporal correlations. However, few works manage to combine these two interesting capabilities for the synthesis of video content: Most methods require an extensive training dataset in order to learn temporal correlations, while being rather limited in the resolution and visual quality of their output frames. In this paper, we present a novel approach to the video synthesis problem that helps to greatly improve visual quality and drastically reduce the amount of training data and resources necessary for generating video content. Our formulation separates the spatial domain, in which individual frames are synthesized, from the temporal domain, in which motion is generated. For the spatial domain we make use of a pre-trained StyleGAN network, the latent space of which allows control over the appearance of the objects it was trained for. The expressive power of this model allows us to embed our training videos in the StyleGAN latent space. Our temporal architecture is then trained not on sequences of RGB frames, but on sequences of StyleGAN latent codes. The advantageous properties of the StyleGAN space simplify the discovery of temporal correlations. We demonstrate that it suffices to train our temporal architecture on only 10 minutes of footage of 1 subject for about 6 hours. After training, our model can not only generate new portrait videos for the training subject, but also for any random subject which can be embedded in the StyleGAN space.}, }
Endnote
%0 Report %A Fox, Gereon %A Tewari, Ayush %A Elgharib, Mohamed %A Theobalt, Christian %+ Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society %T StyleVideoGAN: A Temporal Generative Model using a Pretrained StyleGAN : %G eng %U http://hdl.handle.net/21.11116/0000-0009-4D76-D %U https://arxiv.org/abs/2107.07224 %D 2021 %X Generative adversarial models (GANs) continue to produce advances in terms of the visual quality of still images, as well as the learning of temporal correlations. However, few works manage to combine these two interesting capabilities for the synthesis of video content: Most methods require an extensive training dataset in order to learn temporal correlations, while being rather limited in the resolution and visual quality of their output frames. In this paper, we present a novel approach to the video synthesis problem that helps to greatly improve visual quality and drastically reduce the amount of training data and resources necessary for generating video content. Our formulation separates the spatial domain, in which individual frames are synthesized, from the temporal domain, in which motion is generated. For the spatial domain we make use of a pre-trained StyleGAN network, the latent space of which allows control over the appearance of the objects it was trained for. The expressive power of this model allows us to embed our training videos in the StyleGAN latent space. Our temporal architecture is then trained not on sequences of RGB frames, but on sequences of StyleGAN latent codes. The advantageous properties of the StyleGAN space simplify the discovery of temporal correlations. We demonstrate that it suffices to train our temporal architecture on only 10 minutes of footage of 1 subject for about 6 hours. After training, our model can not only generate new portrait videos for the training subject, but also for any random subject which can be embedded in the StyleGAN space. %K Computer Science, Computer Vision and Pattern Recognition, cs.CV
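The split described in the StyleVideoGAN abstract — appearance handled by a pretrained StyleGAN, temporal structure learned purely on latent codes — can be sketched as follows. A small GRU is trained to predict the next latent code in a sequence; random tensors stand in for the latent codes that would come from embedding real video frames, and both the architecture and the training objective are simplified placeholders rather than the paper's model.

```python
import torch
from torch import nn

LATENT_DIM = 512          # dimensionality of a StyleGAN w-latent (assumed)

class LatentDynamics(nn.Module):
    """Toy temporal model over StyleGAN latents: predicts w_{t+1} from w_{<=t}."""

    def __init__(self, latent_dim=LATENT_DIM, hidden=256):
        super().__init__()
        self.gru = nn.GRU(latent_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, latent_dim)

    def forward(self, w_seq):                 # (batch, time, latent_dim)
        h, _ = self.gru(w_seq)
        return self.head(h)                   # per-step prediction of the next code

# Placeholder data standing in for latent codes of embedded video frames.
w_seq = torch.randn(8, 32, LATENT_DIM)        # 8 clips, 32 frames each
model = LatentDynamics()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(100):
    pred = model(w_seq[:, :-1])               # predict frames 1..T-1
    loss = nn.functional.mse_loss(pred, w_seq[:, 1:])
    opt.zero_grad()
    loss.backward()
    opt.step()

# After training, new latent trajectories could be rolled out autoregressively
# and decoded to frames with the pretrained StyleGAN generator (not shown).
```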
Ghosh, A., Cheema, N., Oguz, C., Theobalt, C., and Slusallek, P. 2021. Text-Based Motion Synthesis with a Hierarchical Two-Stream RNN. ACM SIGGRAPH 2021 Posters.
Export
BibTeX
@inproceedings{Ghosh_SIGGRAPH21Poster, TITLE = {Text-Based Motion Synthesis with a Hierarchical Two-Stream {RNN}}, AUTHOR = {Ghosh, Anindita and Cheema, Noshaba and Oguz, Cennet and Theobalt, Christian and Slusallek, Philipp}, LANGUAGE = {eng}, ISBN = {9781450383714}, DOI = {10.1145/3450618.3469163}, PUBLISHER = {ACM}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {ACM SIGGRAPH 2021 Posters}, PAGES = {1--2}, EID = {42}, ADDRESS = {Virtual Event, USA}, }
Endnote
%0 Generic %A Ghosh, Anindita %A Cheema, Noshaba %A Oguz, Cennet %A Theobalt, Christian %A Slusallek, Philipp %+ External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations %T Text-Based Motion Synthesis with a Hierarchical Two-Stream RNN : %G eng %U http://hdl.handle.net/21.11116/0000-0009-4D25-8 %R 10.1145/3450618.3469163 %D 2021 %Z name of event: ACM SIGGRAPH 2021 %Z date of event: 2021-08-09 - 2021-08-13 %Z place of event: Virtual Event, USA %B ACM SIGGRAPH 2021 Posters %P 1 - 2 %Z sequence number: 42 %@ 9781450383714
Ghosh, A., Cheema, N., Oguz, C., Theobalt, C., and Slusallek, P. Synthesis of Compositional Animations from Textual Descriptions. International Conference on Computer Vision, IEEE.
(Accepted/in press)
Export
BibTeX
@inproceedings{Ghosh_ICCV2021, TITLE = {Synthesis of Compositional Animations from Textual Descriptions}, AUTHOR = {Ghosh, Anindita and Cheema, Noshaba and Oguz, Cennet and Theobalt, Christian and Slusallek, Philipp}, LANGUAGE = {eng}, PUBLISHER = {IEEE}, YEAR = {2021}, PUBLREMARK = {Accepted}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {International Conference on Computer Vision}, ADDRESS = {Virtual}, }
Endnote
%0 Conference Proceedings %A Ghosh, Anindita %A Cheema, Noshaba %A Oguz, Cennet %A Theobalt, Christian %A Slusallek, Philipp %+ External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations %T Synthesis of Compositional Animations from Textual Descriptions : %G eng %U http://hdl.handle.net/21.11116/0000-0009-5345-C %D 2021 %B International Conference on Computer Vision %Z date of event: 2021-10-11 - 2021-10-17 %C Virtual %B International Conference on Computer Vision %I IEEE
Habermann, M., Liu, L., Xu, W., Zollhöfer, M., Pons-Moll, G., and Theobalt, C. 2021. Real-time Deep Dynamic Characters. ACM Transactions on Graphics (Proc. ACM SIGGRAPH 2021) 40, 4.
Export
BibTeX
@article{Habermann2021, TITLE = {Real-time Deep Dynamic Characters}, AUTHOR = {Habermann, Marc and Liu, Lingjie and Xu, Weipeng and Zollh{\"o}fer, Michael and Pons-Moll, Gerard and Theobalt, Christian}, LANGUAGE = {eng}, ISSN = {0730-0301}, DOI = {10.1145/3450626.3459749}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, JOURNAL = {ACM Transactions on Graphics (Proc. ACM SIGGRAPH)}, VOLUME = {40}, NUMBER = {4}, PAGES = {1--16}, EID = {94}, BOOKTITLE = {Proceedings of ACM SIGGRAPH 2021}, }
Endnote
%0 Journal Article %A Habermann, Marc %A Liu, Lingjie %A Xu, Weipeng %A Zollhöfer, Michael %A Pons-Moll, Gerard %A Theobalt, Christian %+ Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations External Organizations Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society %T Real-time Deep Dynamic Characters : %G eng %U http://hdl.handle.net/21.11116/0000-0009-2A93-2 %R 10.1145/3450626.3459749 %7 2021 %D 2021 %J ACM Transactions on Graphics %V 40 %N 4 %& 1 %P 1 - 16 %Z sequence number: 94 %I ACM %C New York, NY %@ false %B Proceedings of ACM SIGGRAPH 2021 %O ACM SIGGRAPH 2021
Habibie, I., Xu, W., Mehta, D., et al. 2021a. Learning Speech-driven 3D Conversational Gestures from Video. Proceedings of the 21st ACM International Conference on Intelligent Virtual Agents (IVA 2021), ACM.
Export
BibTeX
@inproceedings{Habibie_IVA2021, TITLE = {Learning Speech-driven {3D} Conversational Gestures from Video}, AUTHOR = {Habibie, Ikhsanul and Xu, Weipeng and Mehta, Dushyant and Liu, Lingjie and Seidel, Hans-Peter and Pons-Moll, Gerard and Elgharib, Mohamed and Theobalt, Christian}, LANGUAGE = {eng}, ISBN = {9781450386197}, DOI = {10.1145/3472306.3478335}, PUBLISHER = {ACM}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the 21st ACM International Conference on Intelligent Virtual Agents (IVA 2021)}, PAGES = {101--108}, ADDRESS = {Virtual Event, Japan}, }
Endnote
%0 Conference Proceedings %A Habibie, Ikhsanul %A Xu, Weipeng %A Mehta, Dushyant %A Liu, Lingjie %A Seidel, Hans-Peter %A Pons-Moll, Gerard %A Elgharib, Mohamed %A Theobalt, Christian %+ Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations Computer Graphics, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Computer Graphics, MPI for Informatics, Max Planck Society External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society %T Learning Speech-driven 3D Conversational Gestures from Video : %G eng %U http://hdl.handle.net/21.11116/0000-0009-4D19-6 %R 10.1145/3472306.3478335 %D 2021 %B 21st ACM International Conference on Intelligent Virtual Agents %Z date of event: 2021-09-14 - 2021-09-17 %C Virtual Event, Japan %B Proceedings of the 21st ACM International Conference on Intelligent Virtual Agents %P 101 - 108 %I ACM %@ 9781450386197
Habibie, I., Xu, W., Mehta, D., et al. 2021b. Learning Speech-driven 3D Conversational Gestures from Video. https://arxiv.org/abs/2102.06837.
(arXiv: 2102.06837)
Abstract
We propose the first approach to automatically and jointly synthesize both the synchronous 3D conversational body and hand gestures, as well as 3D face and head animations, of a virtual character from speech input. Our algorithm uses a CNN architecture that leverages the inherent correlation between facial expression and hand gestures. Synthesis of conversational body gestures is a multi-modal problem since many similar gestures can plausibly accompany the same input speech. To synthesize plausible body gestures in this setting, we train a Generative Adversarial Network (GAN) based model that measures the plausibility of the generated sequences of 3D body motion when paired with the input audio features. We also contribute a new way to create a large corpus of more than 33 hours of annotated body, hand, and face data from in-the-wild videos of talking people. To this end, we apply state-of-the-art monocular approaches for 3D body and hand pose estimation as well as dense 3D face performance capture to the video corpus. In this way, we can train on orders of magnitude more data than previous algorithms that resort to complex in-studio motion capture solutions, and thereby train more expressive synthesis algorithms. Our experiments and user study show the state-of-the-art quality of our speech-synthesized full 3D character animations.
Export
BibTeX
@online{Habibie_2102.06837, TITLE = {Learning Speech-driven {3D} Conversational Gestures from Video}, AUTHOR = {Habibie, Ikhsanul and Xu, Weipeng and Mehta, Dushyant and Liu, Lingjie and Seidel, Hans-Peter and Pons-Moll, Gerard and Elgharib, Mohamed and Theobalt, Christian}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2102.06837}, EPRINT = {2102.06837}, EPRINTTYPE = {arXiv}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, ABSTRACT = {We propose the first approach to automatically and jointly synthesize both the synchronous 3D conversational body and hand gestures, as well as 3D face and head animations, of a virtual character from speech input. Our algorithm uses a CNN architecture that leverages the inherent correlation between facial expression and hand gestures. Synthesis of conversational body gestures is a multi-modal problem since many similar gestures can plausibly accompany the same input speech. To synthesize plausible body gestures in this setting, we train a Generative Adversarial Network (GAN) based model that measures the plausibility of the generated sequences of 3D body motion when paired with the input audio features. We also contribute a new way to create a large corpus of more than 33 hours of annotated body, hand, and face data from in-the-wild videos of talking people. To this end, we apply state-of-the-art monocular approaches for 3D body and hand pose estimation as well as dense 3D face performance capture to the video corpus. In this way, we can train on orders of magnitude more data than previous algorithms that resort to complex in-studio motion capture solutions, and thereby train more expressive synthesis algorithms. Our experiments and user study show the state-of-the-art quality of our speech-synthesized full 3D character animations.}, }
Endnote
%0 Report %A Habibie, Ikhsanul %A Xu, Weipeng %A Mehta, Dushyant %A Liu, Lingjie %A Seidel, Hans-Peter %A Pons-Moll, Gerard %A Elgharib, Mohamed %A Theobalt, Christian %+ Computer Graphics, MPI for Informatics, Max Planck Society External Organizations Computer Graphics, MPI for Informatics, Max Planck Society Computer Graphics, MPI for Informatics, Max Planck Society Computer Graphics, MPI for Informatics, Max Planck Society Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society %T Learning Speech-driven 3D Conversational Gestures from Video : %G eng %U http://hdl.handle.net/21.11116/0000-0009-70C7-8 %U https://arxiv.org/abs/2102.06837 %D 2021 %X We propose the first approach to automatically and jointly synthesize both the synchronous 3D conversational body and hand gestures, as well as 3D face and head animations, of a virtual character from speech input. Our algorithm uses a CNN architecture that leverages the inherent correlation between facial expression and hand gestures. Synthesis of conversational body gestures is a multi-modal problem since many similar gestures can plausibly accompany the same input speech. To synthesize plausible body gestures in this setting, we train a Generative Adversarial Network (GAN) based model that measures the plausibility of the generated sequences of 3D body motion when paired with the input audio features. We also contribute a new way to create a large corpus of more than 33 hours of annotated body, hand, and face data from in-the-wild videos of talking people. To this end, we apply state-of-the-art monocular approaches for 3D body and hand pose estimation as well as dense 3D face performance capture to the video corpus. In this way, we can train on orders of magnitude more data than previous algorithms that resort to complex in-studio motion capture solutions, and thereby train more expressive synthesis algorithms. Our experiments and user study show the state-of-the-art quality of our speech-synthesized full 3D character animations. %K Computer Science, Computer Vision and Pattern Recognition, cs.CV
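The adversarial ingredient mentioned in the abstract — a discriminator that judges whether a generated motion sequence plausibly matches the input audio — is sketched below in skeleton form. The feature dimensions, the per-frame generator/discriminator and the L1 regression term are placeholder assumptions, not the authors' CNN architecture.

```python
import torch
from torch import nn

AUDIO_DIM, POSE_DIM, T = 64, 57, 120      # assumed feature sizes and clip length

gen = nn.Sequential(nn.Linear(AUDIO_DIM, 256), nn.ReLU(), nn.Linear(256, POSE_DIM))
disc = nn.Sequential(nn.Linear(AUDIO_DIM + POSE_DIM, 256), nn.ReLU(), nn.Linear(256, 1))
g_opt = torch.optim.Adam(gen.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

audio = torch.randn(8, T, AUDIO_DIM)       # placeholder per-frame audio features
real_pose = torch.randn(8, T, POSE_DIM)    # placeholder per-frame 3D body poses

for _ in range(200):
    fake_pose = gen(audio)
    # Discriminator: is this (audio, motion) pairing plausible?
    d_real = disc(torch.cat([audio, real_pose], dim=-1))
    d_fake = disc(torch.cat([audio, fake_pose.detach()], dim=-1))
    d_loss = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()
    # Generator: fool the discriminator while staying close to the ground truth.
    g_adv = bce(disc(torch.cat([audio, fake_pose], dim=-1)), torch.ones_like(d_real))
    g_loss = g_adv + 10.0 * nn.functional.l1_loss(fake_pose, real_pose)
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```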
Kappel, M., Golyanik, V., Elgharib, M., et al. 2021. High-Fidelity Neural Human Motion Transfer from Monocular Video. IEEE/CVF Conference on Computer Vision and Pattern Recognition (Oral) (CVPR 2021), IEEE.
Export
BibTeX
@inproceedings{Kappel_CVPR2021, TITLE = {High-Fidelity Neural Human Motion Transfer from Monocular Video}, AUTHOR = {Kappel, Moritz and Golyanik, Vladislav and Elgharib, Mohamed and Henningson, Jann-Ole and Seidel, Hans-Peter and Castillo, Susana and Theobalt, Christian and Magnor, Marcus A.}, LANGUAGE = {eng}, PUBLISHER = {IEEE}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (Oral) (CVPR 2021)}, PAGES = {1541--1550}, ADDRESS = {Virtual Conference}, }
Endnote
%0 Conference Proceedings %A Kappel, Moritz %A Golyanik, Vladislav %A Elgharib, Mohamed %A Henningson, Jann-Ole %A Seidel, Hans-Peter %A Castillo, Susana %A Theobalt, Christian %A Magnor, Marcus A. %+ External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations Computer Graphics, MPI for Informatics, Max Planck Society External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations %T High-Fidelity Neural Human Motion Transfer from Monocular Video : %G eng %U http://hdl.handle.net/21.11116/0000-0008-8947-E %D 2021 %B 34th IEEE Conference on Computer Vision and Pattern Recognition %Z date of event: 2021-06-19 - 2021-06-25 %C Virtual Conference %B IEEE/CVF Conference on Computer Vision and Pattern Recognition (Oral) %P 1541 - 1550 %I IEEE %U https://gvv.mpi-inf.mpg.de/projects/NHMT/
Liu, L., Xu, W., Habermann, M., et al. 2021a. Learning Dynamic Textures for Neural Rendering of Human Actors. IEEE Transactions on Visualization and Computer Graphics 27, 10.
Export
BibTeX
@article{Liu2021, TITLE = {Learning Dynamic Textures for Neural Rendering of Human Actors}, AUTHOR = {Liu, Lingjie and Xu, Weipeng and Habermann, Marc and Zollh{\"o}fer, Michael and Bernard, Florian and Kim, Hyeongwoo and Wang, Wenping and Theobalt, Christian}, LANGUAGE = {eng}, ISSN = {1077-2626}, DOI = {10.1109/TVCG.2020.2996594}, PUBLISHER = {IEEE}, ADDRESS = {Piscataway, NJ}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, DATE = {2021}, JOURNAL = {IEEE Transactions on Visualization and Computer Graphics}, VOLUME = {27}, NUMBER = {10}, PAGES = {4009--4022}, }
Endnote
%0 Journal Article %A Liu, Lingjie %A Xu, Weipeng %A Habermann, Marc %A Zollhöfer, Michael %A Bernard, Florian %A Kim, Hyeongwoo %A Wang, Wenping %A Theobalt, Christian %+ External Organizations Computer Graphics, MPI for Informatics, Max Planck Society Computer Graphics, MPI for Informatics, Max Planck Society External Organizations Computer Graphics, MPI for Informatics, Max Planck Society Computer Graphics, MPI for Informatics, Max Planck Society External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society %T Learning Dynamic Textures for Neural Rendering of Human Actors : %G eng %U http://hdl.handle.net/21.11116/0000-0009-4C96-9 %R 10.1109/TVCG.2020.2996594 %7 2021 %D 2021 %J IEEE Transactions on Visualization and Computer Graphics %V 27 %N 10 %& 4009 %P 4009 - 4022 %I IEEE %C Piscataway, NJ %@ false
Liu, L., Chen, N., Ceylan, D., Theobalt, C., Wang, W., and Mitra, N.J. 2021b. CurveFusion: Reconstructing Thin Structures from RGBD Sequences. https://arxiv.org/abs/2107.05284.
(arXiv: 2107.05284)
Abstract
We introduce CurveFusion, the first approach for high quality scanning of thin structures at interactive rates using a handheld RGBD camera. Thin filament-like structures are mathematically just 1D curves embedded in R^3, and integration-based reconstruction works best when depth sequences (from the thin structure parts) are fused using the object's (unknown) curve skeleton. Thus, using the complementary but noisy color and depth channels, CurveFusion first automatically identifies point samples on potential thin structures and groups them into bundles, each being a group of a fixed number of aligned consecutive frames. Then, the algorithm extracts per-bundle skeleton curves using L1 axes, and aligns and iteratively merges the L1 segments from all the bundles to form the final complete curve skeleton. Thus, unlike previous methods, reconstruction happens via integration along a data-dependent fusion primitive, i.e., the extracted curve skeleton. We extensively evaluate CurveFusion on a range of challenging examples, different scanner and calibration settings, and present high fidelity thin structure reconstructions previously just not possible from raw RGBD sequences.
Export
BibTeX
@online{Liu:2107.05284, TITLE = {{CurveFusion}: {R}econstructing Thin Structures from {RGBD} Sequences}, AUTHOR = {Liu, Lingjie and Chen, Nenglun and Ceylan, Duygu and Theobalt, Christian and Wang, Wenping and Mitra, Niloy J.}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2107.05284}, EPRINT = {2107.05284}, EPRINTTYPE = {arXiv}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, ABSTRACT = {We introduce CurveFusion, the first approach for high quality scanning of thin structures at interactive rates using a handheld RGBD camera. Thin filament-like structures are mathematically just 1D curves embedded in R^3, and integration-based reconstruction works best when depth sequences (from the thin structure parts) are fused using the object's (unknown) curve skeleton. Thus, using the complementary but noisy color and depth channels, CurveFusion first automatically identifies point samples on potential thin structures and groups them into bundles, each being a group of a fixed number of aligned consecutive frames. Then, the algorithm extracts per-bundle skeleton curves using L1 axes, and aligns and iteratively merges the L1 segments from all the bundles to form the final complete curve skeleton. Thus, unlike previous methods, reconstruction happens via integration along a data-dependent fusion primitive, i.e., the extracted curve skeleton. We extensively evaluate CurveFusion on a range of challenging examples, different scanner and calibration settings, and present high fidelity thin structure reconstructions previously just not possible from raw RGBD sequences.}, }
Endnote
%0 Report %A Liu, Lingjie %A Chen, Nenglun %A Ceylan, Duygu %A Theobalt, Christian %A Wang, Wenping %A Mitra, Niloy J. %+ External Organizations External Organizations External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations External Organizations %T CurveFusion: Reconstructing Thin Structures from RGBD Sequences : %G eng %U http://hdl.handle.net/21.11116/0000-0009-4FED-5 %U https://arxiv.org/abs/2107.05284 %D 2021 %X We introduce CurveFusion, the first approach for high quality scanning of thin structures at interactive rates using a handheld RGBD camera. Thin filament-like structures are mathematically just 1D curves embedded in R^3, and integration-based reconstruction works best when depth sequences (from the thin structure parts) are fused using the object's (unknown) curve skeleton. Thus, using the complementary but noisy color and depth channels, CurveFusion first automatically identifies point samples on potential thin structures and groups them into bundles, each being a group of a fixed number of aligned consecutive frames. Then, the algorithm extracts per-bundle skeleton curves using L1 axes, and aligns and iteratively merges the L1 segments from all the bundles to form the final complete curve skeleton. Thus, unlike previous methods, reconstruction happens via integration along a data-dependent fusion primitive, i.e., the extracted curve skeleton. We extensively evaluate CurveFusion on a range of challenging examples, different scanner and calibration settings, and present high fidelity thin structure reconstructions previously just not possible from raw RGBD sequences. %K Computer Science, Graphics, cs.GR
Liu, L., Habermann, M., Rudnev, V., Sarkar, K., Gu, J., and Theobalt, C. 2021c. Neural Actor: Neural Free-view Synthesis of Human Actors with Pose Control. https://arxiv.org/abs/2106.02019.
(arXiv: 2106.02019)
Abstract
We propose Neural Actor (NA), a new method for high-quality synthesis of humans from arbitrary viewpoints and under arbitrary controllable poses. Our method is built upon recent neural scene representation and rendering works which learn representations of geometry and appearance from only 2D images. While existing works demonstrated compelling rendering of static scenes and playback of dynamic scenes, photo-realistic reconstruction and rendering of humans with neural implicit methods, in particular under user-controlled novel poses, is still difficult. To address this problem, we utilize a coarse body model as the proxy to unwarp the surrounding 3D space into a canonical pose. A neural radiance field learns pose-dependent geometric deformations and pose- and view-dependent appearance effects in the canonical space from multi-view video input. To synthesize novel views of high fidelity dynamic geometry and appearance, we leverage 2D texture maps defined on the body model as latent variables for predicting residual deformations and the dynamic appearance. Experiments demonstrate that our method achieves better quality than the state-of-the-arts on playback as well as novel pose synthesis, and can even generalize well to new poses that starkly differ from the training poses. Furthermore, our method also supports body shape control of the synthesized results.
Export
BibTeX
@online{Liu_arXiv2106.02019, TITLE = {Neural Actor: Neural Free-view Synthesis of Human Actors with Pose Control}, AUTHOR = {Liu, Lingjie and Habermann, Marc and Rudnev, Viktor and Sarkar, Kripasindhu and Gu, Jiatao and Theobalt, Christian}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2106.02019}, EPRINT = {2106.02019}, EPRINTTYPE = {arXiv}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, ABSTRACT = {We propose Neural Actor (NA), a new method for high-quality synthesis of humans from arbitrary viewpoints and under arbitrary controllable poses. Our method is built upon recent neural scene representation and rendering works which learn representations of geometry and appearance from only 2D images. While existing works demonstrated compelling rendering of static scenes and playback of dynamic scenes, photo-realistic reconstruction and rendering of humans with neural implicit methods, in particular under user-controlled novel poses, is still difficult. To address this problem, we utilize a coarse body model as the proxy to unwarp the surrounding 3D space into a canonical pose. A neural radiance field learns pose-dependent geometric deformations and pose- and view-dependent appearance effects in the canonical space from multi-view video input. To synthesize novel views of high fidelity dynamic geometry and appearance, we leverage 2D texture maps defined on the body model as latent variables for predicting residual deformations and the dynamic appearance. Experiments demonstrate that our method achieves better quality than the state-of-the-arts on playback as well as novel pose synthesis, and can even generalize well to new poses that starkly differ from the training poses. Furthermore, our method also supports body shape control of the synthesized results.}, }
Endnote
%0 Report %A Liu, Lingjie %A Habermann, Marc %A Rudnev, Viktor %A Sarkar, Kripasindhu %A Gu, Jiatao %A Theobalt, Christian %+ Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society %T Neural Actor: Neural Free-view Synthesis of Human Actors with Pose Control : %G eng %U http://hdl.handle.net/21.11116/0000-0009-5320-5 %U https://arxiv.org/abs/2106.02019 %D 2021 %X We propose Neural Actor (NA), a new method for high-quality synthesis of humans from arbitrary viewpoints and under arbitrary controllable poses. Our method is built upon recent neural scene representation and rendering works which learn representations of geometry and appearance from only 2D images. While existing works demonstrated compelling rendering of static scenes and playback of dynamic scenes, photo-realistic reconstruction and rendering of humans with neural implicit methods, in particular under user-controlled novel poses, is still difficult. To address this problem, we utilize a coarse body model as the proxy to unwarp the surrounding 3D space into a canonical pose. A neural radiance field learns pose-dependent geometric deformations and pose- and view-dependent appearance effects in the canonical space from multi-view video input. To synthesize novel views of high fidelity dynamic geometry and appearance, we leverage 2D texture maps defined on the body model as latent variables for predicting residual deformations and the dynamic appearance. Experiments demonstrate that our method achieves better quality than the state-of-the-arts on playback as well as novel pose synthesis, and can even generalize well to new poses that starkly differ from the training poses. Furthermore, our method also supports body shape control of the synthesized results. %K Computer Science, Computer Vision and Pattern Recognition, cs.CV,Computer Science, Graphics, cs.GR,Computer Science, Learning, cs.LG
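The central step in the Neural Actor abstract — unwarping query points around the posed body into a canonical pose before evaluating a radiance field — can be illustrated with the minimal sketch below. Each point borrows the inverse rigid transform of its nearest body-model vertex (a crude stand-in for skinning-based unwarping), and a placeholder MLP returns density and colour in canonical space; the texture-map latents, residual deformations and volume rendering of the paper are omitted, and all tensor sizes are assumptions.

```python
import torch
from torch import nn

class CanonicalRadianceField(nn.Module):
    """Placeholder NeRF-style MLP queried in the canonical (unwarped) space."""

    def __init__(self, pose_dim=72, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + pose_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),             # (density, r, g, b)
        )

    def forward(self, x_canonical, pose):
        pose = pose.expand(x_canonical.shape[0], -1)
        return self.mlp(torch.cat([x_canonical, pose], dim=-1))

def unwarp_to_canonical(x, verts_posed, vert_transforms):
    """Map query points near the posed body into canonical space.

    Each point takes the rigid transform of its nearest body-model vertex
    and is mapped through the inverse of that transform.
    """
    d = torch.cdist(x, verts_posed)                       # (N, V) distances
    nearest = d.argmin(dim=1)                             # nearest vertex index
    T_inv = torch.linalg.inv(vert_transforms[nearest])    # (N, 4, 4)
    x_h = torch.cat([x, torch.ones(x.shape[0], 1)], dim=-1)
    return (T_inv @ x_h.unsqueeze(-1)).squeeze(-1)[:, :3]

# Hypothetical inputs: SMPL-like vertices with per-vertex rigid transforms.
verts_posed = torch.randn(6890, 3)
vert_transforms = torch.eye(4).repeat(6890, 1, 1)
pose = torch.randn(1, 72)
x = torch.randn(1024, 3)                                  # samples along camera rays

field = CanonicalRadianceField()
x_can = unwarp_to_canonical(x, verts_posed, vert_transforms)
density_and_rgb = field(x_can, pose)
print(density_and_rgb.shape)                              # torch.Size([1024, 4])
```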
Liu, Y., Peng, S., Liu, L., et al. 2021d. Neural Rays for Occlusion-aware Image-based Rendering. https://arxiv.org/abs/2107.13421.
(arXiv: 2107.13421)
Abstract
We present a new neural representation, called Neural Ray (NeuRay), for the novel view synthesis (NVS) task with multi-view images as input. Existing neural scene representations for solving the NVS problem, such as NeRF, cannot generalize to new scenes and take an excessively long time on training on each new scene from scratch. The other subsequent neural rendering methods based on stereo matching, such as PixelNeRF, SRF and IBRNet are designed to generalize to unseen scenes but suffer from view inconsistency in complex scenes with self-occlusions. To address these issues, our NeuRay method represents every scene by encoding the visibility of rays associated with the input views. This neural representation can efficiently be initialized from depths estimated by external MVS methods, which is able to generalize to new scenes and achieves satisfactory rendering images without any training on the scene. Then, the initialized NeuRay can be further optimized on every scene with little training timing to enforce spatial coherence to ensure view consistency in the presence of severe self-occlusion. Experiments demonstrate that NeuRay can quickly generate high-quality novel view images of unseen scenes with little finetuning and can handle complex scenes with severe self-occlusions which previous methods struggle with.
Export
BibTeX
@online{Liu_arXiv2107.13421, TITLE = {Neural Rays for Occlusion-aware Image-based Rendering}, AUTHOR = {Liu, Yuan and Peng, Sida and Liu, Lingjie and Wang, Qianqian and Wang, Peng and Theobalt, Christian and Zhou, Xiaowei and Wang, Wenping}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2107.13421}, EPRINT = {2107.13421}, EPRINTTYPE = {arXiv}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, ABSTRACT = {We present a new neural representation, called Neural Ray (NeuRay), for the novel view synthesis (NVS) task with multi-view images as input. Existing neural scene representations for solving the NVS problem, such as NeRF, cannot generalize to new scenes and take an excessively long time on training on each new scene from scratch. The other subsequent neural rendering methods based on stereo matching, such as PixelNeRF, SRF and IBRNet are designed to generalize to unseen scenes but suffer from view inconsistency in complex scenes with self-occlusions. To address these issues, our NeuRay method represents every scene by encoding the visibility of rays associated with the input views. This neural representation can efficiently be initialized from depths estimated by external MVS methods, which is able to generalize to new scenes and achieves satisfactory rendering images without any training on the scene. Then, the initialized NeuRay can be further optimized on every scene with little training timing to enforce spatial coherence to ensure view consistency in the presence of severe self-occlusion. Experiments demonstrate that NeuRay can quickly generate high-quality novel view images of unseen scenes with little finetuning and can handle complex scenes with severe self-occlusions which previous methods struggle with.}, }
Endnote
%0 Report %A Liu, Yuan %A Peng, Sida %A Liu, Lingjie %A Wang, Qianqian %A Wang, Peng %A Theobalt, Christian %A Zhou, Xiaowei %A Wang, Wenping %+ External Organizations External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Neural Rays for Occlusion-aware Image-based Rendering : %G eng %U http://hdl.handle.net/21.11116/0000-0009-4D32-9 %U https://arxiv.org/abs/2107.13421 %D 2021 %X We present a new neural representation, called Neural Ray (NeuRay), for the novel view synthesis (NVS) task with multi-view images as input. Existing neural scene representations for solving the NVS problem, such as NeRF, cannot generalize to new scenes and take an excessively long time on training on each new scene from scratch. The other subsequent neural rendering methods based on stereo matching, such as PixelNeRF, SRF and IBRNet are designed to generalize to unseen scenes but suffer from view inconsistency in complex scenes with self-occlusions. To address these issues, our NeuRay method represents every scene by encoding the visibility of rays associated with the input views. This neural representation can efficiently be initialized from depths estimated by external MVS methods, which is able to generalize to new scenes and achieves satisfactory rendering images without any training on the scene. Then, the initialized NeuRay can be further optimized on every scene with little training timing to enforce spatial coherence to ensure view consistency in the presence of severe self-occlusion. Experiments demonstrate that NeuRay can quickly generate high-quality novel view images of unseen scenes with little finetuning and can handle complex scenes with severe self-occlusions which previous methods struggle with. %K Computer Science, Computer Vision and Pattern Recognition, cs.CV,Computer Science, Graphics, cs.GR
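The core idea in the NeuRay abstract above, encoding per-view visibility initialized from MVS depths and using it for occlusion-aware blending, can be illustrated with a small toy computation. The logistic visibility, the direct colour blending, and all shapes below are assumptions for illustration; the paper learns this representation rather than computing it in closed form.

```python
import numpy as np

def soft_visibility(sample_depth, surface_depth, beta=0.02):
    """Soft visibility of a sample at depth `sample_depth` along an input-view
    ray whose estimated surface lies at `surface_depth` (e.g. from MVS).
    Close to 1 in front of the surface, close to 0 behind it."""
    return 1.0 / (1.0 + np.exp((sample_depth - surface_depth) / beta))

def occlusion_aware_blend(colors, sample_depths, surface_depths):
    """Blend per-view colours for one 3D sample point, down-weighting views
    in which the sample is occluded.

    colors         : (N, 3) colour fetched from each of N input views
    sample_depths  : (N,)   depth of the sample w.r.t. each input view
    surface_depths : (N,)   estimated surface depth along each view ray
    """
    vis = soft_visibility(sample_depths, surface_depths)      # (N,)
    w = vis / (vis.sum() + 1e-8)
    return (w[:, None] * colors).sum(axis=0)

# Toy example: the second view is occluded, so it barely contributes.
colors = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(occlusion_aware_blend(colors, np.array([2.0, 3.0]), np.array([2.5, 2.5])))
```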
Long, X., Liu, L., Li, W., Theobalt, C., and Wang, W. Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), IEEE.
(Accepted/in press)
Export
BibTeX
@inproceedings{Long_CVPR2021b, TITLE = {Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks}, AUTHOR = {Long, Xiaoxiao and Liu, Lingjie and Li, Wei and Theobalt, Christian and Wang, Wenping}, LANGUAGE = {eng}, PUBLISHER = {IEEE}, YEAR = {2021}, PUBLREMARK = {Accepted}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021)}, PAGES = {8258--8267}, ADDRESS = {Virtual Conference}, }
Endnote
%0 Conference Proceedings %A Long, Xiaoxiao %A Liu, Lingjie %A Li, Wei %A Theobalt, Christian %A Wang, Wenping %+ External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations %T Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks : %G eng %U http://hdl.handle.net/21.11116/0000-0009-4CFE-5 %D 2021 %B 34th IEEE Conference on Computer Vision and Pattern Recognition %Z date of event: 2021-06-19 - 2021-06-25 %C Virtual Conference %B IEEE/CVF Conference on Computer Vision and Pattern Recognition %P 8258 - 8267 %I IEEE
Long, X., Lin, C., Liu, L., et al. 2021. Adaptive Surface Normal Constraint for Depth Estimation. https://arxiv.org/abs/2103.15483.
(arXiv: 2103.15483)
Abstract
We present a novel method for single image depth estimation using surface normal constraints. Existing depth estimation methods either suffer from the lack of geometric constraints or are limited by the difficulty of reliably capturing geometric context, which leads to a bottleneck in depth estimation quality. We therefore introduce a simple yet effective method, named Adaptive Surface Normal (ASN) constraint, to effectively correlate the depth estimate with geometric consistency. Our key idea is to adaptively determine the reliable local geometry from a set of randomly sampled candidates to derive the surface normal constraint, for which we measure the consistency of the geometric contextual features. As a result, our method can faithfully reconstruct the 3D geometry and is robust to local shape variations such as boundaries, sharp corners and noise. We conduct extensive evaluations and comparisons using public datasets. The experimental results demonstrate that our method outperforms the state-of-the-art methods and has superior efficiency and robustness.
Export
BibTeX
@online{Long_arXiv2103.15483, TITLE = {Adaptive Surface Normal Constraint for Depth Estimation}, AUTHOR = {Long, Xiaoxiao and Lin, Cheng and Liu, Lingjie and Li, Wei and Theobalt, Christian and Yang, Ruigang and Wang, Wenping}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2103.15483}, EPRINT = {2103.15483}, EPRINTTYPE = {arXiv}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, ABSTRACT = {We present a novel method for single image depth estimation using surface normal constraints. Existing depth estimation methods either suffer from the lack of geometric constraints, or are limited to the difficulty of reliably capturing geometric context, which leads to a bottleneck of depth estimation quality. We therefore introduce a simple yet effective method, named Adaptive Surface Normal (ASN) constraint, to effectively correlate the depth estimation with geometric consistency. Our key idea is to adaptively determine the reliable local geometry from a set of randomly sampled candidates to derive surface normal constraint, for which we measure the consistency of the geometric contextual features. As a result, our method can faithfully reconstruct the 3D geometry and is robust to local shape variations, such as boundaries, sharp corners and noises. We conduct extensive evaluations and comparisons using public datasets. The experimental results demonstrate our method outperforms the state-of-the-art methods and has superior efficiency and robustness.}, }
Endnote
%0 Report %A Long, Xiaoxiao %A Lin, Cheng %A Liu, Lingjie %A Li, Wei %A Theobalt, Christian %A Yang, Ruigang %A Wang, Wenping %+ External Organizations External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Adaptive Surface Normal Constraint for Depth Estimation : %G eng %U http://hdl.handle.net/21.11116/0000-0009-5332-1 %U https://arxiv.org/abs/2103.15483 %D 2021 %X We present a novel method for single image depth estimation using surface normal constraints. Existing depth estimation methods either suffer from the lack of geometric constraints, or are limited to the difficulty of reliably capturing geometric context, which leads to a bottleneck of depth estimation quality. We therefore introduce a simple yet effective method, named Adaptive Surface Normal (ASN) constraint, to effectively correlate the depth estimation with geometric consistency. Our key idea is to adaptively determine the reliable local geometry from a set of randomly sampled candidates to derive surface normal constraint, for which we measure the consistency of the geometric contextual features. As a result, our method can faithfully reconstruct the 3D geometry and is robust to local shape variations, such as boundaries, sharp corners and noises. We conduct extensive evaluations and comparisons using public datasets. The experimental results demonstrate our method outperforms the state-of-the-art methods and has superior efficiency and robustness. %K Computer Science, Computer Vision and Pattern Recognition, cs.CV
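The adaptive surface normal idea above, deriving normals from randomly sampled local point triplets and combining them with consistency weights, can be sketched as follows. The pinhole back-projection, the fixed offsets, and the externally supplied weights are assumptions; in the paper the weights come from learned feature consistency rather than being hand-specified.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Back-project a depth map (H, W) to per-pixel 3D camera-space points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)          # (H, W, 3)

def normal_from_triplets(points, i, j, offsets, weights):
    """Estimate the surface normal at pixel (i, j) from several candidate
    triplets of neighbouring 3D points, combined with per-triplet weights
    (passed in here; the paper derives them from feature consistency)."""
    p0 = points[i, j]
    normals = []
    for (di1, dj1), (di2, dj2) in offsets:
        n = np.cross(points[i + di1, j + dj1] - p0,
                     points[i + di2, j + dj2] - p0)
        norm = np.linalg.norm(n)
        normals.append(n / norm if norm > 1e-8 else np.zeros(3))
    normals = np.array(normals)                       # (K, 3)
    n = (weights[:, None] * normals).sum(axis=0)
    return n / (np.linalg.norm(n) + 1e-8)

# Toy usage on a synthetic slanted plane.
depth = 1.0 + 0.01 * np.arange(64)[None, :] * np.ones((64, 1))
pts = backproject(depth, fx=100, fy=100, cx=32, cy=32)
offs = [((0, 1), (1, 0)), ((0, 2), (2, 0)), ((0, 1), (2, 0))]
print(normal_from_triplets(pts, 30, 30, offs, np.array([0.5, 0.3, 0.2])))
```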
Lyu, L., Habermann, M., Liu, L., Mallikarjun B R, Tewari, A., and Theobalt, C. 2021. Efficient and Differentiable Shadow Computation for Inverse Problems. https://arxiv.org/abs/2104.00359.
(arXiv: 2104.00359)
Abstract
Differentiable rendering has received increasing interest for image-based inverse problems. It can benefit traditional optimization-based solutions to inverse problems, but also allows for self-supervision of learning-based approaches for which training data with ground-truth annotation is hard to obtain. However, existing differentiable renderers either do not model the visibility of the light sources from the different points in the scene, which is responsible for shadows in the images, or are too slow to be used for training deep architectures over thousands of iterations. To this end, we propose an accurate yet efficient approach for differentiable visibility and soft shadow computation. Our approach is based on spherical harmonics approximations of the scene illumination and visibility, where the occluding surface is approximated with spheres. This allows for a significantly more efficient shadow computation compared to methods based on ray tracing. As our formulation is differentiable, it can be used to solve inverse problems such as texture, illumination, rigid pose, and geometric deformation recovery from images using analysis-by-synthesis optimization.
Export
BibTeX
@online{Lyu_arXiv2104.00359, TITLE = {Efficient and Differentiable Shadow Computation for Inverse Problems}, AUTHOR = {Lyu, Linjie and Habermann, Marc and Liu, Lingjie and Mallikarjun B R, and Tewari, Ayush and Theobalt, Christian}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2104.00359}, EPRINT = {2104.00359}, EPRINTTYPE = {arXiv}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Differentiable rendering has received increasing interest for image-based inverse problems. It can benefit traditional optimization-based solutions to inverse problems, but also allows for self-supervision of learning-based approaches for which training data with ground truth annotation is hard to obtain. However, existing differentiable renderers either do not model visibility of the light sources from the different points in the scene, responsible for shadows in the images, or are too slow for being used to train deep architectures over thousands of iterations. To this end, we propose an accurate yet efficient approach for differentiable visibility and soft shadow computation. Our approach is based on the spherical harmonics approximations of the scene illumination and visibility, where the occluding surface is approximated with spheres. This allows for a significantly more efficient shadow computation compared to methods based on ray tracing. As our formulation is differentiable, it can be used to solve inverse problems such as texture, illumination, rigid pose, and geometric deformation recovery from images using analysis-by-synthesis optimization.}, }
Endnote
%0 Report %A Lyu, Linjie %A Habermann, Marc %A Liu, Lingjie %A Mallikarjun B R, %A Tewari, Ayush %A Theobalt, Christian %+ Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society %T Efficient and Differentiable Shadow Computation for Inverse Problems : %G eng %U http://hdl.handle.net/21.11116/0000-0009-532F-6 %U https://arxiv.org/abs/2104.00359 %D 2021 %X Differentiable rendering has received increasing interest for image-based inverse problems. It can benefit traditional optimization-based solutions to inverse problems, but also allows for self-supervision of learning-based approaches for which training data with ground truth annotation is hard to obtain. However, existing differentiable renderers either do not model visibility of the light sources from the different points in the scene, responsible for shadows in the images, or are too slow for being used to train deep architectures over thousands of iterations. To this end, we propose an accurate yet efficient approach for differentiable visibility and soft shadow computation. Our approach is based on the spherical harmonics approximations of the scene illumination and visibility, where the occluding surface is approximated with spheres. This allows for a significantly more efficient shadow computation compared to methods based on ray tracing. As our formulation is differentiable, it can be used to solve inverse problems such as texture, illumination, rigid pose, and geometric deformation recovery from images using analysis-by-synthesis optimization. %K Computer Science, Computer Vision and Pattern Recognition, cs.CV
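As a rough illustration of the sphere-based soft shadow idea in the abstract above, the snippet below computes a smooth, differentiable visibility term for spherical occluders. It is a much-simplified stand-in: the paper works with spherical harmonics approximations of illumination and visibility, whereas this sketch only attenuates a single point light per occluding sphere, with softness and radii chosen arbitrarily.

```python
import numpy as np

def soft_shadow(point, light, centers, radii, softness=0.05):
    """Differentiable soft visibility of `light` from `point`, with occluders
    approximated by spheres (centers, radii). Each sphere attenuates
    visibility smoothly based on how close it sits to the point-to-light
    segment; this is NOT the paper's spherical-harmonics formulation."""
    d = light - point
    seg_len = np.linalg.norm(d)
    d = d / seg_len
    vis = 1.0
    for c, r in zip(centers, radii):
        t = np.clip(np.dot(c - point, d), 0.0, seg_len)   # closest point on segment
        dist = np.linalg.norm(point + t * d - c)          # distance to sphere centre
        # Smooth occlusion: close to 1 inside the sphere, fades to 0 outside it.
        occ = 1.0 / (1.0 + np.exp((dist - r) / softness))
        vis *= (1.0 - occ)
    return vis

# A sphere sitting halfway between the shading point and the light darkens it.
p = np.array([0.0, 0.0, 0.0]); l = np.array([0.0, 0.0, 4.0])
print(soft_shadow(p, l, centers=[np.array([0.0, 0.0, 2.0])], radii=[0.5]))
print(soft_shadow(p, l, centers=[np.array([3.0, 0.0, 2.0])], radii=[0.5]))
```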
Mallikarjun B R, Tewari, A., Seidel, H.-P., Elgharib, M., and Theobalt, C. Learning Complete 3D Morphable Face Models from Images and Videos. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), IEEE.
(Accepted/in press)
Export
BibTeX
@inproceedings{Mallikarjun_CVPR2021b, TITLE = {Learning Complete {3D} Morphable Face Models from Images and Videos}, AUTHOR = {Mallikarjun B R, and Tewari, Ayush and Seidel, Hans-Peter and Elgharib, Mohamed and Theobalt, Christian}, LANGUAGE = {eng}, PUBLISHER = {IEEE}, YEAR = {2021}, PUBLREMARK = {Accepted}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021)}, PAGES = {3361--3371}, ADDRESS = {Virtual Conference}, }
Endnote
%0 Conference Proceedings %A Mallikarjun B R, %A Tewari, Ayush %A Seidel, Hans-Peter %A Elgharib, Mohamed %A Theobalt, Christian %+ Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Computer Graphics, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society %T Learning Complete 3D Morphable Face Models from Images and Videos : %G eng %U http://hdl.handle.net/21.11116/0000-0008-8926-3 %D 2021 %B 34th IEEE Conference on Computer Vision and Pattern Recognition %Z date of event: 2021-06-19 - 2021-06-25 %C Virtual Conference %B IEEE/CVF Conference on Computer Vision and Pattern Recognition %P 3361 - 3371 %I IEEE %U https://gvv.mpi-inf.mpg.de/projects/LeMoMo/
Mallikarjun B R, Tewari, A., Oh, T.-H., et al. Monocular Reconstruction of Neural Face Reflectance Fields. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), IEEE.
(Accepted/in press)
Export
BibTeX
@inproceedings{Mallikarjun_CVPR2021, TITLE = {Monocular Reconstruction of Neural Face Reflectance Fields}, AUTHOR = {Mallikarjun B R, and Tewari, Ayush and Oh, Tae-Hyun and Weyrich, Tim and Bickel, Bernd and Seidel, Hans-Peter and Pfister, Hanspeter and Matusik, Wojciech and Elgharib, Mohamed and Theobalt, Christian}, LANGUAGE = {eng}, PUBLISHER = {IEEE}, YEAR = {2021}, PUBLREMARK = {Accepted}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021)}, PAGES = {4791--4800}, ADDRESS = {Virtual Conference}, }
Endnote
%0 Conference Proceedings %A Mallikarjun B R, %A Tewari, Ayush %A Oh, Tae-Hyun %A Weyrich, Tim %A Bickel, Bernd %A Seidel, Hans-Peter %A Pfister, Hanspeter %A Matusik, Wojciech %A Elgharib, Mohamed %A Theobalt, Christian %+ Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations Computer Graphics, MPI for Informatics, Max Planck Society External Organizations External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society %T Monocular Reconstruction of Neural Face Reflectance Fields : %G eng %U http://hdl.handle.net/21.11116/0000-0008-88FB-4 %D 2021 %B 34th IEEE Conference on Computer Vision and Pattern Recognition %Z date of event: 2021-06-19 - 2021-06-25 %C Virtual Conference %B IEEE/CVF Conference on Computer Vision and Pattern Recognition %P 4791 - 4800 %I IEEE %U https://gvv.mpi-inf.mpg.de/projects/FaceReflectanceFields/
Mallikarjun B R, Tewari, A., Dib, A., et al. 2021. PhotoApp: Photorealistic Appearance Editing of Head Portraits. ACM Transactions on Graphics (Proc. ACM SIGGRAPH 2021)40, 4.
Export
BibTeX
@article{MallikarjunBR2021, TITLE = {{PhotoApp}: {P}hotorealistic Appearance Editing of Head Portraits}, AUTHOR = {Mallikarjun B R, and Tewari, Ayush and Dib, Abdallah and Weyrich, Tim and Bickel, Bernd and Seidel, Hans-Peter and Pfister, Hanspeter and Matusik, Wojciech and Chevallier, Louis and Elgharib, Mohamed and Theobalt, Christian}, LANGUAGE = {eng}, ISSN = {0730-0301}, DOI = {10.1145/3450626.3459765}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, JOURNAL = {ACM Transactions on Graphics (Proc. ACM SIGGRAPH)}, VOLUME = {40}, NUMBER = {4}, PAGES = {1--16}, EID = {44}, BOOKTITLE = {Proceedings of ACM SIGGRAPH 2021}, }
Endnote
%0 Journal Article %A Mallikarjun B R, %A Tewari, Ayush %A Dib, Abdallah %A Weyrich, Tim %A Bickel, Bernd %A Seidel, Hans-Peter %A Pfister, Hanspeter %A Matusik, Wojciech %A Chevallier, Louis %A Elgharib, Mohamed %A Theobalt, Christian %+ Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations Computer Graphics, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society %T PhotoApp: Photorealistic Appearance Editing of Head Portraits : %G eng %U http://hdl.handle.net/21.11116/0000-0009-2A9B-A %R 10.1145/3450626.3459765 %7 2021 %D 2021 %J ACM Transactions on Graphics %V 40 %N 4 %& 1 %P 1 - 16 %Z sequence number: 44 %I ACM %C New York, NY %@ false %B Proceedings of ACM SIGGRAPH 2021 %O ACM SIGGRAPH 2021
Meka, A., Shafiei, M., Zollhöfer, M., Richardt, C., and Theobalt, C. 2021. Real-time Global Illumination Decomposition of Videos. ACM Transactions on Graphics (Proc. ACM SIGGRAPH 2021)40, 4.
Export
BibTeX
@article{Meka:2021, TITLE = {Real-time Global Illumination Decomposition of Videos}, AUTHOR = {Meka, Abhimitra and Shafiei, Mohammad and Zollh{\"o}fer, Michael and Richardt, Christian and Theobalt, Christian}, LANGUAGE = {eng}, ISSN = {0730-0301}, DOI = {10.1145/3374753}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, JOURNAL = {ACM Transactions on Graphics (Proc. ACM SIGGRAPH)}, VOLUME = {40}, NUMBER = {4}, PAGES = {1--16}, EID = {22}, BOOKTITLE = {Proceedings of ACM SIGGRAPH 2021}, }
Endnote
%0 Journal Article %A Meka, Abhimitra %A Shafiei, Mohammad %A Zollhöfer, Michael %A Richardt, Christian %A Theobalt, Christian %+ Computer Graphics, MPI for Informatics, Max Planck Society Computer Graphics, MPI for Informatics, Max Planck Society Computer Graphics, MPI for Informatics, Max Planck Society Computer Graphics, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society %T Real-time Global Illumination Decomposition of Videos : %G eng %U http://hdl.handle.net/21.11116/0000-0007-EE07-6 %R 10.1145/3374753 %7 2021 %D 2021 %J ACM Transactions on Graphics %V 40 %N 4 %& 1 %P 1 - 16 %Z sequence number: 22 %I ACM %C New York, NY %@ false %B Proceedings of ACM SIGGRAPH 2021 %O ACM SIGGRAPH 2021 %U http://gvv.mpi-inf.mpg.de/projects/LiveIlluminationDecomposition/
Nehvi, J., Golyanik, V., Mueller, F., Seidel, H.-P., Elgharib, M., and Theobalt, C. 2021. Differentiable Event Stream Simulator for Non-Rigid 3D Tracking. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPR 2021), IEEE.
Export
BibTeX
@inproceedings{Nehvi_CVPR2021Workshop, TITLE = {Differentiable Event Stream Simulator for Non-Rigid {3D} Tracking}, AUTHOR = {Nehvi, Jalees and Golyanik, Vladislav and Mueller, Franziska and Seidel, Hans-Peter and Elgharib, Mohamed and Theobalt, Christian}, LANGUAGE = {eng}, ISBN = {978-1-6654-4899-4}, DOI = {10.1109/CVPRW53098.2021.00143}, PUBLISHER = {IEEE}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPR 2021)}, PAGES = {1302--1311}, ADDRESS = {Virtual Workshop}, }
Endnote
%0 Conference Proceedings %A Nehvi, Jalees %A Golyanik, Vladislav %A Mueller, Franziska %A Seidel, Hans-Peter %A Elgharib, Mohamed %A Theobalt, Christian %+ Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations Computer Graphics, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society %T Differentiable Event Stream Simulator for Non-Rigid 3D Tracking : %G eng %U http://hdl.handle.net/21.11116/0000-0008-8957-C %R 10.1109/CVPRW53098.2021.00143 %D 2021 %B Third International Workshop on Event-Based Vision %Z date of event: 2021-06-19 - 2021-06-19 %C Virtual Workshop %B Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops %P 1302 - 1311 %I IEEE %@ 978-1-6654-4899-4 %U https://gvv.mpi-inf.mpg.de/projects/Event-based_Non-rigid_3D_Tracking/
Sarkar, K., Mehta, D., Xu, W., Golyanik, V., and Theobalt, C. 2021a. Neural Re-Rendering of Humans from a Single Image. https://arxiv.org/abs/2101.04104.
(arXiv: 2101.04104)
Abstract
Human re-rendering from a single image is a starkly under-constrained problem, and state-of-the-art algorithms often exhibit undesired artefacts, such as over-smoothing, unrealistic distortions of the body parts and garments, or implausible changes of the texture. To address these challenges, we propose a new method for neural re-rendering of a human under a novel user-defined pose and viewpoint, given one input image. Our algorithm represents body pose and shape as a parametric mesh which can be reconstructed from a single image and easily reposed. Instead of a colour-based UV texture map, our approach further employs a learned high-dimensional UV feature map to encode appearance. This rich implicit representation captures detailed appearance variation across poses, viewpoints, person identities and clothing styles better than learned colour texture maps. The body model with the rendered feature maps is fed through a neural image-translation network that creates the final rendered colour image. The above components are combined in an end-to-end-trained neural network architecture that takes as input a source person image, and images of the parametric body model in the source pose and desired target pose. Experimental evaluation demonstrates that our approach produces higher quality single image re-rendering results than existing methods.
Export
BibTeX
@online{Sarkar_arXiv2101.04104, TITLE = {Neural Re-Rendering of Humans from a Single Image}, AUTHOR = {Sarkar, Kripasindhu and Mehta, Dushyant and Xu, Weipeng and Golyanik, Vladislav and Theobalt, Christian}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2101.04104}, EPRINT = {2101.04104}, EPRINTTYPE = {arXiv}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Human re-rendering from a single image is a starkly under-constrained problem, and state-of-the-art algorithms often exhibit undesired artefacts, such as over-smoothing, unrealistic distortions of the body parts and garments, or implausible changes of the texture. To address these challenges, we propose a new method for neural re-rendering of a human under a novel user-defined pose and viewpoint, given one input image. Our algorithm represents body pose and shape as a parametric mesh which can be reconstructed from a single image and easily reposed. Instead of a colour-based UV texture map, our approach further employs a learned high-dimensional UV feature map to encode appearance. This rich implicit representation captures detailed appearance variation across poses, viewpoints, person identities and clothing styles better than learned colour texture maps. The body model with the rendered feature maps is fed through a neural image-translation network that creates the final rendered colour image. The above components are combined in an end-to-end-trained neural network architecture that takes as input a source person image, and images of the parametric body model in the source pose and desired target pose. Experimental evaluation demonstrates that our approach produces higher quality single image re-rendering results than existing methods.}, }
Endnote
%0 Report %A Sarkar, Kripasindhu %A Mehta, Dushyant %A Xu, Weipeng %A Golyanik, Vladislav %A Theobalt, Christian %+ Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Computer Graphics, MPI for Informatics, Max Planck Society External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society %T Neural Re-Rendering of Humans from a Single Image : %G eng %U http://hdl.handle.net/21.11116/0000-0007-CF05-B %U https://arxiv.org/abs/2101.04104 %D 2021 %X Human re-rendering from a single image is a starkly under-constrained problem, and state-of-the-art algorithms often exhibit undesired artefacts, such as over-smoothing, unrealistic distortions of the body parts and garments, or implausible changes of the texture. To address these challenges, we propose a new method for neural re-rendering of a human under a novel user-defined pose and viewpoint, given one input image. Our algorithm represents body pose and shape as a parametric mesh which can be reconstructed from a single image and easily reposed. Instead of a colour-based UV texture map, our approach further employs a learned high-dimensional UV feature map to encode appearance. This rich implicit representation captures detailed appearance variation across poses, viewpoints, person identities and clothing styles better than learned colour texture maps. The body model with the rendered feature maps is fed through a neural image-translation network that creates the final rendered colour image. The above components are combined in an end-to-end-trained neural network architecture that takes as input a source person image, and images of the parametric body model in the source pose and desired target pose. Experimental evaluation demonstrates that our approach produces higher quality single image re-rendering results than existing methods. %K Computer Science, Computer Vision and Pattern Recognition, cs.CV
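One concrete ingredient of the pipeline described above is sampling a learned UV feature map at the UV coordinates rasterised from the posed body model. The sketch below shows that lookup with a hypothetical tensor layout; the feature texture values, the masking convention, the image-translation network, and the training loop of the actual method are not shown and the shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def render_feature_image(uv_texture, uv_coords, mask):
    """Sample a learned UV feature texture at the UV coordinates rendered from
    the posed body model, producing a feature image that an image-translation
    network would turn into the final colour image.

    uv_texture : (1, C, Ht, Wt) learned feature map on the body UV atlas
    uv_coords  : (1, H, W, 2)   per-pixel UV coordinates in [0, 1] from the
                                rasterised body mesh; arbitrary where mask == 0
    mask       : (1, 1, H, W)   1 where the body covers the pixel, else 0
    """
    grid = uv_coords * 2.0 - 1.0            # grid_sample expects [-1, 1] coords
    feats = F.grid_sample(uv_texture, grid, mode='bilinear', align_corners=False)
    return feats * mask                      # zero features in the background

# Toy usage with random tensors, just to show shapes.
tex = torch.randn(1, 16, 64, 64)              # learned UV feature texture
uv = torch.rand(1, 128, 128, 2)               # rendered UV look-up image
m = (torch.rand(1, 1, 128, 128) > 0.5).float()
print(render_feature_image(tex, uv, m).shape)  # -> torch.Size([1, 16, 128, 128])
```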
Sarkar, K., Liu, L., Golyanik, V., and Theobalt, C. 2021b. HumanGAN: A Generative Model of Human Images. https://arxiv.org/abs/2103.06902.
(arXiv: 2103.06902)
Abstract
Generative adversarial networks achieve great performance in photorealistic image synthesis in various domains, including human images. However, they usually employ latent vectors that encode the sampled outputs globally. This does not allow convenient control of semantically-relevant individual parts of the image, and is not able to draw samples that only differ in partial aspects, such as clothing style. We address these limitations and present a generative model for images of dressed humans offering control over pose, local body part appearance and garment style. This is the first method to solve various aspects of human image generation such as global appearance sampling, pose transfer, parts and garment transfer, and parts sampling jointly in a unified framework. As our model encodes part-based latent appearance vectors in a normalized pose-independent space and warps them to different poses, it preserves body and clothing appearance under varying posture. Experiments show that our flexible and general generative method outperforms task-specific baselines for pose-conditioned image generation, pose transfer and part sampling in terms of realism and output resolution.
Export
BibTeX
@online{Sarkar_arXiv2103.06902, TITLE = {{HumanGAN}: A Generative Model of Humans Images}, AUTHOR = {Sarkar, Kripasindhu and Liu, Lingjie and Golyanik, Vladislav and Theobalt, Christian}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2103.06902}, EPRINT = {2103.06902}, EPRINTTYPE = {arXiv}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Generative adversarial networks achieve great performance in photorealistic image synthesis in various domains, including human images. However, they usually employ latent vectors that encode the sampled outputs globally. This does not allow convenient control of semantically-relevant individual parts of the image, and is not able to draw samples that only differ in partial aspects, such as clothing style. We address these limitations and present a generative model for images of dressed humans offering control over pose, local body part appearance and garment style. This is the first method to solve various aspects of human image generation such as global appearance sampling, pose transfer, parts and garment transfer, and parts sampling jointly in a unified framework. As our model encodes part-based latent appearance vectors in a normalized pose-independent space and warps them to different poses, it preserves body and clothing appearance under varying posture. Experiments show that our flexible and general generative method outperforms task-specific baselines for pose-conditioned image generation, pose transfer and part sampling in terms of realism and output resolution.}, }
Endnote
%0 Report %A Sarkar, Kripasindhu %A Liu, Lingjie %A Golyanik, Vladislav %A Theobalt, Christian %+ Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society %T HumanGAN: A Generative Model of Humans Images : %G eng %U http://hdl.handle.net/21.11116/0000-0009-5356-9 %U https://arxiv.org/abs/2103.06902 %D 2021 %X Generative adversarial networks achieve great performance in photorealistic image synthesis in various domains, including human images. However, they usually employ latent vectors that encode the sampled outputs globally. This does not allow convenient control of semantically-relevant individual parts of the image, and is not able to draw samples that only differ in partial aspects, such as clothing style. We address these limitations and present a generative model for images of dressed humans offering control over pose, local body part appearance and garment style. This is the first method to solve various aspects of human image generation such as global appearance sampling, pose transfer, parts and garment transfer, and parts sampling jointly in a unified framework. As our model encodes part-based latent appearance vectors in a normalized pose-independent space and warps them to different poses, it preserves body and clothing appearance under varying posture. Experiments show that our flexible and general generative method outperforms task-specific baselines for pose-conditioned image generation, pose transfer and part sampling in terms of realism and output resolution. %K Computer Science, Computer Vision and Pattern Recognition, cs.CV
Sarkar, K., Golyanik, V., Liu, L., and Theobalt, C. 2021c. Style and Pose Control for Image Synthesis of Humans from a Single Monocular View. https://arxiv.org/abs/2102.11263.
(arXiv: 2102.11263)
Abstract
Photo-realistic re-rendering of a human from a single image with explicit control over body pose, shape and appearance enables a wide range of applications, such as human appearance transfer, virtual try-on, motion imitation, and novel view synthesis. While significant progress has been made in this direction using learning-based image generation tools, such as GANs, existing approaches yield noticeable artefacts such as blurring of fine details, unrealistic distortions of the body parts and garments as well as severe changes of the textures. We, therefore, propose a new method for synthesising photo-realistic human images with explicit control over pose and part-based appearance, i.e., StylePoseGAN, where we extend a non-controllable generator to accept conditioning of pose and appearance separately. Our network can be trained in a fully supervised way with human images to disentangle pose, appearance and body parts, and it significantly outperforms existing single image re-rendering methods. Our disentangled representation opens up further applications such as garment transfer, motion transfer, virtual try-on, head (identity) swap and appearance interpolation. StylePoseGAN achieves state-of-the-art image generation fidelity on common perceptual metrics compared to the current best-performing methods and convinces in a comprehensive user study.
Export
BibTeX
@online{Sarkar_arXiv2102.11263, TITLE = {Style and Pose Control for Image Synthesis of Humans from a Single Monocular View}, AUTHOR = {Sarkar, Kripasindhu and Golyanik, Vladislav and Liu, Lingjie and Theobalt, Christian}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2102.11263}, EPRINT = {2102.11263}, EPRINTTYPE = {arXiv}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Photo-realistic re-rendering of a human from a single image with explicit control over body pose, shape and appearance enables a wide range of applications, such as human appearance transfer, virtual try-on, motion imitation, and novel view synthesis. While significant progress has been made in this direction using learning-based image generation tools, such as GANs, existing approaches yield noticeable artefacts such as blurring of fine details, unrealistic distortions of the body parts and garments as well as severe changes of the textures. We, therefore, propose a new method for synthesising photo-realistic human images with explicit control over pose and part-based appearance, i.e., StylePoseGAN, where we extend a non-controllable generator to accept conditioning of pose and appearance separately. Our network can be trained in a fully supervised way with human images to disentangle pose, appearance and body parts, and it significantly outperforms existing single image re-rendering methods. Our disentangled representation opens up further applications such as garment transfer, motion transfer, virtual try-on, head (identity) swap and appearance interpolation. StylePoseGAN achieves state-of-the-art image generation fidelity on common perceptual metrics compared to the current best-performing methods and convinces in a comprehensive user study.}, }
Endnote
%0 Report %A Sarkar, Kripasindhu %A Golyanik, Vladislav %A Liu, Lingjie %A Theobalt, Christian %+ Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society %T Style and Pose Control for Image Synthesis of Humans from a Single Monocular View : %G eng %U http://hdl.handle.net/21.11116/0000-0009-53BB-7 %U https://arxiv.org/abs/2102.11263 %D 2021 %X Photo-realistic re-rendering of a human from a single image with explicit control over body pose, shape and appearance enables a wide range of applications, such as human appearance transfer, virtual try-on, motion imitation, and novel view synthesis. While significant progress has been made in this direction using learning-based image generation tools, such as GANs, existing approaches yield noticeable artefacts such as blurring of fine details, unrealistic distortions of the body parts and garments as well as severe changes of the textures. We, therefore, propose a new method for synthesising photo-realistic human images with explicit control over pose and part-based appearance, i.e., StylePoseGAN, where we extend a non-controllable generator to accept conditioning of pose and appearance separately. Our network can be trained in a fully supervised way with human images to disentangle pose, appearance and body parts, and it significantly outperforms existing single image re-rendering methods. Our disentangled representation opens up further applications such as garment transfer, motion transfer, virtual try-on, head (identity) swap and appearance interpolation. StylePoseGAN achieves state-of-the-art image generation fidelity on common perceptual metrics compared to the current best-performing methods and convinces in a comprehensive user study. %K Computer Science, Computer Vision and Pattern Recognition, cs.CV
Seelbach Benkner, M., Golyanik, V., Theobalt, C., and Moeller, M. 2021. Adiabatic Quantum Graph Matching with Permutation Matrix Constraints. https://arxiv.org/abs/2107.04032.
(arXiv: 2107.04032)
Abstract
Matching problems on 3D shapes and images are challenging as they are frequently formulated as combinatorial quadratic assignment problems (QAPs) with permutation matrix constraints, which are NP-hard. In this work, we address such problems with emerging quantum computing technology and propose several reformulations of QAPs as unconstrained problems suitable for efficient execution on quantum hardware. We investigate several ways to inject permutation matrix constraints in a quadratic unconstrained binary optimization problem which can be mapped to quantum hardware. We focus on obtaining a sufficient spectral gap, which further increases the probability to measure optimal solutions and valid permutation matrices in a single run. We perform our experiments on the quantum computer D-Wave 2000Q (2^11 qubits, adiabatic). Despite the observed discrepancy between simulated adiabatic quantum computing and execution on real quantum hardware, our reformulation of permutation matrix constraints increases the robustness of the numerical computations over other penalty approaches in our experiments. The proposed algorithm has the potential to scale to higher dimensions on future quantum computing architectures, which opens up multiple new directions for solving matching problems in 3D computer vision and graphics.
Export
BibTeX
@online{Seelbach_arXiv2107.04032, TITLE = {Adiabatic Quantum Graph Matching with Permutation Matrix Constraints}, AUTHOR = {Seelbach Benkner, Marcel and Golyanik, Vladislav and Theobalt, Christian and Moeller, Michael}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2107.04032}, EPRINT = {2107.04032}, EPRINTTYPE = {arXiv}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Matching problems on 3D shapes and images are challenging as they are frequently formulated as combinatorial quadratic assignment problems (QAPs) with permutation matrix constraints, which are NP-hard. In this work, we address such problems with emerging quantum computing technology and propose several reformulations of QAPs as unconstrained problems suitable for efficient execution on quantum hardware. We investigate several ways to inject permutation matrix constraints in a quadratic unconstrained binary optimization problem which can be mapped to quantum hardware. We focus on obtaining a sufficient spectral gap, which further increases the probability to measure optimal solutions and valid permutation matrices in a single run. We perform our experiments on the quantum computer D-Wave 2000Q (2^11 qubits, adiabatic). Despite the observed discrepancy between simulated adiabatic quantum computing and execution on real quantum hardware, our reformulation of permutation matrix constraints increases the robustness of the numerical computations over other penalty approaches in our experiments. The proposed algorithm has the potential to scale to higher dimensions on future quantum computing architectures, which opens up multiple new directions for solving matching problems in 3D computer vision and graphics.}, }
Endnote
%0 Report %A Seelbach Benkner, Marcel %A Golyanik, Vladislav %A Theobalt, Christian %A Moeller, Michael %+ External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations %T Adiabatic Quantum Graph Matching with Permutation Matrix Constraints : %G eng %U http://hdl.handle.net/21.11116/0000-0009-525C-4 %U https://arxiv.org/abs/2107.04032 %D 2021 %X Matching problems on 3D shapes and images are challenging as they are frequently formulated as combinatorial quadratic assignment problems (QAPs) with permutation matrix constraints, which are NP-hard. In this work, we address such problems with emerging quantum computing technology and propose several reformulations of QAPs as unconstrained problems suitable for efficient execution on quantum hardware. We investigate several ways to inject permutation matrix constraints in a quadratic unconstrained binary optimization problem which can be mapped to quantum hardware. We focus on obtaining a sufficient spectral gap, which further increases the probability to measure optimal solutions and valid permutation matrices in a single run. We perform our experiments on the quantum computer D-Wave 2000Q (2^11 qubits, adiabatic). Despite the observed discrepancy between simulated adiabatic quantum computing and execution on real quantum hardware, our reformulation of permutation matrix constraints increases the robustness of the numerical computations over other penalty approaches in our experiments. The proposed algorithm has the potential to scale to higher dimensions on future quantum computing architectures, which opens up multiple new directions for solving matching problems in 3D computer vision and graphics. %K Computer Science, Computer Vision and Pattern Recognition, cs.CV
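The abstract above centres on injecting permutation-matrix constraints into a QUBO. The snippet below shows the standard quadratic penalty encoding of "exactly one entry per row and per column" for a small assignment problem and checks it by brute force; the penalty weight, the toy cost matrix, and the brute-force solver are illustrative stand-ins for the problem mapping and the D-Wave sampling described in the paper.

```python
import numpy as np
from itertools import product

def permutation_penalty_qubo(n, lam):
    """Quadratic penalty pushing the n*n binary vector x (row-major permutation
    matrix entries x[i*n + j]) towards exactly one 1 per row and per column:
    lam * sum_rows (sum_j x_ij - 1)^2 + lam * sum_cols (sum_i x_ij - 1)^2.
    Constant offsets are dropped and x^2 = x is used for binary variables."""
    Q = np.zeros((n * n, n * n))
    groups = [[i * n + j for j in range(n)] for i in range(n)]      # rows
    groups += [[i * n + j for i in range(n)] for j in range(n)]     # columns
    for g in groups:
        for a in g:
            Q[a, a] -= lam                   # linear part of the expanded square
            for b in g:
                if a != b:
                    Q[a, b] += lam           # pairwise part, split symmetrically
    return Q

def assignment_qubo(cost, lam):
    """QUBO for a toy linear assignment problem plus permutation penalties."""
    n = cost.shape[0]
    Q = permutation_penalty_qubo(n, lam)
    Q[np.arange(n * n), np.arange(n * n)] += cost.reshape(-1)
    return Q

# Brute-force check on a 3x3 problem (a quantum annealer would sample instead).
cost = np.array([[0.0, 2.0, 3.0], [2.0, 0.0, 3.0], [3.0, 2.0, 0.0]])
Q = assignment_qubo(cost, lam=10.0)
best = min(product([0, 1], repeat=9), key=lambda x: np.array(x) @ Q @ np.array(x))
print(np.array(best).reshape(3, 3))           # the identity permutation is optimal
```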
Seelbach Benkner, M., Lähner, Z., Golyanik, V., Wunderlich, C., Theobalt, C., and Moeller, M. Q-Match: Iterative Shape Matching via Quantum Annealing. International Conference on Computer Vision (ICCV 2021), IEEE.
(Accepted/in press)
Abstract
Finding shape correspondences can be formulated as an NP-hard quadratic assignment problem (QAP) that becomes infeasible for shapes with high sampling density. A promising research direction is to tackle such quadratic optimization problems over binary variables with quantum annealing, which for some problems allows a more efficient search of the solution space. Unfortunately, enforcing the linear equality constraints in QAPs via a penalty significantly limits the success probability of such methods on currently available quantum hardware. To address this limitation, this paper proposes Q-Match, a new iterative quantum method for QAPs inspired by the alpha-expansion algorithm, which allows solving problems an order of magnitude larger than current quantum methods. It implicitly enforces the QAP constraints by updating the current estimates in a cyclic fashion. Further, Q-Match can be applied iteratively, on a subset of well-chosen correspondences, allowing us to scale to real-world problems. Using the latest quantum annealer, the D-Wave Advantage, we evaluate the proposed method on a subset of QAPLIB as well as on isometric shape matching problems from the FAUST dataset.
Export
BibTeX
@inproceedings{Seelbach_ICCV2021, TITLE = {Q-Match: Iterative Shape Matching via Quantum Annealing}, AUTHOR = {Seelbach Benkner, Marcel and L{\"a}hner, Zorah and Golyanik, Vladislav and Wunderlich, Christof and Theobalt, Christian and Moeller, Michael}, LANGUAGE = {eng}, PUBLISHER = {IEEE}, YEAR = {2021}, PUBLREMARK = {Accepted}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Finding shape correspondences can be formulated as an NP-hard quadratic assignment problem (QAP) that becomes infeasible for shapes with high sampling density. A promising research direction is to tackle such quadratic optimization problems over binary variables with quantum annealing, which allows for some problems a more efficient search in the solution space. Unfortunately, enforcing the linear equality constraints in QAPs via a penalty significantly limits the success probability of such methods on currently available quantum hardware. To address this limitation, this paper proposes Q-Match, i.e., a new iterative quantum method for QAPs inspired by the alpha-expansion algorithm, which allows solving problems of an order of magnitude larger than current quantum methods. It implicitly enforces the QAP constraints by updating the current estimates in a cyclic fashion. Further, Q-Match can be applied iteratively, on a subset of well-chosen correspondences, allowing us to scale to real-world problems. Using the latest quantum annealer, the D-Wave Advantage, we evaluate the proposed method on a subset of QAPLIB as well as on isometric shape matching problems from the FAUST dataset.}, BOOKTITLE = {International Conference on Computer Vision (ICCV 2021)}, ADDRESS = {Virtual}, }
Endnote
%0 Conference Proceedings %A Seelbach Benkner, Marcel %A Lähner, Zorah %A Golyanik, Vladislav %A Wunderlich, Christof %A Theobalt, Christian %A Moeller, Michael %+ External Organizations External Organizations External Organizations External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations %T Q-Match: Iterative Shape Matching via Quantum Annealing : %G eng %U http://hdl.handle.net/21.11116/0000-0009-5328-D %D 2021 %B International Conference on Computer Vision %Z date of event: 2021-10-11 - 2021-10-17 %C Virtual %X Finding shape correspondences can be formulated as an NP-hard quadratic assignment problem (QAP) that becomes infeasible for shapes with high sampling density. A promising research direction is to tackle such quadratic optimization problems over binary variables with quantum annealing, which allows for some problems a more efficient search in the solution space. Unfortunately, enforcing the linear equality constraints in QAPs via a penalty significantly limits the success probability of such methods on currently available quantum hardware. To address this limitation, this paper proposes Q-Match, i.e., a new iterative quantum method for QAPs inspired by the alpha-expansion algorithm, which allows solving problems of an order of magnitude larger than current quantum methods. It implicitly enforces the QAP constraints by updating the current estimates in a cyclic fashion. Further, Q-Match can be applied iteratively, on a subset of well-chosen correspondences, allowing us to scale to real-world problems. Using the latest quantum annealer, the D-Wave Advantage, we evaluate the proposed method on a subset of QAPLIB as well as on isometric shape matching problems from the FAUST dataset. %K Computer Science, Computer Vision and Pattern Recognition, cs.CV %B International Conference on Computer Vision %I IEEE %U https://4dqv.mpi-inf.mpg.de/QMATCH/
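To illustrate the "always feasible, iteratively improved" structure that the Q-Match abstract describes, the following classical sketch performs greedy transposition updates on a permutation while monitoring a QAP energy. The quantum annealer in the actual method selects sets of disjoint cycles per iteration; the greedy 2-cycle sweep, the toy energy, and the random test case below are assumptions for illustration only.

```python
import numpy as np
from itertools import combinations

def qap_energy(A, B, perm):
    """Koopmans-Beckmann QAP energy E(p) = sum_ij A[i, j] * B[p[i], p[j]]."""
    return float((A * B[np.ix_(perm, perm)]).sum())

def iterative_swap_matching(A, B, n_sweeps=10, seed=0):
    """Classical stand-in for the cyclic update idea: start from a permutation
    and repeatedly apply 2-cycles (transpositions) that lower the QAP energy,
    so every iterate stays a valid permutation by construction. Q-Match
    instead picks sets of disjoint cycles with a quantum annealer."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    energy = qap_energy(A, B, perm)
    for _ in range(n_sweeps):
        improved = False
        for i, j in combinations(range(n), 2):
            cand = perm.copy()
            cand[i], cand[j] = cand[j], cand[i]
            e = qap_energy(A, B, cand)
            if e < energy:
                perm, energy, improved = cand, e, True
        if not improved:
            break
    return perm, energy

# Toy usage: match a random symmetric affinity matrix against a shuffled copy.
rng = np.random.default_rng(1)
A = rng.random((6, 6)); A = (A + A.T) / 2
true = rng.permutation(6)
B = np.empty_like(A); B[np.ix_(true, true)] = A     # B[true[i], true[j]] = A[i, j]
perm, e = iterative_swap_matching(A, -B)            # maximise correlation
print(perm, true, e)                                 # greedy often, not always, finds `true`
```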
Shimada, S., Golyanik, V., Xu, W., Pérez, P., and Theobalt, C. 2021. Neural Monocular 3D Human Motion Capture with Physical Awareness. ACM Transactions on Graphics (Proc. ACM SIGGRAPH 2021)40, 4.
Export
BibTeX
@article{Shimada2021, TITLE = {Neural Monocular {3D} Human Motion Capture with Physical Awareness}, AUTHOR = {Shimada, Soshi and Golyanik, Vladislav and Xu, Weipeng and P{\'e}rez, Patrick and Theobalt, Christian}, LANGUAGE = {eng}, ISSN = {0730-0301}, DOI = {10.1145/3450626.3459825}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, JOURNAL = {ACM Transactions on Graphics (Proc. ACM SIGGRAPH)}, VOLUME = {40}, NUMBER = {4}, PAGES = {1--15}, EID = {83}, BOOKTITLE = {Proceedings of ACM SIGGRAPH 2021}, }
Endnote
%0 Journal Article %A Shimada, Soshi %A Golyanik, Vladislav %A Xu, Weipeng %A Pérez, Patrick %A Theobalt, Christian %+ Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society %T Neural Monocular 3D Human Motion Capture with Physical Awareness : %G eng %U http://hdl.handle.net/21.11116/0000-0009-4B80-2 %R 10.1145/3450626.3459825 %7 2021 %D 2021 %J ACM Transactions on Graphics %V 40 %N 4 %& 1 %P 1 - 15 %Z sequence number: 83 %I ACM %C New York, NY %@ false %B Proceedings of ACM SIGGRAPH 2021 %O ACM SIGGRAPH 2021
Tewari, A., Fried, O., Thies, J., et al. 2021. Advances in Neural Rendering. SIGGRAPH 2021 Courses (ACM SIGGRAPH 2021), ACM.
Export
BibTeX
@inproceedings{Tewari_SIGGRAPH_Course2021, TITLE = {Advances in Neural Rendering}, AUTHOR = {Tewari, Ayush and Fried, O. and Thies, J. and Sitzmann, V. and Lombardi, S. and Xu, Z. and Simon, T. and Nie{\ss}ner, M. and Tretschk, E. and Liu, L. and Mildenhall, B. and Srinivasan, P. and Pandey, R. and Orts-Escolano, S. and Fanello, S. and Guo, M. and Wetzstein, G. and Zhu, J.-Y. and Theobalt, Christian and Agrawala, M. and Goldman, D. B and Zollh{\"o}fer, M.}, LANGUAGE = {eng}, ISBN = {9781450383615}, DOI = {10.1145/3450508.3464573}, PUBLISHER = {ACM}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {SIGGRAPH 2021 Courses (ACM SIGGRAPH 2021)}, PAGES = {1--320}, EID = {1}, ADDRESS = {Virtual Event, USA}, }
Endnote
%0 Conference Proceedings %A Tewari, Ayush %A Fried, O. %A Thies, J. %A Sitzmann, V. %A Lombardi, S. %A Xu, Z. %A Simon, T. %A Nießner, M. %A Tretschk, E. %A Liu, L. %A Mildenhall, B. %A Srinivasan, P. %A Pandey, R. %A Orts-Escolano, S. %A Fanello, S. %A Guo, M. %A Wetzstein, G. %A Zhu, J.-Y. %A Theobalt, Christian %A Agrawala, M. %A Goldman, D. B %A Zollhöfer, M. %+ Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations External Organizations External Organizations External Organizations External Organizations External Organizations External Organizations External Organizations External Organizations External Organizations External Organizations External Organizations External Organizations External Organizations External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations %T Advances in Neural Rendering : %G eng %U http://hdl.handle.net/21.11116/0000-0009-4D23-A %R 10.1145/3450508.3464573 %D 2021 %B ACM SIGGRAPH 2021 %Z date of event: 2021-08-09 - 2021-08-13 %C Virtual Event, USA %B SIGGRAPH 2021 Courses %P 1 - 320 %Z sequence number: 1 %I ACM %@ 9781450383615
Wang, J., Liu, L., Xu, W., Sarkar, K., and Theobalt, C. Estimating Egocentric 3D Human Pose in Global Space. International Conference on Computer Vision (ICCV 2021), IEEE.
(arXiv: 2104.13454, Accepted/in press)
Abstract
Egocentric 3D human pose estimation using a single fisheye camera has become popular recently as it allows capturing a wide range of daily activities in unconstrained environments, which is difficult for traditional outside-in motion capture with external cameras. However, existing methods have several limitations. A prominent problem is that the estimated poses lie in the local coordinate system of the fisheye camera, rather than in the world coordinate system, which is restrictive for many applications. Furthermore, these methods suffer from limited accuracy and temporal instability due to ambiguities caused by the monocular setup and the severe occlusion in a strongly distorted egocentric perspective. To tackle these limitations, we present a new method for egocentric global 3D body pose estimation using a single head-mounted fisheye camera. To achieve accurate and temporally stable global poses, a spatio-temporal optimization is performed over a sequence of frames by minimizing heatmap reprojection errors and enforcing local and global body motion priors learned from a mocap dataset. Experimental results show that our approach outperforms state-of-the-art methods both quantitatively and qualitatively.
Export
BibTeX
@inproceedings{Wang_ICCV2021, TITLE = {Estimating Egocentric {3D} Human Pose in Global Space}, AUTHOR = {Wang, Jian and Liu, Lingjie and Xu, Weipeng and Sarkar, Kripasindhu and Theobalt, Christian}, LANGUAGE = {eng}, EPRINT = {2104.13454}, EPRINTTYPE = {arXiv}, PUBLISHER = {IEEE}, YEAR = {2021}, PUBLREMARK = {Accepted}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Egocentric 3D human pose estimation using a single fisheye camera has become popular recently as it allows capturing a wide range of daily activities in unconstrained environments, which is difficult for traditional outside-in motion capture with external cameras. However, existing methods have several limitations. A prominent problem is that the estimated poses lie in the local coordinate system of the fisheye camera, rather than in the world coordinate system, which is restrictive for many applications. Furthermore, these methods suffer from limited accuracy and temporal instability due to ambiguities caused by the monocular setup and the severe occlusion in a strongly distorted egocentric perspective. To tackle these limitations, we present a new method for egocentric global 3D body pose estimation using a single head-mounted fisheye camera. To achieve accurate and temporally stable global poses, a spatio-temporal optimization is performed over a sequence of frames by minimizing heatmap reprojection errors and enforcing local and global body motion priors learned from a mocap dataset. Experimental results show that our approach outperforms state-of-the-art methods both quantitatively and qualitatively.}, BOOKTITLE = {International Conference on Computer Vision (ICCV 2021)}, ADDRESS = {Virtual}, }
Endnote
%0 Conference Proceedings %A Wang, Jian %A Liu, Lingjie %A Xu, Weipeng %A Sarkar, Kripasindhu %A Theobalt, Christian %+ Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society %T Estimating Egocentric 3D Human Pose in Global Space : %G eng %U http://hdl.handle.net/21.11116/0000-0009-532B-A %D 2021 %B International Conference on Computer Vision %Z date of event: 2021-10-11 - 2021-10-17 %C Virtual %X Egocentric 3D human pose estimation using a single fisheye camera has become popular recently as it allows capturing a wide range of daily activities in unconstrained environments, which is difficult for traditional outside-in motion capture with external cameras. However, existing methods have several limitations. A prominent problem is that the estimated poses lie in the local coordinate system of the fisheye camera, rather than in the world coordinate system, which is restrictive for many applications. Furthermore, these methods suffer from limited accuracy and temporal instability due to ambiguities caused by the monocular setup and the severe occlusion in a strongly distorted egocentric perspective. To tackle these limitations, we present a new method for egocentric global 3D body pose estimation using a single head-mounted fisheye camera. To achieve accurate and temporally stable global poses, a spatio-temporal optimization is performed over a sequence of frames by minimizing heatmap reprojection errors and enforcing local and global body motion priors learned from a mocap dataset. Experimental results show that our approach outperforms state-of-the-art methods both quantitatively and qualitatively. %K Computer Science, Computer Vision and Pattern Recognition, cs.CV %B International Conference on Computer Vision %I IEEE
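The spatio-temporal optimization mentioned in the abstract above can be illustrated with a toy objective that combines per-frame reprojection error with a temporal smoothness term. A pinhole camera replaces the fisheye model and a simple smoothness penalty replaces the learned motion priors, so this is only a structural sketch of such an optimization, not the paper's energy.

```python
import numpy as np
from scipy.optimize import minimize

def project(joints3d, f=300.0):
    """Simple pinhole projection (the paper uses a fisheye model; a pinhole
    stands in here to keep the sketch short)."""
    return f * joints3d[..., :2] / joints3d[..., 2:3]

def spatio_temporal_objective(x, joints2d, lam_smooth=10.0):
    """Reprojection error over all frames plus a temporal smoothness prior.

    x        : flattened (T, J, 3) joint positions being optimised
    joints2d : (T, J, 2) detected 2D joints (e.g. heatmap maxima)
    """
    T, J, _ = joints2d.shape
    p = x.reshape(T, J, 3)
    reproj = ((project(p) - joints2d) ** 2).sum()
    smooth = ((p[1:] - p[:-1]) ** 2).sum()      # stand-in for a learned motion prior
    return reproj + lam_smooth * smooth

# Toy usage: noisy 2D detections of a static two-joint pose over 5 frames.
rng = np.random.default_rng(0)
gt = np.array([[0.0, 0.0, 2.0], [0.3, -0.5, 2.2]])            # (J=2, 3)
gt_seq = np.tile(gt, (5, 1, 1))                                # (T=5, J, 3)
obs2d = project(gt_seq) + rng.normal(scale=2.0, size=(5, 2, 2))
x0 = gt_seq.ravel() + rng.normal(scale=0.1, size=gt_seq.size)
res = minimize(spatio_temporal_objective, x0, args=(obs2d,), method='L-BFGS-B')
print(res.fun, res.x.reshape(5, 2, 3)[0])
```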
Wang, P., Liu, L., Liu, Y., Theobalt, C., Komura, T., and Wang, W. 2021. NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction. https://arxiv.org/abs/2106.10689.
(arXiv: 2106.10689)
Abstract
We present a novel neural surface reconstruction method, called NeuS, for reconstructing objects and scenes with high fidelity from 2D image inputs. Existing neural surface reconstruction approaches, such as DVR and IDR, require foreground mask as supervision, easily get trapped in local minima, and therefore struggle with the reconstruction of objects with severe self-occlusion or thin structures. Meanwhile, recent neural methods for novel view synthesis, such as NeRF and its variants, use volume rendering to produce a neural scene representation with robustness of optimization, even for highly complex objects. However, extracting high-quality surfaces from this learned implicit representation is difficult because there are not sufficient surface constraints in the representation. In NeuS, we propose to represent a surface as the zero-level set of a signed distance function (SDF) and develop a new volume rendering method to train a neural SDF representation. We observe that the conventional volume rendering method causes inherent geometric errors (i.e. bias) for surface reconstruction, and therefore propose a new formulation that is free of bias in the first order of approximation, thus leading to more accurate surface reconstruction even without the mask supervision. Experiments on the DTU dataset and the BlendedMVS dataset show that NeuS outperforms the state-of-the-arts in high-quality surface reconstruction, especially for objects and scenes with complex structures and self-occlusion.
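As a reading aid for the abstract above, the following LaTeX sketch states its two ingredients in generic form: the surface as the zero-level set of the learned SDF, and a volume-rendering integral whose weights are derived from that SDF along each camera ray. The symbols ($f$, the ray $\mathbf{p}(t)=\mathbf{o}+t\mathbf{v}$, the weight $w(t)$, the radiance $c$) are notation chosen here for illustration; the paper's contribution is the specific construction of $w(t)$ that is unbiased to first order, which is not reproduced here.
\[
\mathcal{S} \;=\; \bigl\{\, \mathbf{x}\in\mathbb{R}^{3} \;\big|\; f(\mathbf{x}) = 0 \,\bigr\},
\qquad
\hat{C}(\mathbf{o},\mathbf{v}) \;=\; \int_{t_n}^{t_f} w(t)\, c\bigl(\mathbf{p}(t),\mathbf{v}\bigr)\,\mathrm{d}t .
\]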
Export
BibTeX
@online{Wang2106.10689, TITLE = {{NeuS}: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction}, AUTHOR = {Wang, Peng and Liu, Lingjie and Liu, Yuan and Theobalt, Christian and Komura, Taku and Wang, Wenping}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2106.10689}, EPRINT = {2106.10689}, EPRINTTYPE = {arXiv}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, ABSTRACT = {We present a novel neural surface reconstruction method, called NeuS, for reconstructing objects and scenes with high fidelity from 2D image inputs. Existing neural surface reconstruction approaches, such as DVR and IDR, require foreground mask as supervision, easily get trapped in local minima, and therefore struggle with the reconstruction of objects with severe self-occlusion or thin structures. Meanwhile, recent neural methods for novel view synthesis, such as NeRF and its variants, use volume rendering to produce a neural scene representation with robustness of optimization, even for highly complex objects. However, extracting high-quality surfaces from this learned implicit representation is difficult because there are not sufficient surface constraints in the representation. In NeuS, we propose to represent a surface as the zero-level set of a signed distance function (SDF) and develop a new volume rendering method to train a neural SDF representation. We observe that the conventional volume rendering method causes inherent geometric errors (i.e. bias) for surface reconstruction, and therefore propose a new formulation that is free of bias in the first order of approximation, thus leading to more accurate surface reconstruction even without the mask supervision. Experiments on the DTU dataset and the BlendedMVS dataset show that NeuS outperforms the state-of-the-arts in high-quality surface reconstruction, especially for objects and scenes with complex structures and self-occlusion.}, }
Endnote
%0 Report %A Wang, Peng %A Liu, Lingjie %A Liu, Yuan %A Theobalt, Christian %A Komura, Taku %A Wang, Wenping %+ External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations External Organizations %T NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction : %G eng %U http://hdl.handle.net/21.11116/0000-0009-5276-6 %U https://arxiv.org/abs/2106.10689 %D 2021 %X We present a novel neural surface reconstruction method, called NeuS, for reconstructing objects and scenes with high fidelity from 2D image inputs. Existing neural surface reconstruction approaches, such as DVR and IDR, require foreground mask as supervision, easily get trapped in local minima, and therefore struggle with the reconstruction of objects with severe self-occlusion or thin structures. Meanwhile, recent neural methods for novel view synthesis, such as NeRF and its variants, use volume rendering to produce a neural scene representation with robustness of optimization, even for highly complex objects. However, extracting high-quality surfaces from this learned implicit representation is difficult because there are not sufficient surface constraints in the representation. In NeuS, we propose to represent a surface as the zero-level set of a signed distance function (SDF) and develop a new volume rendering method to train a neural SDF representation. We observe that the conventional volume rendering method causes inherent geometric errors (i.e. bias) for surface reconstruction, and therefore propose a new formulation that is free of bias in the first order of approximation, thus leading to more accurate surface reconstruction even without the mask supervision. Experiments on the DTU dataset and the BlendedMVS dataset show that NeuS outperforms the state-of-the-arts in high-quality surface reconstruction, especially for objects and scenes with complex structures and self-occlusion. %K Computer Science, Computer Vision and Pattern Recognition, cs.CV,Computer Science, Graphics, cs.GR
Yenamandra, T., Tewari, A., Bernard, F., et al. 2021. i3DMM: Deep Implicit 3D Morphable Model of Human Heads. IEEE/CVF Conference on Computer Vision and Pattern Recognition (Oral) (CVPR 2021), IEEE.
(Accepted/in press)
Export
BibTeX
@inproceedings{Yenamandra_CVPR2021, TITLE = {{i3DMM}: {D}eep Implicit {3D} Morphable Model of Human Heads}, AUTHOR = {Yenamandra, Tarun and Tewari, Ayush and Bernard, Florian and Seidel, Hans-Peter and Elgharib, Mohamed and Cremers, Daniel and Theobalt, Christian}, LANGUAGE = {eng}, PUBLISHER = {IEEE}, YEAR = {2021}, PUBLREMARK = {Accepted}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (Oral) (CVPR 2021)}, PAGES = {12803--12813}, ADDRESS = {Virtual Conference}, }
Endnote
%0 Conference Proceedings %A Yenamandra, Tarun %A Tewari, Ayush %A Bernard, Florian %A Seidel, Hans-Peter %A Elgharib, Mohamed %A Cremers, Daniel %A Theobalt, Christian %+ External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations Computer Graphics, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society %T i3DMM: Deep Implicit 3D Morphable Model of Human Heads : %G eng %U http://hdl.handle.net/21.11116/0000-0008-8966-B %D 2021 %B 34th IEEE Conference on Computer Vision and Pattern Recognition %Z date of event: 2021-06-19 - 2021-06-25 %C Virtual Conference %B IEEE/CVF Conference on Computer Vision and Pattern Recognition (Oral) %P 12803 - 12813 %I IEEE %U https://gvv.mpi-inf.mpg.de/projects/i3DMM/
Yoon, J.S., Liu, L., Golyanik, V., Sarkar, K., Park, H.S., and Theobalt, C. 2021. Pose-Guided Human Animation from a Single Image in the Wild. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), IEEE.
(Accepted/in press)
Export
BibTeX
@inproceedings{Yoon_CVPR2021, TITLE = {Pose-Guided Human Animation from a Single Image in the Wild}, AUTHOR = {Yoon, Jae Shin and Liu, Lingjie and Golyanik, Vladislav and Sarkar, Kripasindhu and Park, Hyun Soo and Theobalt, Christian}, LANGUAGE = {eng}, PUBLISHER = {IEEE}, YEAR = {2021}, PUBLREMARK = {Accepted}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021)}, PAGES = {15039--15048}, ADDRESS = {Virtual Conference}, }
Endnote
%0 Conference Proceedings %A Yoon, Jae Shin %A Liu, Lingjie %A Golyanik, Vladislav %A Sarkar, Kripasindhu %A Park, Hyun Soo %A Theobalt, Christian %+ External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society %T Pose-Guided Human Animation from a Single Image in the Wild : %G eng %U http://hdl.handle.net/21.11116/0000-0008-8953-0 %D 2021 %B 34th IEEE Conference on Computer Vision and Pattern Recognition %Z date of event: 2021-06-19 - 2021-06-25 %C Virtual Conference %B IEEE/CVF Conference on Computer Vision and Pattern Recognition %P 15039 - 15048 %I IEEE
Zhou, Y., Habermann, M., Habibie, I., Tewari, A., Theobalt, C., and Xu, F. 2021. Monocular Real-time Full Body Capture with Inter-part Correlations. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), IEEE.
(Accepted/in press)
Export
BibTeX
@inproceedings{Zhou_CVPR2021b, TITLE = {Monocular Real-time Full Body Capture with Inter-part Correlations}, AUTHOR = {Zhou, Yuxiao and Habermann, Marc and Habibie, Ikhsanul and Tewari, Ayush and Theobalt, Christian and Xu, Feng}, LANGUAGE = {eng}, PUBLISHER = {IEEE}, YEAR = {2021}, PUBLREMARK = {Accepted}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021)}, PAGES = {4811--4822}, ADDRESS = {Virtual Conference}, }
Endnote
%0 Conference Proceedings %A Zhou, Yuxiao %A Habermann, Marc %A Habibie, Ikhsanul %A Tewari, Ayush %A Theobalt, Christian %A Xu, Feng %+ External Organizations Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations %T Monocular Real-time Full Body Capture with Inter-part Correlations : %G eng %U http://hdl.handle.net/21.11116/0000-0008-892F-A %D 2021 %B 34th IEEE Conference on Computer Vision and Pattern Recognition %Z date of event: 2021-06-19 - 2021-06-25 %C Virtual Conference %B IEEE/CVF Conference on Computer Vision and Pattern Recognition %P 4811 - 4822 %I IEEE %U https://people.mpi-inf.mpg.de/~mhaberma/projects/2021-cvpr-full-body-capture/