Vladimir Guzov (PhD Student)
MSc Vladimir Guzov
- Address
- Max-Planck-Institut für Informatik
Saarland Informatics Campus
- Phone
- +49 681 9325 2000
- Fax
- +49 681 9325 2099
Personal Information
Publications
Lazova, V., Guzov, V., Olszewski, K., Tulyakov, S., & Pons-Moll, G. (2022). Control-NeRF: Editable Feature Volumes for Scene Rendering and Manipulation. Retrieved from https://arxiv.org/abs/2204.10850
(arXiv: 2204.10850) Abstract
We present a novel method for performing flexible, 3D-aware image content manipulation while enabling high-quality novel view synthesis. While NeRF-based approaches are effective for novel view synthesis, such models memorize the radiance for every point in a scene within a neural network. Since these models are scene-specific and lack a 3D scene representation, classical editing such as shape manipulation, or combining scenes is not possible. Hence, editing and combining NeRF-based scenes has not been demonstrated. With the aim of obtaining interpretable and controllable scene representations, our model couples learnt scene-specific feature volumes with a scene agnostic neural rendering network. With this hybrid representation, we decouple neural rendering from scene-specific geometry and appearance. We can generalize to novel scenes by optimizing only the scene-specific 3D feature representation, while keeping the parameters of the rendering network fixed. The rendering function learnt during the initial training stage can thus be easily applied to new scenes, making our approach more flexible. More importantly, since the feature volumes are independent of the rendering model, we can manipulate and combine scenes by editing their corresponding feature volumes. The edited volume can then be plugged into the rendering model to synthesize high-quality novel views. We demonstrate various scene manipulations, including mixing scenes, deforming objects and inserting objects into scenes, while still producing photo-realistic results.
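The architectural idea described in the abstract, a learnable per-scene feature volume coupled with a single scene-agnostic rendering network, can be illustrated with a minimal PyTorch sketch. This is not the authors' code; the class names, grid resolution, and feature size below are illustrative assumptions.

```python
# Minimal sketch (an assumption, not the paper's implementation) of decoupling a
# per-scene feature volume from a shared, scene-agnostic rendering network.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureVolume(nn.Module):
    """Scene-specific, learnable 3D feature grid, queried by trilinear interpolation."""

    def __init__(self, resolution=64, feat_dim=32):
        super().__init__()
        self.grid = nn.Parameter(
            torch.randn(1, feat_dim, resolution, resolution, resolution) * 0.01
        )

    def forward(self, xyz):                                   # xyz: (N, 3) in [-1, 1]^3
        coords = xyz.view(1, -1, 1, 1, 3)                     # grid_sample wants (B, D, H, W, 3)
        feats = F.grid_sample(self.grid, coords, align_corners=True)
        return feats.view(self.grid.shape[1], -1).t()         # (N, feat_dim)


class RenderingMLP(nn.Module):
    """Scene-agnostic renderer: sampled feature + view direction -> density and colour."""

    def __init__(self, feat_dim=32, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),                             # (sigma, r, g, b)
        )

    def forward(self, feats, view_dirs):
        out = self.net(torch.cat([feats, view_dirs], dim=-1))
        return out[..., :1], torch.sigmoid(out[..., 1:])      # density, colour


# Generalising to a new scene: freeze the shared renderer, optimise only the new volume.
renderer = RenderingMLP()
for p in renderer.parameters():
    p.requires_grad_(False)
new_scene = FeatureVolume()
optimizer = torch.optim.Adam(new_scene.parameters(), lr=1e-3)

# Editing or mixing scenes then amounts to editing the grids themselves, e.g. copying a
# sub-block of one scene's grid into another before rendering (purely illustrative):
# scene_a.grid.data[..., :32, :32, :32] = scene_b.grid.data[..., :32, :32, :32]
```

Because only the `FeatureVolume` parameters are optimized for a new scene, the frozen renderer can be reused across scenes, and scene manipulation reduces to editing the feature grids before they are fed to the renderer.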
Export
BibTeX
@online{Lazova2204.10850,
TITLE = {Control-{NeRF}: Editable Feature Volumes for Scene Rendering and Manipulation},
AUTHOR = {Lazova, Verica and Guzov, Vladimir and Olszewski, Kyle and Tulyakov, Sergey and Pons-Moll, Gerard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2204.10850},
EPRINT = {2204.10850},
EPRINTTYPE = {arXiv},
YEAR = {2022},
MARGINALMARK = {$\bullet$},
ABSTRACT = {We present a novel method for performing flexible, 3D-aware image content manipulation while enabling high-quality novel view synthesis. While NeRF-based approaches are effective for novel view synthesis, such models memorize the radiance for every point in a scene within a neural network. Since these models are scene-specific and lack a 3D scene representation, classical editing such as shape manipulation, or combining scenes is not possible. Hence, editing and combining NeRF-based scenes has not been demonstrated. With the aim of obtaining interpretable and controllable scene representations, our model couples learnt scene-specific feature volumes with a scene agnostic neural rendering network. With this hybrid representation, we decouple neural rendering from scene-specific geometry and appearance. We can generalize to novel scenes by optimizing only the scene-specific 3D feature representation, while keeping the parameters of the rendering network fixed. The rendering function learnt during the initial training stage can thus be easily applied to new scenes, making our approach more flexible. More importantly, since the feature volumes are independent of the rendering model, we can manipulate and combine scenes by editing their corresponding feature volumes. The edited volume can then be plugged into the rendering model to synthesize high-quality novel views. We demonstrate various scene manipulations, including mixing scenes, deforming objects and inserting objects into scenes, while still producing photo-realistic results.},
}
Endnote
%0 Report
%A Lazova, Verica
%A Guzov, Vladimir
%A Olszewski, Kyle
%A Tulyakov, Sergey
%A Pons-Moll, Gerard
%+ Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
%T Control-NeRF: Editable Feature Volumes for Scene Rendering and Manipulation
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-2A9F-3
%U https://arxiv.org/abs/2204.10850
%D 2022
%X We present a novel method for performing flexible, 3D-aware image content manipulation while enabling high-quality novel view synthesis. While NeRF-based approaches are effective for novel view synthesis, such models memorize the radiance for every point in a scene within a neural network. Since these models are scene-specific and lack a 3D scene representation, classical editing such as shape manipulation, or combining scenes is not possible. Hence, editing and combining NeRF-based scenes has not been demonstrated. With the aim of obtaining interpretable and controllable scene representations, our model couples learnt scene-specific feature volumes with a scene agnostic neural rendering network. With this hybrid representation, we decouple neural rendering from scene-specific geometry and appearance. We can generalize to novel scenes by optimizing only the scene-specific 3D feature representation, while keeping the parameters of the rendering network fixed. The rendering function learnt during the initial training stage can thus be easily applied to new scenes, making our approach more flexible. More importantly, since the feature volumes are independent of the rendering model, we can manipulate and combine scenes by editing their corresponding feature volumes. The edited volume can then be plugged into the rendering model to synthesize high-quality novel views. We demonstrate various scene manipulations, including mixing scenes, deforming objects and inserting objects into scenes, while still producing photo-realistic results.
%K Computer Science, Computer Vision and Pattern Recognition, cs.CV
Zhang, X., Bhatnagar, B. L., Starke, S., Guzov, V., & Pons-Moll, G. (2022). COUCH: Towards Controllable Human-Chair Interactions. In Computer Vision -- ECCV 2022. Tel Aviv, Israel: Springer. doi:10.1007/978-3-031-20065-6_30
Export
BibTeX
@inproceedings{Zhang_ECCV2022,
TITLE = {{COUCH}: {T}owards Controllable Human-Chair Interactions},
AUTHOR = {Zhang, Xiaohan and Bhatnagar, Bharat Lal and Starke, Sebastian and Guzov, Vladimir and Pons-Moll, Gerard},
LANGUAGE = {eng},
ISBN = {978-3-031-20064-9},
DOI = {10.1007/978-3-031-20065-6_30},
PUBLISHER = {Springer},
YEAR = {2022},
MARGINALMARK = {$\bullet$},
DATE = {2022},
BOOKTITLE = {Computer Vision -- ECCV 2022},
EDITOR = {Avidan, Shai and Brostow, Gabriel and Ciss{\'e}, Moustapha and Farinella, Giovanni and Hassner, Tal},
PAGES = {518--535},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {13665},
ADDRESS = {Tel Aviv, Israel},
}
Endnote
%0 Conference Proceedings
%A Zhang, Xiaohan
%A Bhatnagar, Bharat Lal
%A Starke, Sebastian
%A Guzov, Vladimir
%A Pons-Moll, Gerard
%+ Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
External Organizations
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
%T COUCH: Towards Controllable Human-Chair Interactions
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-2A92-0
%R 10.1007/978-3-031-20065-6_30
%D 2022
%B 17th European Conference on Computer Vision
%Z date of event: 2022-10-23 - 2022-10-27
%C Tel Aviv, Israel
%B Computer Vision -- ECCV 2022
%E Avidan, Shai; Brostow, Gabriel; Cissé, Moustapha; Farinella, Giovanni; Hassner, Tal
%P 518 - 535
%I Springer
%@ 978-3-031-20064-9
%B Lecture Notes in Computer Science
%N 13665
%U https://rdcu.be/c26Qf
Guzov, V., Sattler, T., & Pons-Moll, G. (2022). Visually Plausible Human-Object Interaction Capture from Wearable Sensors. Retrieved from https://arxiv.org/abs/2205.02830
(arXiv: 2205.02830) Abstract
In everyday lives, humans naturally modify the surrounding environment through interactions, e.g., moving a chair to sit on it. To reproduce such interactions in virtual spaces (e.g., metaverse), we need to be able to capture and model them, including changes in the scene geometry, ideally from ego-centric input alone (head camera and body-worn inertial sensors). This is an extremely hard problem, especially since the object/scene might not be visible from the head camera (e.g., a human not looking at a chair while sitting down, or not looking at the door handle while opening a door). In this paper, we present HOPS, the first method to capture interactions such as dragging objects and opening doors from ego-centric data alone. Central to our method is reasoning about human-object interactions, allowing to track objects even when they are not visible from the head camera. HOPS localizes and registers both the human and the dynamic object in a pre-scanned static scene. HOPS is an important first step towards advanced AR/VR applications based on immersive virtual universes, and can provide human-centric training data to teach machines to interact with their surroundings. The supplementary video, data, and code will be available on our project page at http://virtualhumans.mpi-inf.mpg.de/hops/
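The central idea of tracking a manipulated object through interaction reasoning, even while it is outside the head camera's field of view, can be sketched with a toy pose update. This is an illustrative assumption, not the paper's method; the function and variable names below are hypothetical.

```python
# Toy sketch (an assumption, not HOPS itself): when the object is not visible in the
# head camera, keep tracking it by attaching its pose to the contacting hand of the
# estimated body, i.e. apply the hand's relative motion rigidly to the object.
import numpy as np


def update_object_pose(obj_pose, hand_pose, prev_hand_pose,
                       in_contact, visible, observed_pose=None):
    """Return the object's 4x4 world pose for the current frame.

    All poses are 4x4 homogeneous transforms.
    in_contact: hand-object contact inferred from proximity / interaction reasoning.
    visible:    object detected in the current head-camera frame.
    """
    if visible and observed_pose is not None:
        return observed_pose                          # trust the camera when the object is seen
    if in_contact:
        delta = hand_pose @ np.linalg.inv(prev_hand_pose)   # hand's motion since last frame
        return delta @ obj_pose                              # carry the object with the hand
    return obj_pose                                   # otherwise the object stays put


# Example: the hand translates 10 cm along x while the object is out of view.
prev_hand = np.eye(4)
hand = np.eye(4); hand[0, 3] = 0.1
obj = np.eye(4);  obj[0, 3] = 0.5
print(update_object_pose(obj, hand, prev_hand, in_contact=True, visible=False)[0, 3])  # 0.6
```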
Export
BibTeX
@online{Guzov2205.02830,
TITLE = {Visually Plausible Human-Object Interaction Capture from Wearable Sensors},
AUTHOR = {Guzov, Vladimir and Sattler, Torsten and Pons-Moll, Gerard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2205.02830},
EPRINT = {2205.02830},
EPRINTTYPE = {arXiv},
YEAR = {2022},
MARGINALMARK = {$\bullet$},
ABSTRACT = {In everyday lives, humans naturally modify the surrounding environment through interactions, e.g., moving a chair to sit on it. To reproduce such interactions in virtual spaces (e.g., metaverse), we need to be able to capture and model them, including changes in the scene geometry, ideally from ego-centric input alone (head camera and body-worn inertial sensors). This is an extremely hard problem, especially since the object/scene might not be visible from the head camera (e.g., a human not looking at a chair while sitting down, or not looking at the door handle while opening a door). In this paper, we present HOPS, the first method to capture interactions such as dragging objects and opening doors from ego-centric data alone. Central to our method is reasoning about human-object interactions, allowing to track objects even when they are not visible from the head camera. HOPS localizes and registers both the human and the dynamic object in a pre-scanned static scene. HOPS is an important first step towards advanced AR/VR applications based on immersive virtual universes, and can provide human-centric training data to teach machines to interact with their surroundings. The supplementary video, data, and code will be available on our project page at http://virtualhumans.mpi-inf.mpg.de/hops/},
}
Endnote
%0 Report
%A Guzov, Vladimir
%A Sattler, Torsten
%A Pons-Moll, Gerard
%+ Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
External Organizations
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
%T Visually Plausible Human-Object Interaction Capture from Wearable Sensors
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-2A9C-6
%U https://arxiv.org/abs/2205.02830
%D 2022
%X In everyday lives, humans naturally modify the surrounding environment through interactions, e.g., moving a chair to sit on it. To reproduce such interactions in virtual spaces (e.g., metaverse), we need to be able to capture and model them, including changes in the scene geometry, ideally from ego-centric input alone (head camera and body-worn inertial sensors). This is an extremely hard problem, especially since the object/scene might not be visible from the head camera (e.g., a human not looking at a chair while sitting down, or not looking at the door handle while opening a door). In this paper, we present HOPS, the first method to capture interactions such as dragging objects and opening doors from ego-centric data alone. Central to our method is reasoning about human-object interactions, allowing to track objects even when they are not visible from the head camera. HOPS localizes and registers both the human and the dynamic object in a pre-scanned static scene. HOPS is an important first step towards advanced AR/VR applications based on immersive virtual universes, and can provide human-centric training data to teach machines to interact with their surroundings. The supplementary video, data, and code will be available on our project page at http://virtualhumans.mpi-inf.mpg.de/hops/
%K Computer Science, Computer Vision and Pattern Recognition, cs.CV
Guzov, V., Mir, A., Sattler, T., & Pons-Moll, G. (2021). Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-localization in Large Scenes from Body-Mounted Sensors. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021). Nashville, TN, US (Virtual): IEEE. doi:10.1109/CVPR46437.2021.00430
Export
BibTeX
@inproceedings{Guzov_CVPR21,
TITLE = {Human {POSEitioning} System ({HPS}): {3D} Human Pose Estimation and Self-localization in Large Scenes from Body-Mounted Sensors},
AUTHOR = {Guzov, Vladimir and Mir, Aymen and Sattler, Torsten and Pons-Moll, Gerard},
LANGUAGE = {eng},
ISBN = {978-1-6654-4509-2},
DOI = {10.1109/CVPR46437.2021.00430},
PUBLISHER = {IEEE},
YEAR = {2021},
MARGINALMARK = {$\bullet$},
BOOKTITLE = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021)},
PAGES = {4318--4329},
ADDRESS = {Nashville, TN, US (Virtual)},
}
Endnote
%0 Conference Proceedings
%A Guzov, Vladimir
%A Mir, Aymen
%A Sattler, Torsten
%A Pons-Moll, Gerard
%+ Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
External Organizations
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
%T Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-localization in Large Scenes from Body-Mounted Sensors
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-8AF2-A
%R 10.1109/CVPR46437.2021.00430
%D 2021
%B 34th IEEE Conference on Computer Vision and Pattern Recognition
%Z date of event: 2021-06-19 - 2021-06-25
%C Nashville, TN, US (Virtual)
%B IEEE/CVF Conference on Computer Vision and Pattern Recognition
%P 4318 - 4329
%I IEEE
%@ 978-1-6654-4509-2