
Zhi Li (PhD student)

Personal Information
Homepage | GitHub | Google Scholar | LinkedIn
About Me
I am currently a PhD student in the Department of Computer Vision and Machine Learning at the Max Planck Institute for Informatics, advised by Prof. Dr. Bernt Schiele. My research interests include computer vision for autonomous driving, especially image perception under domain shift, and 3D computer vision.
Education / Research Experience
- Jun. 2022 - Present: PhD student in Computer Vision and Machine Learning, Max Planck Institute for Informatics, Germany (Advisor: Prof. Dr. Bernt Schiele)
- Dec. 2021 - May 2022: Research intern in Computer Vision and Machine Learning, Max Planck Institute for Informatics, Germany (Advisors: Prof. Dr. Bernt Schiele and Dr. Dengxin Dai)
- Dec. 2020 - Nov. 2021: Research intern in Computer Vision and Machine Learning, Max Planck Institute for Informatics, Germany (Advisors: Prof. Dr. Bernt Schiele and Prof. Dr. Christian Theobalt)
- Sep. 2017 - Jun. 2020: MSc in Software Engineering, Xi'an Jiaotong University, China
- Sep. 2013 - Jun. 2017: BA in English Literature, Xi'an Jiaotong University, China
Publications
2025
- “MT-Occ: Single-View 3D Occupancy Prediction via Multi-task Distillation,” in Pattern Recognition (DAGM GCPR 2025), Freiburg, Germany, 2026.
2023
- “Test-time Domain Adaptation for Monocular Depth Estimation,” in IEEE International Conference on Robotics and Automation (ICRA 2023), London, UK, 2023.
2022
- “HULC: 3D HUman Motion Capture with Pose Manifold SampLing and Dense Contact Guidance,” in Computer Vision – ECCV 2022, Tel Aviv, Israel, 2022.
- “MoCapDeform: Monocular 3D Human Motion Capture in Deformable Scenes,” in International Conference on 3D Vision, Hybrid / Prague, Czechia, 2022.
Abstract
3D human motion capture from monocular RGB images that respects the interactions of a subject with complex and possibly deformable environments is a very challenging, ill-posed and under-explored problem. Existing methods address it only weakly and do not model the surface deformations that often occur when humans interact with scene surfaces. In contrast, this paper proposes MoCapDeform, a new framework for monocular 3D human motion capture that is the first to explicitly model non-rigid deformations of a 3D scene for improved 3D human pose estimation and deformable environment reconstruction. MoCapDeform accepts a monocular RGB video and a 3D scene mesh aligned in the camera space. It first localises a subject in the input monocular video, along with dense contact labels, using a new raycasting-based strategy. Next, our human-environment interaction constraints are leveraged to jointly optimise global 3D human poses and non-rigid surface deformations. MoCapDeform achieves higher accuracy than competing methods on several datasets, including our newly recorded one with deforming background scenes.
2021
- “Monocular 3D Multi-Person Pose Estimation via Predicting Factorized Correction Factors,” Computer Vision and Image Understanding, vol. 213, 2021.