- “Visibility Aware Human-Object Interaction Tracking from Single RGB Camera,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023), Vancouver, Canada.
- “NSF: Neural Surface Fields for Human Modeling from Monocular Depth,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2023), Paris, France.
- “Modelling 3D Humans : Pose, Shape, Clothing and Interactions,” Universität des Saarlandes, Saarbrücken, 2023.
- “CHORE: Contact, Human and Object Reconstruction from a Single RGB Image,” in Computer Vision – ECCV 2022, Tel Aviv, Israel, 2022.
- “COUCH: Towards Controllable Human-Chair Interactions,” in Computer Vision – ECCV 2022, Tel Aviv, Israel, 2022.
- “TOCH: Spatio-Temporal Object Correspondence to Hand for Motion Refinement,” in Computer Vision – ECCV 2022, Tel Aviv, Israel, 2022.
- “BEHAVE: Dataset and Method for Tracking Human Object Interactions,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), New Orleans, LA, USA, 2022.
- “TOCH: Spatio-Temporal Object Correspondence to Hand for Motion Refinement,” 2022. [Online]. Available: https://arxiv.org/abs/2205.07982
We present TOCH, a method for refining incorrect 3D hand-object interaction
sequences using a data prior. Existing hand trackers, especially those that
rely on very few cameras, often produce visually unrealistic results with
hand-object intersection or missing contacts. Although correcting such errors
requires reasoning about temporal aspects of interaction, most previous work
focuses on static grasps and contacts. At the core of our method are TOCH fields, a
novel spatio-temporal representation for modeling correspondences between hands
and objects during interaction. The key component is a point-wise
object-centric representation which encodes the hand position relative to the
object. Leveraging this novel representation, we learn a latent manifold of
plausible TOCH fields with a temporal denoising auto-encoder. Experiments
demonstrate that TOCH outperforms state-of-the-art (SOTA) 3D hand-object
interaction models, which are limited to static grasps and contacts. More
importantly, our method produces smooth interactions even before and after
contact. Using a single trained TOCH model, we quantitatively and qualitatively
demonstrate its usefulness for 1) correcting erroneous reconstruction results
from off-the-shelf RGB/RGB-D hand-object reconstruction methods, 2) de-noising,
and 3) grasp transfer across objects. We will release our code and trained
model on our project page at virtualhumans.mpi-inf.mpg.de/toch/
- “Adjoint Rigid Transform Network: Task-conditioned Alignment of 3D Shapes,” 2021. [Online]. Available: https://arxiv.org/abs/2102.01161
Most learning methods for 3D data (point clouds, meshes) suffer significant
performance drops when the data is not carefully aligned to a canonical
orientation. Aligning real world 3D data collected from different sources is
non-trivial and requires manual intervention. In this paper, we propose the
Adjoint Rigid Transform (ART) Network, a neural module which can be integrated
with a variety of 3D networks to significantly boost their performance. ART
learns to rotate input shapes to a learned canonical orientation, which is
crucial for many tasks such as shape reconstruction, interpolation,
non-rigid registration, and latent disentanglement. ART achieves this with
self-supervision and a rotation equivariance constraint on predicted rotations.
The remarkable result is that with only self-supervision, ART facilitates
learning a unique canonical orientation for both rigid and nonrigid shapes,
which leads to a notable boost in performance on the aforementioned tasks. We will
release our code and pre-trained models for further research.
- “LoopReg: Self-supervised Learning of Implicit Surface Correspondences, Pose and Shape for 3D Human Mesh Registration,” in Advances in Neural Information Processing Systems 33 (NeurIPS 2020), Virtual Event, 2020.
- “Combining Implicit Function Learning and Parametric Models for 3D Human Reconstruction,” in Computer Vision – ECCV 2020, Glasgow, UK, 2020.
- “SIZER: A Dataset and Model for Parsing 3D Clothing and Learning Size Sensitive 3D Clothing,” in Computer Vision – ECCV 2020, Glasgow, UK, 2020.
- “Unsupervised Shape and Pose Disentanglement for 3D Meshes,” in Computer Vision – ECCV 2020, Glasgow, UK, 2020.
- “Learning to Reconstruct People in Clothing from a Single RGB Camera,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), Long Beach, CA, USA, 2019.
- “Multi-Garment Net: Learning to Dress 3D People from Images,” in International Conference on Computer Vision (ICCV 2019), Seoul, Korea, 2019.