D2
Computer Vision and Machine Learning

Keyang Zhou (PhD Student)

MSc Keyang Zhou

Address
Max-Planck-Institut für Informatik
Saarland Informatics Campus
Campus
Location
-
Phone
+49 681 9325 2000
Fax
+49 681 9325 2099

Personal Information

Publications

Zhou, K., Lal Bhatnagar, B., Lenssen, J. E., & Pons-Moll, G. (2022). TOCH: Spatio-Temporal Object Correspondence to Hand for Motion Refinement. Retrieved from https://arxiv.org/abs/2205.07982
(arXiv: 2205.07982)
Abstract
We present TOCH, a method for refining incorrect 3D hand-object interaction sequences using a data prior. Existing hand trackers, especially those that rely on very few cameras, often produce visually unrealistic results with hand-object intersection or missing contacts. Although correcting such errors requires reasoning about temporal aspects of interaction, most previous work focus on static grasps and contacts. The core of our method are TOCH fields, a novel spatio-temporal representation for modeling correspondences between hands and objects during interaction. The key component is a point-wise object-centric representation which encodes the hand position relative to the object. Leveraging this novel representation, we learn a latent manifold of plausible TOCH fields with a temporal denoising auto-encoder. Experiments demonstrate that TOCH outperforms state-of-the-art (SOTA) 3D hand-object interaction models, which are limited to static grasps and contacts. More importantly, our method produces smooth interactions even before and after contact. Using a single trained TOCH model, we quantitatively and qualitatively demonstrate its usefulness for 1) correcting erroneous reconstruction results from off-the-shelf RGB/RGB-D hand-object reconstruction methods, 2) de-noising, and 3) grasp transfer across objects. We will release our code and trained model on our project page at http://virtualhumans.mpi-inf.mpg.de/toch/
Export
BibTeX
@online{Zhou_2205.07982, TITLE = {{TOCH}: Spatio-Temporal Object Correspondence to Hand for Motion Refinement}, AUTHOR = {Zhou, Keyang and Lal Bhatnagar, Bharat and Lenssen, Jan Eric and Pons-Moll, Gerard}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2205.07982}, EPRINT = {2205.07982}, EPRINTTYPE = {arXiv}, YEAR = {2022}, ABSTRACT = {We present TOCH, a method for refining incorrect 3D hand-object interaction sequences using a data prior. Existing hand trackers, especially those that rely on very few cameras, often produce visually unrealistic results with hand-object intersection or missing contacts. Although correcting such errors requires reasoning about temporal aspects of interaction, most previous work focus on static grasps and contacts. The core of our method are TOCH fields, a novel spatio-temporal representation for modeling correspondences between hands and objects during interaction. The key component is a point-wise object-centric representation which encodes the hand position relative to the object. Leveraging this novel representation, we learn a latent manifold of plausible TOCH fields with a temporal denoising auto-encoder. Experiments demonstrate that TOCH outperforms state-of-the-art (SOTA) 3D hand-object interaction models, which are limited to static grasps and contacts. More importantly, our method produces smooth interactions even before and after contact. Using a single trained TOCH model, we quantitatively and qualitatively demonstrate its usefulness for 1) correcting erroneous reconstruction results from off-the-shelf RGB/RGB-D hand-object reconstruction methods, 2) de-noising, and 3) grasp transfer across objects. We will release our code and trained model on our project page at http://virtualhumans.mpi-inf.mpg.de/toch/}, }
Endnote
%0 Report %A Zhou, Keyang %A Lal Bhatnagar, Bharat %A Lenssen, Jan Eric %A Pons-Moll, Gerard %+ Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society %T TOCH: Spatio-Temporal Object Correspondence to Hand for Motion Refinement : %G eng %U http://hdl.handle.net/21.11116/0000-000A-ACF3-2 %U https://arxiv.org/abs/2205.07982 %D 2022 %X We present TOCH, a method for refining incorrect 3D hand-object interaction sequences using a data prior. Existing hand trackers, especially those that rely on very few cameras, often produce visually unrealistic results with hand-object intersection or missing contacts. Although correcting such errors requires reasoning about temporal aspects of interaction, most previous work focus on static grasps and contacts. The core of our method are TOCH fields, a novel spatio-temporal representation for modeling correspondences between hands and objects during interaction. The key component is a point-wise object-centric representation which encodes the hand position relative to the object. Leveraging this novel representation, we learn a latent manifold of plausible TOCH fields with a temporal denoising auto-encoder. Experiments demonstrate that TOCH outperforms state-of-the-art (SOTA) 3D hand-object interaction models, which are limited to static grasps and contacts. More importantly, our method produces smooth interactions even before and after contact. Using a single trained TOCH model, we quantitatively and qualitatively demonstrate its usefulness for 1) correcting erroneous reconstruction results from off-the-shelf RGB/RGB-D hand-object reconstruction methods, 2) de-noising, and 3) grasp transfer across objects. We will release our code and trained model on our project page at http://virtualhumans.mpi-inf.mpg.de/toch/ %K Computer Science, Computer Vision and Pattern Recognition, cs.CV
Zhou, K., Bhatnagar, B. L., Lenssen, J. E., & Pons-Moll, G. (n.d.). TOCH: Spatio-Temporal Object Correspondence to Hand for Motion Refinement. In European Conference on Computer Vision (ECCV 2022). Tel Aviv, Israel: Springer.
(Accepted/in press)
Abstract
We present TOCH, a method for refining incorrect 3D hand-object interaction sequences using a data prior. Existing hand trackers, especially those that rely on very few cameras, often produce visually unrealistic results with hand-object intersection or missing contacts. Although correcting such errors requires reasoning about temporal aspects of interaction, most previous work focus on static grasps and contacts. The core of our method are TOCH fields, a novel spatio-temporal representation for modeling correspondences between hands and objects during interaction. The key component is a point-wise object-centric representation which encodes the hand position relative to the object. Leveraging this novel representation, we learn a latent manifold of plausible TOCH fields with a temporal denoising auto-encoder. Experiments demonstrate that TOCH outperforms state-of-the-art (SOTA) 3D hand-object interaction models, which are limited to static grasps and contacts. More importantly, our method produces smooth interactions even before and after contact. Using a single trained TOCH model, we quantitatively and qualitatively demonstrate its usefulness for 1) correcting erroneous reconstruction results from off-the-shelf RGB/RGB-D hand-object reconstruction methods, 2) de-noising, and 3) grasp transfer across objects. We will release our code and trained model on our project page at http://virtualhumans.mpi-inf.mpg.de/toch/
Export
BibTeX
@inproceedings{Zhou_ECCV2022, TITLE = {{TOCH}: {S}patio-Temporal Object Correspondence to Hand for Motion Refinement}, AUTHOR = {Zhou, Keyang and Bhatnagar, Bharat Lal and Lenssen, Jan Eric and Pons-Moll, Gerard}, LANGUAGE = {eng}, PUBLISHER = {Springer}, YEAR = {2022}, PUBLREMARK = {Accepted}, ABSTRACT = {We present TOCH, a method for refining incorrect 3D hand-object interaction sequences using a data prior. Existing hand trackers, especially those that rely on very few cameras, often produce visually unrealistic results with hand-object intersection or missing contacts. Although correcting such errors requires reasoning about temporal aspects of interaction, most previous work focus on static grasps and contacts. The core of our method are TOCH fields, a novel spatio-temporal representation for modeling correspondences between hands and objects during interaction. The key component is a point-wise object-centric representation which encodes the hand position relative to the object. Leveraging this novel representation, we learn a latent manifold of plausible TOCH fields with a temporal denoising auto-encoder. Experiments demonstrate that TOCH outperforms state-of-the-art (SOTA) 3D hand-object interaction models, which are limited to static grasps and contacts. More importantly, our method produces smooth interactions even before and after contact. Using a single trained TOCH model, we quantitatively and qualitatively demonstrate its usefulness for 1) correcting erroneous reconstruction results from off-the-shelf RGB/RGB-D hand-object reconstruction methods, 2) de-noising, and 3) grasp transfer across objects. We will release our code and trained model on our project page at http://virtualhumans.mpi-inf.mpg.de/toch/}, BOOKTITLE = {European Conference on Computer Vision (ECCV 2022)}, SERIES = {Lecture Notes in Computer Science}, ADDRESS = {Tel Aviv, Israel}, }
Endnote
%0 Conference Proceedings %A Zhou, Keyang %A Bhatnagar, Bharat Lal %A Lenssen, Jan Eric %A Pons-Moll, Gerard %+ Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society %T TOCH: Spatio-Temporal Object Correspondence to Hand for Motion Refinement : %G eng %U http://hdl.handle.net/21.11116/0000-000A-B586-2 %D 2022 %B European Conference on Computer Vision %Z date of event: 2022-10-23 - 2022-10-27 %C Tel Aviv, Israel %X We present TOCH, a method for refining incorrect 3D hand-object interaction sequences using a data prior. Existing hand trackers, especially those that rely on very few cameras, often produce visually unrealistic results with hand-object intersection or missing contacts. Although correcting such errors requires reasoning about temporal aspects of interaction, most previous work focus on static grasps and contacts. The core of our method are TOCH fields, a novel spatio-temporal representation for modeling correspondences between hands and objects during interaction. The key component is a point-wise object-centric representation which encodes the hand position relative to the object. Leveraging this novel representation, we learn a latent manifold of plausible TOCH fields with a temporal denoising auto-encoder. Experiments demonstrate that TOCH outperforms state-of-the-art (SOTA) 3D hand-object interaction models, which are limited to static grasps and contacts. More importantly, our method produces smooth interactions even before and after contact. Using a single trained TOCH model, we quantitatively and qualitatively demonstrate its usefulness for 1) correcting erroneous reconstruction results from off-the-shelf RGB/RGB-D hand-object reconstruction methods, 2) de-noising, and 3) grasp transfer across objects. We will release our code and trained model on our project page at http://virtualhumans.mpi-inf.mpg.de/toch/ %K Computer Science, Computer Vision and Pattern Recognition, cs.CV %B European Conference on Computer Vision %I Springer %B Lecture Notes in Computer Science
Zhou, K. (2020). Unsupervised Shape and Pose Disentanglement for 3D Meshes. Universität des Saarlandes, Saarbrücken.
Export
BibTeX
@mastersthesis{ZhoMaster2020, TITLE = {Unsupervised Shape and Pose Disentanglement for {3D} Meshes}, AUTHOR = {Zhou, Keyang}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, DATE = {2020}, }
Endnote
%0 Thesis %A Zhou, Keyang %+ Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society %T Unsupervised Shape and Pose Disentanglement for 3D Meshes : %G eng %U http://hdl.handle.net/21.11116/0000-0007-B432-5 %I Universität des Saarlandes %C Saarbrücken %D 2020 %P 59 p. %V master %9 master
Zhou, K., Bhatnagar, B. L., Schiele, B., & Pons-Moll, G. (2021). Adjoint Rigid Transform Network: Task-conditioned Alignment of 3D Shapes. Retrieved from https://arxiv.org/abs/2102.01161
(arXiv: 2102.01161)
Abstract
Most learning methods for 3D data (point clouds, meshes) suffer significant performance drops when the data is not carefully aligned to a canonical orientation. Aligning real world 3D data collected from different sources is non-trivial and requires manual intervention. In this paper, we propose the Adjoint Rigid Transform (ART) Network, a neural module which can be integrated with a variety of 3D networks to significantly boost their performance. ART learns to rotate input shapes to a learned canonical orientation, which is crucial for a lot of tasks such as shape reconstruction, interpolation, non-rigid registration, and latent disentanglement. ART achieves this with self-supervision and a rotation equivariance constraint on predicted rotations. The remarkable result is that with only self-supervision, ART facilitates learning a unique canonical orientation for both rigid and nonrigid shapes, which leads to a notable boost in performance of aforementioned tasks. We will release our code and pre-trained models for further research.
Export
BibTeX
@online{Zhou2102.01161, TITLE = {Adjoint Rigid Transform Network: {T}ask-conditioned Alignment of {3D} Shapes}, AUTHOR = {Zhou, Keyang and Bhatnagar, Bharat Lal and Schiele, Bernt and Pons-Moll, Gerard}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2102.01161}, EPRINT = {2102.01161}, EPRINTTYPE = {arXiv}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Most learning methods for 3D data (point clouds, meshes) suffer significant performance drops when the data is not carefully aligned to a canonical orientation. Aligning real world 3D data collected from different sources is non-trivial and requires manual intervention. In this paper, we propose the Adjoint Rigid Transform (ART) Network, a neural module which can be integrated with a variety of 3D networks to significantly boost their performance. ART learns to rotate input shapes to a learned canonical orientation, which is crucial for a lot of tasks such as shape reconstruction, interpolation, non-rigid registration, and latent disentanglement. ART achieves this with self-supervision and a rotation equivariance constraint on predicted rotations. The remarkable result is that with only self-supervision, ART facilitates learning a unique canonical orientation for both rigid and nonrigid shapes, which leads to a notable boost in performance of aforementioned tasks. We will release our code and pre-trained models for further research.}, }
Endnote
%0 Report %A Zhou, Keyang %A Bhatnagar, Bharat Lal %A Schiele, Bernt %A Pons-Moll, Gerard %+ Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society %T Adjoint Rigid Transform Network: Task-conditioned Alignment of 3D Shapes : %G eng %U http://hdl.handle.net/21.11116/0000-0009-80FA-C %U https://arxiv.org/abs/2102.01161 %D 2021 %X Most learning methods for 3D data (point clouds, meshes) suffer significant performance drops when the data is not carefully aligned to a canonical orientation. Aligning real world 3D data collected from different sources is non-trivial and requires manual intervention. In this paper, we propose the Adjoint Rigid Transform (ART) Network, a neural module which can be integrated with a variety of 3D networks to significantly boost their performance. ART learns to rotate input shapes to a learned canonical orientation, which is crucial for a lot of tasks such as shape reconstruction, interpolation, non-rigid registration, and latent disentanglement. ART achieves this with self-supervision and a rotation equivariance constraint on predicted rotations. The remarkable result is that with only self-supervision, ART facilitates learning a unique canonical orientation for both rigid and nonrigid shapes, which leads to a notable boost in performance of aforementioned tasks. We will release our code and pre-trained models for further research. %K Computer Science, Computer Vision and Pattern Recognition, cs.CV