Keyang Zhou (PhD Student)

MSc Keyang Zhou
- Address
- Max-Planck-Institut für Informatik
Saarland Informatics Campus
- Phone
- +49 681 9325 2000
- Fax
- +49 681 9325 2099
Personal Information
Publications
Zhou, K., Bhatnagar, B. L., Lenssen, J. E., & Pons-Moll, G. (2022). TOCH: Spatio-Temporal Object Correspondence to Hand for Motion Refinement. Retrieved from https://arxiv.org/abs/2205.07982
(arXiv: 2205.07982)
Abstract
We present TOCH, a method for refining incorrect 3D hand-object interaction sequences using a data prior. Existing hand trackers, especially those that rely on very few cameras, often produce visually unrealistic results with hand-object intersection or missing contacts. Although correcting such errors requires reasoning about temporal aspects of interaction, most previous work focuses on static grasps and contacts. The core of our method is TOCH fields, a novel spatio-temporal representation for modeling correspondences between hands and objects during interaction. The key component is a point-wise object-centric representation which encodes the hand position relative to the object. Leveraging this novel representation, we learn a latent manifold of plausible TOCH fields with a temporal denoising auto-encoder. Experiments demonstrate that TOCH outperforms state-of-the-art (SOTA) 3D hand-object interaction models, which are limited to static grasps and contacts. More importantly, our method produces smooth interactions even before and after contact. Using a single trained TOCH model, we quantitatively and qualitatively demonstrate its usefulness for 1) correcting erroneous reconstruction results from off-the-shelf RGB/RGB-D hand-object reconstruction methods, 2) de-noising, and 3) grasp transfer across objects. We will release our code and trained model on our project page at http://virtualhumans.mpi-inf.mpg.de/toch/
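To make the object-centric idea concrete, the following is a minimal, hypothetical sketch of a TOCH-style per-frame field: for every point sampled on the object surface it records whether that point currently corresponds to the hand and the offset to its nearest hand vertex. The function name, the contact threshold, and the nearest-neighbour correspondence rule are illustrative assumptions, not the authors' released implementation (the official code is on the project page above).

import numpy as np

def toch_like_field(obj_points, hand_verts, contact_thresh=0.005):
    # obj_points: (N, 3) points sampled on the object surface for one frame
    # hand_verts: (V, 3) posed hand mesh vertices for the same frame
    # Pairwise distances between object samples and hand vertices.
    diff = obj_points[:, None, :] - hand_verts[None, :, :]        # (N, V, 3)
    dist = np.linalg.norm(diff, axis=-1)                          # (N, V)
    nearest = dist.argmin(axis=1)                                 # (N,)
    offsets = hand_verts[nearest] - obj_points                    # (N, 3) hand position relative to object
    in_contact = (dist.min(axis=1) < contact_thresh).astype(np.float32)
    # One row per object point: [correspondence flag, relative hand offset].
    # Stacking such per-frame fields over a sequence yields the kind of
    # spatio-temporal, object-centric input a temporal denoising auto-encoder
    # could be trained on.
    return np.concatenate([in_contact[:, None], offsets], axis=1)  # (N, 4)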
Export
BibTeX
@online{Zhou_2205.07982,
TITLE = {{TOCH}: Spatio-Temporal Object Correspondence to Hand for Motion Refinement},
AUTHOR = {Zhou, Keyang and Lal Bhatnagar, Bharat and Lenssen, Jan Eric and Pons-Moll, Gerard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2205.07982},
EPRINT = {2205.07982},
EPRINTTYPE = {arXiv},
YEAR = {2022},
MARGINALMARK = {$\bullet$},
ABSTRACT = {We present TOCH, a method for refining incorrect 3D hand-object interaction sequences using a data prior. Existing hand trackers, especially those that rely on very few cameras, often produce visually unrealistic results with hand-object intersection or missing contacts. Although correcting such errors requires reasoning about temporal aspects of interaction, most previous work focuses on static grasps and contacts. The core of our method is TOCH fields, a novel spatio-temporal representation for modeling correspondences between hands and objects during interaction. The key component is a point-wise object-centric representation which encodes the hand position relative to the object. Leveraging this novel representation, we learn a latent manifold of plausible TOCH fields with a temporal denoising auto-encoder. Experiments demonstrate that TOCH outperforms state-of-the-art (SOTA) 3D hand-object interaction models, which are limited to static grasps and contacts. More importantly, our method produces smooth interactions even before and after contact. Using a single trained TOCH model, we quantitatively and qualitatively demonstrate its usefulness for 1) correcting erroneous reconstruction results from off-the-shelf RGB/RGB-D hand-object reconstruction methods, 2) de-noising, and 3) grasp transfer across objects. We will release our code and trained model on our project page at http://virtualhumans.mpi-inf.mpg.de/toch/},
}
Endnote
%0 Report
%A Zhou, Keyang
%A Lal Bhatnagar, Bharat
%A Lenssen, Jan Eric
%A Pons-Moll, Gerard
%+ Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
%T TOCH: Spatio-Temporal Object Correspondence to Hand for Motion Refinement
%G eng
%U http://hdl.handle.net/21.11116/0000-000A-ACF3-2
%U https://arxiv.org/abs/2205.07982
%D 2022
%X We present TOCH, a method for refining incorrect 3D hand-object interaction sequences using a data prior. Existing hand trackers, especially those that rely on very few cameras, often produce visually unrealistic results with hand-object intersection or missing contacts. Although correcting such errors requires reasoning about temporal aspects of interaction, most previous work focuses on static grasps and contacts. The core of our method is TOCH fields, a novel spatio-temporal representation for modeling correspondences between hands and objects during interaction. The key component is a point-wise object-centric representation which encodes the hand position relative to the object. Leveraging this novel representation, we learn a latent manifold of plausible TOCH fields with a temporal denoising auto-encoder. Experiments demonstrate that TOCH outperforms state-of-the-art (SOTA) 3D hand-object interaction models, which are limited to static grasps and contacts. More importantly, our method produces smooth interactions even before and after contact. Using a single trained TOCH model, we quantitatively and qualitatively demonstrate its usefulness for 1) correcting erroneous reconstruction results from off-the-shelf RGB/RGB-D hand-object reconstruction methods, 2) de-noising, and 3) grasp transfer across objects. We will release our code and trained model on our project page at http://virtualhumans.mpi-inf.mpg.de/toch/
%K Computer Science, Computer Vision and Pattern Recognition, cs.CV
Zhou, K., Bhatnagar, B. L., Lenssen, J. E., & Pons-Moll, G. (2022). TOCH: Spatio-Temporal Object Correspondence to Hand for Motion Refinement. In Computer Vision -- ECCV 2022. Tel Aviv, Israel: Springer. doi:10.1007/978-3-031-20062-5_1
Export
BibTeX
@inproceedings{Zhou_ECCV2022,
TITLE = {{TOCH}: {S}patio-Temporal Object Correspondence to Hand for Motion Refinement},
AUTHOR = {Zhou, Keyang and Bhatnagar, Bharat Lal and Lenssen, Jan Eric and Pons-Moll, Gerard},
LANGUAGE = {eng},
ISBN = {978-3-031-20061-8},
DOI = {10.1007/978-3-031-20062-5_1},
PUBLISHER = {Springer},
YEAR = {2022},
MARGINALMARK = {$\bullet$},
DATE = {2022},
BOOKTITLE = {Computer Vision -- ECCV 2022},
EDITOR = {Avidan, Shai and Brostow, Gabriel and Ciss{\'e}, Moustapha and Farinella, Giovanni and Hassner, Tal},
PAGES = {1--19},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {13663},
ADDRESS = {Tel Aviv, Israel},
}
Endnote
%0 Conference Proceedings
%A Zhou, Keyang
%A Bhatnagar, Bharat Lal
%A Lenssen, Jan Eric
%A Pons-Moll, Gerard
%+ Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
%T TOCH: Spatio-Temporal Object Correspondence to Hand for Motion Refinement
%G eng
%U http://hdl.handle.net/21.11116/0000-000A-B586-2
%R 10.1007/978-3-031-20062-5_1
%D 2022
%B 17th European Conference on Computer Vision
%Z date of event: 2022-10-23 - 2022-10-27
%C Tel Aviv, Israel
%B Computer Vision -- ECCV 2022
%E Avidan, Shai; Brostow, Gabriel; Cissé, Moustapha; Farinella, Giovanni; Hassner, Tal
%P 1 - 19
%I Springer
%@ 978-3-031-20061-8
%B Lecture Notes in Computer Science
%N 13663
%U https://rdcu.be/c26JY
Zhou, K. (2020). Unsupervised Shape and Pose Disentanglement for 3D Meshes (Master's thesis). Universität des Saarlandes, Saarbrücken.
Export
BibTeX
@mastersthesis{ZhoMaster2020,
TITLE = {Unsupervised Shape and Pose Disentanglement for {3D} Meshes},
AUTHOR = {Zhou, Keyang},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2020},
DATE = {2020},
}
Endnote
%0 Thesis
%A Zhou, Keyang
%+ Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
%T Unsupervised Shape and Pose Disentanglement for 3D Meshes
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-B432-5
%I Universität des Saarlandes
%C Saarbrücken
%D 2020
%P 59 p.
%V master
%9 master
Zhou, K., Bhatnagar, B. L., Schiele, B., & Pons-Moll, G. (2021). Adjoint Rigid Transform Network: Task-conditioned Alignment of 3D Shapes. Retrieved from https://arxiv.org/abs/2102.01161
(arXiv: 2102.01161)
Abstract
Most learning methods for 3D data (point clouds, meshes) suffer significant performance drops when the data is not carefully aligned to a canonical orientation. Aligning real-world 3D data collected from different sources is non-trivial and requires manual intervention. In this paper, we propose the Adjoint Rigid Transform (ART) Network, a neural module which can be integrated with a variety of 3D networks to significantly boost their performance. ART learns to rotate input shapes to a learned canonical orientation, which is crucial for many tasks such as shape reconstruction, interpolation, non-rigid registration, and latent disentanglement. ART achieves this with self-supervision and a rotation equivariance constraint on predicted rotations. The remarkable result is that with only self-supervision, ART facilitates learning a unique canonical orientation for both rigid and nonrigid shapes, which leads to a notable boost in the performance of the aforementioned tasks. We will release our code and pre-trained models for further research.
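As an illustration of the self-supervised rotation-equivariance idea, here is a sketch under assumed names with a toy PointNet-style regressor, not the paper's architecture: a small network predicts a rotation that maps the input point cloud to a learned canonical orientation, and a consistency loss requires randomly rotated copies of the same shape to canonicalize to the same result.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyPoseNet(nn.Module):
    # Toy regressor: predicts a rotation (two 3D vectors, orthonormalized with
    # Gram-Schmidt) intended to map the input points to a canonical orientation.
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 128), nn.ReLU(),
                                 nn.Linear(128, 128), nn.ReLU())
        self.head = nn.Linear(128, 6)

    def forward(self, pts):                      # pts: (B, N, 3)
        feat = self.mlp(pts).max(dim=1).values   # global max-pool -> (B, 128)
        a, b = self.head(feat).split(3, dim=-1)
        x = F.normalize(a, dim=-1)
        y = F.normalize(b - (b * x).sum(-1, keepdim=True) * x, dim=-1)
        z = torch.cross(x, y, dim=-1)
        return torch.stack([x, y, z], dim=1)      # (B, 3, 3) rotation matrices

def equivariance_loss(net, pts, rand_rot):
    # The canonicalized shape should not depend on how the input was rotated:
    # canonicalize the original and a randomly rotated copy, then penalize
    # the difference between the two canonical outputs.
    canon_a = torch.bmm(pts, net(pts).transpose(1, 2))
    pts_rot = torch.bmm(pts, rand_rot.transpose(1, 2))
    canon_b = torch.bmm(pts_rot, net(pts_rot).transpose(1, 2))
    return (canon_a - canon_b).pow(2).mean()

In practice such a module would be trained jointly with whatever downstream 3D network it is plugged into, as the abstract describes; the sketch above only shows the consistency term.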
Export
BibTeX
@online{Zhou2102.01161,
TITLE = {Adjoint Rigid Transform Network: {T}ask-conditioned Alignment of {3D} Shapes},
AUTHOR = {Zhou, Keyang and Bhatnagar, Bharat Lal and Schiele, Bernt and Pons-Moll, Gerard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2102.01161},
EPRINT = {2102.01161},
EPRINTTYPE = {arXiv},
YEAR = {2021},
MARGINALMARK = {$\bullet$},
ABSTRACT = {Most learning methods for 3D data (point clouds, meshes) suffer significant performance drops when the data is not carefully aligned to a canonical orientation. Aligning real-world 3D data collected from different sources is non-trivial and requires manual intervention. In this paper, we propose the Adjoint Rigid Transform (ART) Network, a neural module which can be integrated with a variety of 3D networks to significantly boost their performance. ART learns to rotate input shapes to a learned canonical orientation, which is crucial for many tasks such as shape reconstruction, interpolation, non-rigid registration, and latent disentanglement. ART achieves this with self-supervision and a rotation equivariance constraint on predicted rotations. The remarkable result is that with only self-supervision, ART facilitates learning a unique canonical orientation for both rigid and nonrigid shapes, which leads to a notable boost in the performance of the aforementioned tasks. We will release our code and pre-trained models for further research.},
}
Endnote
%0 Report
%A Zhou, Keyang
%A Bhatnagar, Bharat Lal
%A Schiele, Bernt
%A Pons-Moll, Gerard
%+ Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
%T Adjoint Rigid Transform Network: Task-conditioned Alignment of 3D Shapes
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-80FA-C
%U https://arxiv.org/abs/2102.01161
%D 2021
%X Most learning methods for 3D data (point clouds, meshes) suffer significant performance drops when the data is not carefully aligned to a canonical orientation. Aligning real-world 3D data collected from different sources is non-trivial and requires manual intervention. In this paper, we propose the Adjoint Rigid Transform (ART) Network, a neural module which can be integrated with a variety of 3D networks to significantly boost their performance. ART learns to rotate input shapes to a learned canonical orientation, which is crucial for many tasks such as shape reconstruction, interpolation, non-rigid registration, and latent disentanglement. ART achieves this with self-supervision and a rotation equivariance constraint on predicted rotations. The remarkable result is that with only self-supervision, ART facilitates learning a unique canonical orientation for both rigid and nonrigid shapes, which leads to a notable boost in the performance of the aforementioned tasks. We will release our code and pre-trained models for further research.
%K Computer Science, Computer Vision and Pattern Recognition, cs.CV