Abstract
Monocular 3D estimation is crucial for visual perception. However, current
methods fall short by relying on oversimplified assumptions, such as pinhole
camera models or rectified images. These limitations severely restrict their
general applicability, causing poor performance in real-world scenarios with
fisheye or panoramic images and resulting in substantial context loss. To
address this, we present UniK3D, the first generalizable method for monocular
3D estimation able to model any camera. Our method introduces a spherical 3D
representation which allows for better disentanglement of camera and scene
geometry and enables accurate metric 3D reconstruction for unconstrained camera
models. Our camera component features a novel, model-independent representation
of the pencil of rays, achieved through a learned superposition of spherical
harmonics. We also introduce an angular loss, which, together with the camera
module design, prevents the contraction of the 3D outputs for wide-view
cameras. A comprehensive zero-shot evaluation on 13 diverse datasets
demonstrates the state-of-the-art performance of UniK3D across 3D, depth, and
camera metrics, with substantial gains in challenging large-field-of-view and
panoramic settings, while maintaining top accuracy in conventional pinhole
small-field-of-view domains. Code and models are available at
github.com/lpiccinelli-eth/unik3d.
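The spherical 3D representation mentioned above can be illustrated with a toy sketch. This is not the paper's implementation; it only shows the general idea that per-pixel azimuth/polar angle maps define unit ray directions on the sphere, which a radial distance map scales into metric 3D points, keeping camera (angles) and scene (distance) disentangled. All function and variable names are illustrative.

```python
import numpy as np

def rays_from_angles(azimuth, polar):
    """Unit ray directions from per-pixel azimuth/polar angle maps of shape (H, W)."""
    x = np.sin(polar) * np.cos(azimuth)
    y = np.sin(polar) * np.sin(azimuth)
    z = np.cos(polar)
    return np.stack([x, y, z], axis=-1)  # (H, W, 3), each row a unit vector

def unproject(radial_distance, azimuth, polar):
    """Metric 3D points: radial distance along each per-pixel unit ray."""
    return radial_distance[..., None] * rays_from_angles(azimuth, polar)

# Toy example: a 2x2 image whose rays all point along +z, at 2 m distance.
az = np.zeros((2, 2))
po = np.zeros((2, 2))
pts = unproject(np.full((2, 2), 2.0), az, po)  # every point is (0, 0, 2)
```

Because the angle maps are unconstrained fields rather than parameters of a fixed camera model (e.g. pinhole intrinsics), the same unprojection applies unchanged to fisheye or panoramic imagery.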
BibTeX
@online{Piccinelli2503.16591,
  title      = {Uni{K}3{D}: {U}niversal Camera Monocular 3{D} Estimation},
  author     = {Piccinelli, Luigi and Sakaridis, Christos and Segu, Mattia and Yang, Yung-Hsu and Li, Siyuan and Abbeloos, Wim and Van Gool, Luc},
  language   = {eng},
  url        = {https://arxiv.org/abs/2503.16591},
  eprint     = {2503.16591},
  eprinttype = {arXiv},
  year       = {2025},
}
Endnote
%0 Report
%A Piccinelli, Luigi
%A Sakaridis, Christos
%A Segu, Mattia
%A Yang, Yung-Hsu
%A Li, Siyuan
%A Abbeloos, Wim
%A Van Gool, Luc
%+ External Organizations
External Organizations
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
%T UniK3D: Universal Camera Monocular 3D Estimation
%G eng
%U http://hdl.handle.net/21.11116/0000-0011-1CB7-0
%U https://arxiv.org/abs/2503.16591
%D 2025
%K Computer Science, Computer Vision and Pattern Recognition, cs.CV