Abstract
Monocular 3D estimation is crucial for visual perception. However, current
methods fall short by relying on oversimplified assumptions, such as pinhole
camera models or rectified images. These limitations severely restrict their
general applicability, causing poor performance in real-world scenarios with
fisheye or panoramic images and resulting in substantial context loss. To
address this, we present UniK3D, the first generalizable method for monocular
3D estimation able to model any camera. Our method introduces a spherical 3D
representation which allows for better disentanglement of camera and scene
geometry and enables accurate metric 3D reconstruction for unconstrained camera
models. Our camera component features a novel, model-independent representation
of the pencil of rays, achieved through a learned superposition of spherical
harmonics. We also introduce an angular loss, which, together with the camera
module design, prevents the contraction of the 3D outputs for wide-view
cameras. A comprehensive zero-shot evaluation on 13 diverse datasets
demonstrates the state-of-the-art performance of UniK3D across 3D, depth, and
camera metrics, with substantial gains in challenging large-field-of-view and
panoramic settings, while maintaining top accuracy in conventional pinhole
small-field-of-view domains. Code and models are available at
github.com/lpiccinelli-eth/unik3d.
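The spherical 3D representation mentioned above can be illustrated with a toy sketch. This is not the paper's implementation; it only shows the general idea that per-pixel azimuth/polar angle maps define unit ray directions on the sphere, which a radial distance map scales into metric 3D points, keeping camera (angles) and scene (distance) disentangled. All function and variable names are illustrative.

```python
import numpy as np

def rays_from_angles(azimuth, polar):
    """Unit ray directions from per-pixel azimuth/polar angle maps of shape (H, W)."""
    x = np.sin(polar) * np.cos(azimuth)
    y = np.sin(polar) * np.sin(azimuth)
    z = np.cos(polar)
    return np.stack([x, y, z], axis=-1)  # (H, W, 3), each row a unit vector

def unproject(radial_distance, azimuth, polar):
    """Metric 3D points: radial distance along each per-pixel unit ray."""
    return radial_distance[..., None] * rays_from_angles(azimuth, polar)

# Toy example: a 2x2 image whose rays all point along +z, at 2 m distance.
az = np.zeros((2, 2))
po = np.zeros((2, 2))
pts = unproject(np.full((2, 2), 2.0), az, po)  # every point is (0, 0, 2)
```

Because the angle maps are unconstrained fields rather than parameters of a fixed camera model (e.g. pinhole intrinsics), the same unprojection applies unchanged to fisheye or panoramic imagery.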
BibTeX
@online{Piccinelli2503.16591,
  title      = {Uni{K}3{D}: {U}niversal Camera Monocular 3{D} Estimation},
  author     = {Piccinelli, Luigi and Sakaridis, Christos and Segu, Mattia and Yang, Yung-Hsu and Li, Siyuan and Abbeloos, Wim and Van Gool, Luc},
  language   = {eng},
  url        = {https://arxiv.org/abs/2503.16591},
  eprint     = {2503.16591},
  eprinttype = {arXiv},
  year       = {2025},
}
Endnote
%0 Report
%A Piccinelli, Luigi
%A Sakaridis, Christos
%A Segu, Mattia
%A Yang, Yung-Hsu
%A Li, Siyuan
%A Abbeloos, Wim
%A Van Gool, Luc
%+ External Organizations
External Organizations
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
%T UniK3D: Universal Camera Monocular 3D Estimation
%G eng
%U http://hdl.handle.net/21.11116/0000-0011-1CB7-0
%U https://arxiv.org/abs/2503.16591
%D 2025
%K Computer Science, Computer Vision and Pattern Recognition, cs.CV