D2
Computer Vision and Machine Learning

Dengxin Dai (Senior Researcher)

Dr. Dengxin Dai

Address
Max-Planck-Institut für Informatik
Saarland Informatics Campus
Campus E1 4
66123 Saarbrücken
Location
E1 4 - 604
Phone
+49 681 9325 2104
Fax
+49 681 9325 2099

Vision for Autonomous Systems (VAS) Group

My Group Website: VAS

Hiring

  • We are hiring postdocs, PhD students, and research interns; we also offer master's thesis projects. If you are interested, please contact me at <ddai@mpi-inf.mpg.de> with your CV and transcripts. I also accept applicants with a CSC scholarship.
  • We focus on deep learning-based perception for autonomous driving, especially on scaling existing visual perception models to novel domains, new data modalities, unseen classes, and more tasks.

Publications

Dai, D., Vasudevan, A. B., Matas, J., & Van Gool, L. (2022). Binaural SoundNet: Predicting Semantics, Depth and Motion with Binaural Sounds. IEEE Transactions on Pattern Analysis and Machine Intelligence. doi:10.1109/TPAMI.2022.3155643
Ding, J., Xue, N., Xia, G.-S., & Dai, D. (n.d.). Decoupling Zero-Shot Semantic Segmentation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022). New Orleans, LA, USA: IEEE.
(arXiv: 2112.07910, Accepted/in press)
Abstract
Zero-shot semantic segmentation (ZS3) aims to segment the novel categories that have not been seen in the training. Existing works formulate ZS3 as a pixel-level zero-shot classification problem, and transfer semantic knowledge from seen classes to unseen ones with the help of language models pre-trained only with texts. While simple, the pixel-level ZS3 formulation shows the limited capability to integrate vision-language models that are often pre-trained with image-text pairs and currently demonstrate great potential for vision tasks. Inspired by the observation that humans often perform segment-level semantic labeling, we propose to decouple the ZS3 into two sub-tasks: 1) a class-agnostic grouping task to group the pixels into segments. 2) a zero-shot classification task on segments. The former sub-task does not involve category information and can be directly transferred to group pixels for unseen classes. The latter subtask performs at segment-level and provides a natural way to leverage large-scale vision-language models pre-trained with image-text pairs (e.g. CLIP) for ZS3. Based on the decoupling formulation, we propose a simple and effective zero-shot semantic segmentation model, called ZegFormer, which outperforms the previous methods on ZS3 standard benchmarks by large margins, e.g., 35 points on the PASCAL VOC and 3 points on the COCO-Stuff in terms of mIoU for unseen classes. Code will be released at https://github.com/dingjiansw101/ZegFormer.
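
The decoupling described in the abstract can be pictured in a few lines of code. The sketch below uses random arrays as stand-ins for per-pixel features, the class-agnostic grouping, and the text embeddings a CLIP-like model would provide; it is not the ZegFormer implementation, only the segment-pool-then-classify idea.

```python
import numpy as np

# Minimal sketch of decoupled zero-shot segmentation (illustrative only):
# 1) class-agnostic grouping assigns each pixel to a segment,
# 2) each segment is classified by comparing its pooled embedding with
#    text embeddings of class names (as a CLIP-like model would provide).

rng = np.random.default_rng(0)

H, W, D = 4, 6, 8                                  # tiny image and embedding size
pixel_feat = rng.normal(size=(H, W, D))            # stand-in for per-pixel features
segment_id = rng.integers(0, 3, size=(H, W))       # stand-in for class-agnostic grouping
class_names = ["road", "car", "zebra"]             # the last one is "unseen" in training
text_emb = rng.normal(size=(len(class_names), D))  # stand-in for text-encoder output

def l2norm(x, axis=-1):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)

# Pool pixel features into one embedding per segment.
seg_ids = np.unique(segment_id)
seg_emb = np.stack([pixel_feat[segment_id == s].mean(axis=0) for s in seg_ids])

# Zero-shot classification of segments via cosine similarity to text embeddings.
logits = l2norm(seg_emb) @ l2norm(text_emb).T      # (num_segments, num_classes)
seg_label = logits.argmax(axis=1)

# Paint segment-level predictions back onto the pixel grid.
semantic_map = np.zeros((H, W), dtype=int)
for s, lab in zip(seg_ids, seg_label):
    semantic_map[segment_id == s] = lab
print(semantic_map)
```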
Zhang, Z., Liniger, A., Dai, D., Yu, F., & Van Gool, L. (2021). End-to-End Urban Driving by Imitating a Reinforcement Learning Coach. In IEEE/CVF International Conference on Computer Vision (ICCV 2021). Virtual Event: IEEE. doi:10.1109/ICCV48922.2021.01494
Gong, R., Danelljan, M., Dai, D., Wang, W., Paudel, D. P., Chhatkuli, A., … Van Gool, L. (2021). TADA: Taxonomy Adaptive Domain Adaptation. Retrieved from https://arxiv.org/abs/2109.04813
(arXiv: 2109.04813)
Abstract
Traditional domain adaptation addresses the task of adapting a model to a novel target domain under limited or no additional supervision. While tackling the input domain gap, the standard domain adaptation settings assume no domain change in the output space. In semantic prediction tasks, different datasets are often labeled according to different semantic taxonomies. In many real-world settings, the target domain task requires a different taxonomy than the one imposed by the source domain. We therefore introduce the more general taxonomy adaptive domain adaptation (TADA) problem, allowing for inconsistent taxonomies between the two domains. We further propose an approach that jointly addresses the image-level and label-level domain adaptation. On the label-level, we employ a bilateral mixed sampling strategy to augment the target domain, and a relabelling method to unify and align the label spaces. We address the image-level domain gap by proposing an uncertainty-rectified contrastive learning method, leading to more domain-invariant and class discriminative features. We extensively evaluate the effectiveness of our framework under different TADA settings: open taxonomy, coarse-to-fine taxonomy, and partially-overlapping taxonomy. Our framework outperforms previous state-of-the-art by a large margin, while capable of adapting to target taxonomies.
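
The label-level part of the problem, unifying inconsistent taxonomies, can be pictured with a toy relabelling rule. The class names and mapping below are hypothetical and the rule is deliberately simplistic; the paper's actual bilateral mixed sampling and relabelling method is more involved.

```python
# Illustrative sketch of label-space unification across taxonomies
# (hypothetical class names and mapping; not the paper's exact relabelling rule).

SOURCE_TAXONOMY = ["road", "vehicle", "person"]
TARGET_TAXONOMY = ["road", "car", "truck", "person"]   # finer-grained than "vehicle"

# Map every source class to the set of compatible target classes.
SOURCE_TO_TARGET = {
    "road":    {"road"},
    "vehicle": {"car", "truck"},   # coarse-to-fine: ambiguous without extra evidence
    "person":  {"person"},
}

def relabel(source_label: str, target_prediction: str) -> str:
    """Unify a (coarse) source annotation with a prediction in the target taxonomy.

    If the target-space prediction is consistent with the source label, keep the
    finer prediction; otherwise fall back to ignoring the pixel/region."""
    compatible = SOURCE_TO_TARGET[source_label]
    return target_prediction if target_prediction in compatible else "ignore"

print(relabel("vehicle", "truck"))   # -> truck (refined)
print(relabel("vehicle", "person"))  # -> ignore (inconsistent)
```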
Gong, R., Dai, D., Chen, Y., Li, W., & Van Gool, L. (2021). mDALU: Multi-Source Domain Adaptation and Label Unification with Partial Datasets. In IEEE/CVF International Conference on Computer Vision (ICCV 2021). Virtual Event: IEEE. doi:10.1109/ICCV48922.2021.00875
Wang, Q., Dai, D., Hoyer, L., Van Gool, L., & Fink, O. (2021). Domain Adaptive Semantic Segmentation with Self-Supervised Depth Estimation. In IEEE/CVF International Conference on Computer Vision (ICCV 2021). Virtual Event: IEEE. doi:10.1109/ICCV48922.2021.00840
Patil, V., Liniger, A., Dai, D., & Van Gool, L. (2022). Improving Depth Estimation Using Map-Based Depth Priors. IEEE Robotics and Automation Letters, 7(2). doi:10.1109/LRA.2022.3146914
Hahner, M., Sakaridis, C., Dai, D., & Van Gool, L. (2021). Fog Simulation on Real LiDAR Point Clouds for 3D Object Detection in Adverse Weather. In IEEE/CVF International Conference on Computer Vision (ICCV 2021). Virtual Event: IEEE. doi:10.1109/ICCV48922.2021.01500
Li, S., Chen, X., Liu, Y., Dai, D., Stachniss, C., & Gall, J. (2022). Multi-Scale Interaction for Real-Time LiDAR Data Segmentation on an Embedded Platform. IEEE Robotics and Automation Letters, 7(2). doi:10.1109/LRA.2021.3132059
Vasudevan, A. B., Dai, D., & Van Gool, L. (n.d.). Sound and Visual Representation Learning with Multiple Pretraining Tasks. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022). New Orleans, LA, USA: IEEE.
(arXiv: 2201.01046, Accepted/in press)
Abstract
Different self-supervised tasks (SSL) reveal different features from the data. The learned feature representations can exhibit different performance for each downstream task. In this light, this work aims to combine Multiple SSL tasks (Multi-SSL) that generalizes well for all downstream tasks. Specifically, for this study, we investigate binaural sounds and image data in isolation. For binaural sounds, we propose three SSL tasks namely, spatial alignment, temporal synchronization of foreground objects and binaural audio and temporal gap prediction. We investigate several approaches of Multi-SSL and give insights into the downstream task performance on video retrieval, spatial sound super resolution, and semantic prediction on the OmniAudio dataset. Our experiments on binaural sound representations demonstrate that Multi-SSL via incremental learning (IL) of SSL tasks outperforms single SSL task models and fully supervised models in the downstream task performance. As a check of applicability on other modality, we also formulate our Multi-SSL models for image representation learning and we use the recently proposed SSL tasks, MoCov2 and DenseCL. Here, Multi-SSL surpasses recent methods such as MoCov2, DenseCL and DetCo by 2.06%, 3.27% and 1.19% on VOC07 classification and +2.83, +1.56 and +1.61 AP on COCO detection. Code will be made publicly available.
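
As a concrete picture of one of the pretext tasks named above, the sketch below builds a training example for temporal gap prediction from a binaural waveform: two crops are taken and the gap between them is the regression target. Shapes, durations, and the sampling scheme are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

# Sketch of constructing a self-supervised "temporal gap prediction" sample:
# two audio crops are drawn and the gap between them is the training target.

rng = np.random.default_rng(0)
sample_rate = 16000
binaural = rng.normal(size=(2, 10 * sample_rate))   # 10 s of 2-channel audio (stand-in)

def make_gap_sample(audio, crop_sec=1.0, max_gap_sec=3.0, sr=sample_rate):
    crop = int(crop_sec * sr)
    gap = int(rng.integers(0, int(max_gap_sec * sr)))
    start1 = int(rng.integers(0, audio.shape[1] - 2 * crop - gap))
    start2 = start1 + crop + gap
    crop1 = audio[:, start1:start1 + crop]
    crop2 = audio[:, start2:start2 + crop]
    target = gap / sr          # regression target: gap in seconds
    return crop1, crop2, target

c1, c2, gap_sec = make_gap_sample(binaural)
print(c1.shape, c2.shape, round(gap_sec, 3))
```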
Zaech, J.-N., Dai, D., Liniger, A., Danelljan, M., & Van Gool, L. (2022). Learnable Online Graph Representations for 3D Multi-Object Tracking. IEEE Robotics and Automation Letters. doi:10.1109/LRA.2022.3145952
Gong, R., Li, W., Chen, Y., Dai, D., & Van Gool, L. (2021). DLOW: Domain Flow and Applications. International Journal of Computer Vision, 129. doi:10.1007/s11263-021-01496-2
Sakaridis, C., Dai, D., & Van Gool, L. (2021). ACDC: The Adverse Conditions Dataset with Correspondences for Semantic Driving Scene Understanding. In IEEE/CVF International Conference on Computer Vision (ICCV 2021). Virtual Event: IEEE. doi:10.1109/ICCV48922.2021.01059
Hoyer, L., Dai, D., Wang, Q., Chen, Y., & Van Gool, L. (2021). Improving Semi-Supervised and Domain-Adaptive Semantic Segmentation with Self-Supervised Depth Estimation. Retrieved from https://arxiv.org/abs/2108.12545
(arXiv: 2108.12545)
Abstract
Training deep networks for semantic segmentation requires large amounts of labeled training data, which presents a major challenge in practice, as labeling segmentation masks is a highly labor-intensive process. To address this issue, we present a framework for semi-supervised and domain-adaptive semantic segmentation, which is enhanced by self-supervised monocular depth estimation (SDE) trained only on unlabeled image sequences. In particular, we utilize SDE as an auxiliary task comprehensively across the entire learning framework: First, we automatically select the most useful samples to be annotated for semantic segmentation based on the correlation of sample diversity and difficulty between SDE and semantic segmentation. Second, we implement a strong data augmentation by mixing images and labels using the geometry of the scene. Third, we transfer knowledge from features learned during SDE to semantic segmentation by means of transfer and multi-task learning. And fourth, we exploit additional labeled synthetic data with Cross-Domain DepthMix and Matching Geometry Sampling to align synthetic and real data. We validate the proposed model on the Cityscapes dataset, where all four contributions demonstrate significant performance gains, and achieve state-of-the-art results for semi-supervised semantic segmentation as well as for semi-supervised domain adaptation. In particular, with only 1/30 of the Cityscapes labels, our method achieves 92% of the fully-supervised baseline performance and even 97% when exploiting additional data from GTA. The source code is available at https://github.com/lhoyer/improving_segmentation_with_selfsupervised_depth.
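
The geometry-aware mixing mentioned above (DepthMix) can be sketched as pasting one image over another only where it is closer to the camera, so occlusion order stays plausible. The snippet below is a simplified pixel-wise variant with made-up shapes and values, not the released implementation.

```python
import numpy as np

# Minimal sketch of a depth-guided mixing augmentation in the spirit of DepthMix:
# pixels from image A are pasted into image B only where A is nearer, so the
# mixed scene keeps a plausible occlusion order. All inputs are stand-ins.

rng = np.random.default_rng(0)
H, W = 6, 8
img_a, img_b = rng.random((H, W, 3)), rng.random((H, W, 3))
lbl_a, lbl_b = rng.integers(0, 5, (H, W)), rng.integers(0, 5, (H, W))
depth_a, depth_b = rng.random((H, W)) * 50, rng.random((H, W)) * 50  # metres

def depth_mix(img_a, lbl_a, depth_a, img_b, lbl_b, depth_b):
    """Paste A over B wherever A is nearer (smaller depth)."""
    mask = depth_a < depth_b                        # foreground test
    img = np.where(mask[..., None], img_a, img_b)   # broadcast mask over channels
    lbl = np.where(mask, lbl_a, lbl_b)
    depth = np.where(mask, depth_a, depth_b)
    return img, lbl, depth

mixed_img, mixed_lbl, mixed_depth = depth_mix(img_a, lbl_a, depth_a,
                                              img_b, lbl_b, depth_b)
print(mixed_img.shape, mixed_lbl.shape)
```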
Fan, Y., Dai, D., & Schiele, B. (n.d.). CoSSL: Co-Learning of Representation and Classifier for Imbalanced Semi-Supervised Learning. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022). New Orleans, LA, USA: IEEE.
(arXiv: 2112.04564, Accepted/in press)
Abstract
In this paper, we propose a novel co-learning framework (CoSSL) with decoupled representation learning and classifier learning for imbalanced SSL. To handle the data imbalance, we devise Tail-class Feature Enhancement (TFE) for classifier learning. Furthermore, the current evaluation protocol for imbalanced SSL focuses only on balanced test sets, which has limited practicality in real-world scenarios. Therefore, we further conduct a comprehensive evaluation under various shifted test distributions. In experiments, we show that our approach outperforms other methods over a large range of shifted distributions, achieving state-of-the-art performance on benchmark datasets ranging from CIFAR-10, CIFAR-100, ImageNet, to Food-101. Our code will be made publicly available.
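
The decoupling of representation and classifier learning can be illustrated generically: keep the learned features fixed and re-fit only the classifier on class-balanced data. The snippet below shows just this generic re-balancing step on synthetic features; it is not the paper's Tail-class Feature Enhancement, only an indication of where a classifier-side fix plugs in.

```python
import numpy as np

# Generic illustration of decoupled classifier learning under class imbalance:
# features stay frozen, and the classifier is re-fit on a class-balanced
# resampling. (Not the paper's TFE; purely a schematic re-balancing step.)

rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 16))                       # frozen features (stand-in)
labels = rng.choice(3, size=1000, p=[0.8, 0.15, 0.05])    # long-tailed labels

# Inverse-frequency sampling weights -> every class equally likely per draw.
counts = np.bincount(labels)
weights = (1.0 / counts)[labels]
weights /= weights.sum()

balanced_idx = rng.choice(len(labels), size=600, p=weights)
print("class histogram after re-balancing:", np.bincount(labels[balanced_idx]))
# A linear classifier would now be fit on feats[balanced_idx], labels[balanced_idx].
```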
Cai, S., Obukhov, A., Dai, D., & Van Gool, L. (n.d.). Pix2NeRF: Unsupervised Conditional Pi-GAN for Single Image to Neural Radiance Fields Translation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022). New Orleans, LA, USA: IEEE.
(Accepted/in press)
Ma, X., Wang, Z., Zhan, Y., Zheng, Y., Wang, Z., Dai, D., & Lin, C.-W. (n.d.). Both Style and Fog Matter: Cumulative Domain Adaptation for Semantic Foggy Scene Understanding. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022). New Orleans, LA, USA: IEEE.
(Accepted/in press)
Abstract
Although considerable progress has been made in semantic scene understanding under clear weather, it is still a tough problem under adverse weather conditions, such as dense fog, due to the uncertainty caused by imperfect observations. Besides, difficulties in collecting and labeling foggy images hinder the progress of this field. Considering the success in semantic scene understanding under clear weather, we think it is reasonable to transfer knowledge learned from clear images to the foggy domain. As such, the problem becomes to bridge the domain gap between clear images and foggy images. Unlike previous methods that mainly focus on closing the domain gap caused by fog -- defogging the foggy images or fogging the clear images, we propose to alleviate the domain gap by considering fog influence and style variation simultaneously. The motivation is based on our finding that the style-related gap and the fog-related gap can be divided and closed respectively, by adding an intermediate domain. Thus, we propose a new pipeline to cumulatively adapt style, fog and the dual-factor (style and fog). Specifically, we devise a unified framework to disentangle the style factor and the fog factor separately, and then the dual-factor from images in different domains. Furthermore, we collaborate the disentanglement of three factors with a novel cumulative loss to thoroughly disentangle these three factors. Our method achieves the state-of-the-art performance on three benchmarks and shows generalization ability in rainy and snowy scenes.
Vödisch, N., Unal, O., Li, K., Van Gool, L., & Dai, D. (2022). End-to-End Optimization of LiDAR Beam Configuration for 3D Object Detection and Localization. IEEE Robotics and Automation Letters, 7(2). doi:10.1109/LRA.2022.3142738
Sun, G., Probst, T., Paudel, D. P., Popovic, N., Kanakis, M., Patel, J., … Van Gool, L. (2021). Task Switching Network for Multi-task Learning. In IEEE/CVF International Conference on Computer Vision (ICCV 2021). Virtual Event: IEEE. doi:10.1109/ICCV48922.2021.00818
Hoyer, L., Dai, D., & Van Gool, L. (n.d.). DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022). New Orleans, LA, USA: IEEE.
(arXiv: 2111.14887, Accepted/in press)
Abstract
As acquiring pixel-wise annotations of real-world images for semantic segmentation is a costly process, a model can instead be trained with more accessible synthetic data and adapted to real images without requiring their annotations. This process is studied in unsupervised domain adaptation (UDA). Even though a large number of methods propose new adaptation strategies, they are mostly based on outdated network architectures. As the influence of recent network architectures has not been systematically studied, we first benchmark different network architectures for UDA and then propose a novel UDA method, DAFormer, based on the benchmark results. The DAFormer network consists of a Transformer encoder and a multi-level context-aware feature fusion decoder. It is enabled by three simple but crucial training strategies to stabilize the training and to avoid overfitting DAFormer to the source domain: While the Rare Class Sampling on the source domain improves the quality of pseudo-labels by mitigating the confirmation bias of self-training towards common classes, the Thing-Class ImageNet Feature Distance and a learning rate warmup promote feature transfer from ImageNet pretraining. DAFormer significantly improves the state-of-the-art performance by 10.8 mIoU for GTA->Cityscapes and 5.4 mIoU for Synthia->Cityscapes and enables learning even difficult classes such as train, bus, and truck well. The implementation is available at https://github.com/lhoyer/DAFormer.
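
Rare Class Sampling, one of the three training strategies named above, can be sketched as sampling a class with a probability that grows as its source-domain frequency shrinks, then drawing an image containing that class. The statistics and temperature below are illustrative; the exact procedure is in the released DAFormer code linked in the abstract.

```python
import numpy as np

# Simplified sketch of rare-class-oriented sampling: classes that cover few
# pixels in the source annotations are picked more often, then an image
# containing the chosen class is drawn. All statistics here are synthetic.

rng = np.random.default_rng(0)
num_classes, num_images = 5, 200
# Stand-in for per-image pixel counts of each class in the source dataset.
pixels = rng.integers(0, 1000, size=(num_images, num_classes))
pixels[:, 0] *= 50                       # make class 0 very common

class_freq = pixels.sum(axis=0) / pixels.sum()
T = 0.1                                  # temperature: lower -> favors rare classes more
class_prob = np.exp((1 - class_freq) / T)
class_prob /= class_prob.sum()

def sample_source_image():
    c = rng.choice(num_classes, p=class_prob)          # pick a (likely rare) class
    candidates = np.flatnonzero(pixels[:, c] > 0)      # images containing that class
    return rng.choice(candidates), c

img_idx, cls = sample_source_image()
print("sampled image", img_idx, "for class", cls, "class probs:", np.round(class_prob, 3))
```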
Li, K., Dai, D., & van Gool, L. (2022). Hyperspectral Image Super-Resolution with RGB Image Super-Resolution as an Auxiliary Task. In 2022 IEEE Winter Conference on Applications of Computer Vision (WACV 2022). Waikoloa Village, HI, USA: IEEE. doi:10.1109/WACV51458.2022.00409
Zaech, J.-N., Liniger, A., Danelljan, M., Dai, D., & Van Gool, L. (n.d.). Adiabatic Quantum Computing for Multi Object Tracking. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022). New Orleans, LA, USA: IEEE.
(arXiv: 2202.08837, Accepted/in press)
Abstract
Multi-Object Tracking (MOT) is most often approached in the tracking-by-detection paradigm, where object detections are associated through time. The association step naturally leads to discrete optimization problems. As these optimization problems are often NP-hard, they can only be solved exactly for small instances on current hardware. Adiabatic quantum computing (AQC) offers a solution for this, as it has the potential to provide a considerable speedup on a range of NP-hard optimization problems in the near future. However, current MOT formulations are unsuitable for quantum computing due to their scaling properties. In this work, we therefore propose the first MOT formulation designed to be solved with AQC. We employ an Ising model that represents the quantum mechanical system implemented on the AQC. We show that our approach is competitive compared with state-of-the-art optimization-based approaches, even when using of-the-shelf integer programming solvers. Finally, we demonstrate that our MOT problem is already solvable on the current generation of real quantum computers for small examples, and analyze the properties of the measured solutions.
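
The Ising/QUBO formulation alluded to above can be made concrete on a toy assignment problem: binary variables select detection-to-track assignments, and the one-to-one constraints enter as quadratic penalties. The costs and the brute-force minimization below are purely illustrative of the kind of objective an annealer would optimize; they are not the paper's formulation.

```python
import itertools
import numpy as np

# Tiny illustration of casting detection-to-track assignment as a QUBO, the
# kind of quadratic binary objective an adiabatic quantum computer minimizes.
# Costs and sizes are made up; a real annealer replaces the brute force below.

cost = np.array([[0.1, 0.9],       # cost[t, d]: assigning detection d to track t
                 [0.8, 0.2]])
T, D = cost.shape
P = 2.0                            # penalty weight for violating one-to-one constraints

def qubo_energy(x):
    """x[t, d] in {0, 1}: 1 if detection d is assigned to track t."""
    assign_cost = float((cost * x).sum())
    # Soft constraints: each track and each detection used exactly once.
    row_pen = float(((x.sum(axis=1) - 1) ** 2).sum())
    col_pen = float(((x.sum(axis=0) - 1) ** 2).sum())
    return assign_cost + P * (row_pen + col_pen)

best = min((np.array(bits).reshape(T, D)
            for bits in itertools.product([0, 1], repeat=T * D)),
           key=qubo_energy)
print(best, qubo_energy(best))
```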