D2
Computer Vision and Machine Learning

Dengxin Dai (Senior Researcher)

Dr. Dengxin Dai

Address: Max-Planck-Institut für Informatik, Saarland Informatics Campus, Campus E1 4, 66123 Saarbrücken
Location: E1 4 - 604
Phone: +49 681 9325 2104
Fax: +49 681 9325 2099

News!

  1. I am starting a new research group at MPI for Informatics, working on deep learning-based perception for autonomous driving, especially on scaling existing visual perception models to novel domains, to new data modalities, to unseen classes, and to more tasks.
  2. I am hiring PhD students and also offer master's thesis projects. If you are interested, please contact me at <ddai@mpi-inf.mpg.de> with your CV and transcripts. I also accept applicants with a CSC scholarship.
  3. I will be an Area Chair for CVPR 2022.

Publications

Gong, R., Li, W., Chen, Y., Dai, D., & Van Gool, L. (2021). DLOW: Domain Flow and Applications. International Journal of Computer Vision, 129, 2865–2888. doi:10.1007/s11263-021-01496-2
(Code: https://github.com/ETHRuiGong/DLOW)
Dai, D., Vasudevan, A. B., Matas, J., & Van Gool, L. (2021). Binaural SoundNet: Predicting Semantics, Depth and Motion with Binaural Sounds. Retrieved from https://arxiv.org/abs/2109.02763
(arXiv: 2109.02763)
Abstract
Humans can robustly recognize and localize objects by using visual and/or auditory cues. While machines are able to do the same with visual data already, less work has been done with sounds. This work develops an approach for scene understanding purely based on binaural sounds. The considered tasks include predicting the semantic masks of sound-making objects, the motion of sound-making objects, and the depth map of the scene. To this aim, we propose a novel sensor setup and record a new audio-visual dataset of street scenes with eight professional binaural microphones and a 360-degree camera. The co-existence of visual and audio cues is leveraged for supervision transfer. In particular, we employ a cross-modal distillation framework that consists of multiple vision teacher methods and a sound student method -- the student method is trained to generate the same results as the teacher methods do. This way, the auditory system can be trained without using human annotations. To further boost the performance, we propose another novel auxiliary task, coined Spatial Sound Super-Resolution, to increase the directional resolution of sounds. We then formulate the four tasks into one end-to-end trainable multi-tasking network aiming to boost the overall performance. Experimental results show that 1) our method achieves good results for all four tasks, 2) the four tasks are mutually beneficial -- training them together achieves the best performance, 3) the number and orientation of microphones are both important, and 4) features learned from the standard spectrogram and features obtained by the classic signal processing pipeline are complementary for auditory perception tasks. The data and code are released.
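As a rough illustration of the cross-modal distillation idea described above, the sketch below shows frozen vision teachers producing pseudo-labels from video frames while an audio student learns to reproduce them from the binaural spectrogram. All names and the loss choice are hypothetical, not the released implementation.

```python
import torch
import torch.nn.functional as F

def distillation_step(audio_student, vision_teachers, spectrogram, frames, optimizer):
    """One cross-modal distillation step (illustrative sketch only).

    The frozen vision teachers produce pseudo-labels (e.g. semantic masks,
    depth, motion) from the video frames; the audio student is trained to
    predict the same outputs from the binaural spectrogram, so no human
    annotations are needed.
    """
    with torch.no_grad():
        targets = [teacher(frames) for teacher in vision_teachers]   # per-task pseudo-labels
    preds = audio_student(spectrogram)                               # list of per-task predictions
    loss = sum(F.mse_loss(p, t) for p, t in zip(preds, targets))     # distillation loss per task
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```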
Hoyer, L., Dai, D., Wang, Q., Chen, Y., & Van Gool, L. (2021). Improving Semi-Supervised and Domain-Adaptive Semantic Segmentation with Self-Supervised Depth Estimation. Retrieved from https://arxiv.org/abs/2108.12545
(arXiv: 2108.12545)
Abstract
Training deep networks for semantic segmentation requires large amounts of labeled training data, which presents a major challenge in practice, as labeling segmentation masks is a highly labor-intensive process. To address this issue, we present a framework for semi-supervised and domain-adaptive semantic segmentation, which is enhanced by self-supervised monocular depth estimation (SDE) trained only on unlabeled image sequences. In particular, we utilize SDE as an auxiliary task comprehensively across the entire learning framework: First, we automatically select the most useful samples to be annotated for semantic segmentation based on the correlation of sample diversity and difficulty between SDE and semantic segmentation. Second, we implement a strong data augmentation by mixing images and labels using the geometry of the scene. Third, we transfer knowledge from features learned during SDE to semantic segmentation by means of transfer and multi-task learning. And fourth, we exploit additional labeled synthetic data with Cross-Domain DepthMix and Matching Geometry Sampling to align synthetic and real data. We validate the proposed model on the Cityscapes dataset, where all four contributions demonstrate significant performance gains, and achieve state-of-the-art results for semi-supervised semantic segmentation as well as for semi-supervised domain adaptation. In particular, with only 1/30 of the Cityscapes labels, our method achieves 92% of the fully-supervised baseline performance and even 97% when exploiting additional data from GTA. The source code is available at https://github.com/lhoyer/improving_segmentation_with_selfsupervised_depth.
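The geometry-based mixing augmentation can be pictured as below: pixels of one image that are closer to the camera (according to the self-supervised depth) are pasted into the other image together with their labels, so the mixed sample respects occlusion ordering. This is only one plausible reading of the idea, with hypothetical names, not the paper's DepthMix code.

```python
import torch

def depth_aware_mix(img_a, lbl_a, depth_a, img_b, lbl_b, depth_b):
    """Geometry-aware mixing of two samples (illustrative sketch).

    img_*:   C x H x W images, lbl_*: H x W label maps, depth_*: H x W depth maps.
    Pixels where sample A is closer than sample B are copied into B,
    together with their labels.
    """
    mask = (depth_a < depth_b).unsqueeze(0)                  # 1 x H x W, True where A occludes B
    mixed_img = torch.where(mask, img_a, img_b)              # C x H x W mixed image
    mixed_lbl = torch.where(mask.squeeze(0), lbl_a, lbl_b)   # H x W mixed label map
    return mixed_img, mixed_lbl
```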
Zaech, J.-N., Dai, D., Liniger, A., Danelljan, M., & Van Gool, L. (2021). Learnable Online Graph Representations for 3D Multi-Object Tracking. Retrieved from https://arxiv.org/abs/2104.11747
(arXiv: 2104.11747)
Abstract
Tracking of objects in 3D is a fundamental task in computer vision that finds use in a wide range of applications such as autonomous driving, robotics or augmented reality. Most recent approaches for 3D multi object tracking (MOT) from LIDAR use object dynamics together with a set of handcrafted features to match detections of objects. However, manually designing such features and heuristics is cumbersome and often leads to suboptimal performance. In this work, we instead strive towards a unified and learning based approach to the 3D MOT problem. We design a graph structure to jointly process detection and track states in an online manner. To this end, we employ a Neural Message Passing network for data association that is fully trainable. Our approach provides a natural way for track initialization and handling of false positive detections, while significantly improving track stability. We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
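To make the graph-based association concrete, the following is a minimal, hypothetical sketch of one message-passing round on a track-detection graph, where updated edge features are scored for data association. The architecture and names are illustrative, not the paper's network.

```python
import torch
import torch.nn as nn

class EdgeMessagePassing(nn.Module):
    """One round of neural message passing on a detection-track graph (sketch).

    Edge features between track and detection nodes are updated from their
    endpoint features; an edge classifier then scores each track-detection
    pair for data association.
    """
    def __init__(self, dim):
        super().__init__()
        self.edge_mlp = nn.Sequential(nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.node_mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.classifier = nn.Linear(dim, 1)

    def forward(self, tracks, dets, edges):
        # tracks: T x D, dets: N x D, edges: T x N x D
        t = tracks.unsqueeze(1).expand(-1, dets.size(0), -1)                # T x N x D
        d = dets.unsqueeze(0).expand(tracks.size(0), -1, -1)                # T x N x D
        edges = self.edge_mlp(torch.cat([t, d, edges], dim=-1))             # update edge features
        dets = self.node_mlp(torch.cat([dets, edges.sum(dim=0)], dim=-1))   # aggregate messages into detections
        scores = self.classifier(edges).squeeze(-1)                         # T x N association logits
        return dets, edges, scores
```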
Zhang, Z., Liniger, A., Dai, D., Yu, F., & Van Gool, L. (n.d.). End-to-End Urban Driving by Imitating a Reinforcement Learning Coach. In International Conference on Computer Vision (ICCV 2021). Virtual: IEEE.
(Accepted/in press)
(Code: https://github.com/zhejz/carla-roach)
Hahner, M., Sakaridis, C., Dai, D., & Van Gool, L. (n.d.). Fog Simulation on Real LiDAR Point Clouds for 3D Object Detection in Adverse Weather. In International Conference on Computer Vision (ICCV 2021). Virtual: IEEE.
(Accepted/in press)
(Code: https://github.com/MartinHahner/LiDAR_fog_sim)
Sakaridis, C., Dai, D., & Van Gool, L. (n.d.). ACDC: The Adverse Conditions Dataset with Correspondences for Semantic Driving Scene Understanding. In International Conference on Computer Vision (ICCV 2021). Virtual: IEEE.
(Accepted/in press)
(Project page: https://acdc.vision.ee.ethz.ch/)
Gong, R., Dai, D., Chen, Y., Li, W., & Van Gool, L. (n.d.). mDALU: Multi-Source Domain Adaptation and Label Unification with Partial Datasets. In International Conference on Computer Vision (ICCV 2021). Virtual: IEEE.
(Accepted/in press)
Wang, Q., Dai, D., Hoyer, L., Van Gool, L., & Fink, O. (n.d.). Domain Adaptive Semantic Segmentation with Self-Supervised Depth Estimation. In International Conference on Computer Vision (ICCV 2021). Virtual: IEEE.
(Accepted/in press)
(Code: https://github.com/qinenergy/corda)
Gong, R., Danelljan, M., Dai, D., Wang, W., Paudel, D. P., Chhatkuli, A., … Van Gool, L. (2021). TADA: Taxonomy Adaptive Domain Adaptation. Retrieved from https://arxiv.org/abs/2109.04813
(arXiv: 2109.04813)
Abstract
Traditional domain adaptation addresses the task of adapting a model to a novel target domain under limited or no additional supervision. While tackling the input domain gap, the standard domain adaptation settings assume no domain change in the output space. In semantic prediction tasks, different datasets are often labeled according to different semantic taxonomies. In many real-world settings, the target domain task requires a different taxonomy than the one imposed by the source domain. We therefore introduce the more general taxonomy adaptive domain adaptation (TADA) problem, allowing for inconsistent taxonomies between the two domains. We further propose an approach that jointly addresses the image-level and label-level domain adaptation. On the label-level, we employ a bilateral mixed sampling strategy to augment the target domain, and a relabelling method to unify and align the label spaces. We address the image-level domain gap by proposing an uncertainty-rectified contrastive learning method, leading to more domain-invariant and class discriminative features. We extensively evaluate the effectiveness of our framework under different TADA settings: open taxonomy, coarse-to-fine taxonomy, and partially-overlapping taxonomy. Our framework outperforms previous state-of-the-art by a large margin, while capable of adapting to target taxonomies.
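For intuition on the uncertainty-rectified contrastive learning mentioned above, here is a minimal sketch of a class-discriminative contrastive loss whose per-anchor terms are down-weighted by prediction uncertainty. The exact rectification scheme in the paper may differ; all names are illustrative.

```python
import torch
import torch.nn.functional as F

def uncertainty_weighted_contrastive(feats, labels, probs, temperature=0.1):
    """Contrastive loss weighted by prediction certainty (illustrative sketch).

    feats:  N x D pixel/region embeddings
    labels: N     (pseudo-)class labels
    probs:  N x C softmax predictions used to estimate per-sample certainty
    """
    feats = F.normalize(feats, dim=1)
    sim = feats @ feats.t() / temperature                    # N x N similarities
    sim.fill_diagonal_(-1e9)                                 # exclude self-pairs
    pos = labels.unsqueeze(0) == labels.unsqueeze(1)         # same-class pairs are positives
    pos.fill_diagonal_(False)
    certainty = probs.max(dim=1).values                      # confident anchors count more
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    loss_per_anchor = -(log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)
    return (certainty * loss_per_anchor).mean()
```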