D2
Computer Vision and Machine Learning
2021
Real-time Deep Dynamic Characters
M. Habermann, L. Liu, W. Xu, M. Zollhöfer, G. Pons-Moll and C. Theobalt
ACM Transactions on Graphics (Proc. ACM SIGGRAPH 2021), Volume 40, Number 4, 2021
Learning Decision Trees Recurrently Through Communication
S. Alaniz, D. Marcos, B. Schiele and Z. Akata
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021
Euro-PVI: Pedestrian Vehicle Interactions in Dense Urban Centers
A. Bhattacharyya, D. O. Reino, M. Fritz and B. Schiele
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021
Convolutional Dynamic Alignment Networks for Interpretable Classifications
M. D. Böhle, M. Fritz and B. Schiele
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021
Distilling Audio-Visual Knowledge by Compositional Contrastive Learning
Y. Chen, Y. Xian, A. S. Koepke and Z. Akata
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021
Stereo Radiance Fields (SRF): Learning View Synthesis from Sparse Views of Novel Scenes
J. Chibane, A. Bansal, V. Lazova and G. Pons-Moll
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021
Learning Spatially-Variant MAP Models for Non-blind Image Deblurring
J. Dong, S. Roth and B. Schiele
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021
Adaptive Aggregation Networks for Class-Incremental Learning
Y. Liu, B. Schiele and Q. Sun
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021
Open World Compositional Zero-Shot Learning
M. Mancini, M. F. Naeem, Y. Xian and Z. Akata
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021
Learning Graph Embeddings for Compositional Zero-shot Learning
M. F. Naeem, Y. Xian, F. Tombari and Z. Akata
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021
SMPLicit: Topology-aware Generative Model for Clothed People
G. Pons-Moll, F. Moreno-Noguer, E. Corona, A. Pumarola and G. Alenyà
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021
D-NeRF: Neural Radiance Fields for Dynamic Scenes
A. Pumarola, E. Corona, G. Pons-Moll and F. Moreno-Noguer
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021
Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs
H.-P. Wang, N. Yu and M. Fritz
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021
Deep Outlier Handling for Image Deblurring
J. Dong and J. Pan
IEEE Transactions on Image Processing, Volume 30, 2021
Generating Face Images With Attributes for Free
Y. Liu, Q. Sun, X. He, A.-A. Liu, Y. Su and T.-S. Chua
IEEE Transactions on Neural Networks and Learning Systems, Volume 32, Number 6, 2021
Future Moment Assessment for Action Query
Q. Ke, M. Fritz and B. Schiele
IEEE Winter Conference on Applications of Computer Vision (WACV 2021), 2021
Joint Visual-Temporal Embedding for Unsupervised Learning of Actions in Untrimmed Sequences
R. G. VidalMata, W. J. Scheirer, A. Kukleva, D. Cox and H. Kuehne
IEEE Winter Conference on Applications of Computer Vision (WACV 2021), 2021
EPEM: Efficient Parameter Estimation for Multiple Class Monotone Missing Data
T. Nguyen, D. H. M. Nguyen, H. Nguyen, B. T. Nguyen and B. A. Wade
Information Sciences, Volume 567, 2021
mDALU: Multi-Source Domain Adaptation and Label Unification with Partial Datasets
R. Gong, D. Dai, Y. Chen, W. Li and L. Van Gool
International Conference on Computer Vision (ICCV 2021), 2021
(Accepted/in press)
Fog Simulation on Real LiDAR Point Clouds for 3D Object Detection in Adverse Weather
M. Hahner, C. Sakaridis, D. Dai and L. Van Gool
International Conference on Computer Vision (ICCV 2021), 2021
(Accepted/in press)
ACDC: The Adverse Conditions Dataset with Correspondences for Semantic Driving Scene Understanding
C. Sakaridis, D. Dai and L. Van Gool
International Conference on Computer Vision (ICCV 2021), 2021
(Accepted/in press)
Domain Adaptive Semantic Segmentation with Self-Supervised Depth Estimation
Q. Wang, D. Dai, L. Hoyer, L. Van Gool and O. Fink
International Conference on Computer Vision (ICCV 2021), 2021
(Accepted/in press)
End-to-End Urban Driving by Imitating a Reinforcement Learning Coach
Z. Zhang, A. Liniger, D. Dai, F. Yu and L. Van Gool
International Conference on Computer Vision (ICCV 2021), 2021
(Accepted/in press)
Norm-Aware Embedding for Efficient Person Search and Tracking
D. Chen, S. Zhang, J. Yang and B. Schiele
International Journal of Computer Vision, Volume 129, 2021
Guest Editorial: Special Issue on “Computer Vision for All Seasons: Adverse Weather and Lighting Conditions”
D. Dai, R. T. Tan, V. Patel, J. Matas, B. Schiele and L. Van Gool
International Journal of Computer Vision, 2021
DLOW: Domain Flow and Applications
R. Gong, W. Li, Y. Chen, D. Dai and L. Van Gool
International Journal of Computer Vision, 2021
Semantic Bottlenecks: Quantifying and Improving Inspectability of Deep Representations
M. Losch, M. Fritz and B. Schiele
International Journal of Computer Vision, Volume 129, 2021
Guided Attention in CNNs for Occluded Pedestrian Detection and Re-identification
S. Zhang, D. Chen, J. Yang and B. Schiele
International Journal of Computer Vision, 2021
Efficient Message Passing for 0–1 ILPs with Binary Decision Diagrams
J.-H. Lange and P. Swoboda
Proceedings of the 38th International Conference on Machine Learning (ICML 2021), 2021
Bit Error Robustness for Energy-Efficient DNN Accelerators
D. Stutz, N. Chandramoorthy, M. Hein and B. Schiele
Proceedings of the 4th MLSys Conference, 2021
Abstract
Deep neural network (DNN) accelerators have received considerable attention in recent years because they save energy compared to mainstream hardware. Low-voltage operation of DNN accelerators reduces energy consumption significantly further but causes bit-level failures in the memory storing the quantized DNN weights. In this paper, we show that a combination of robust fixed-point quantization, weight clipping, and random bit error training (RandBET) significantly improves robustness against random bit errors in (quantized) DNN weights. This leads to high energy savings from both low-voltage operation and low-precision quantization. Our approach generalizes across operating voltages and accelerators, as demonstrated on bit errors from profiled SRAM arrays. We also discuss why weight clipping alone is already quite an effective way to achieve robustness against bit errors. Moreover, we specifically discuss the involved trade-offs regarding accuracy, robustness and precision: without losing more than 1% in accuracy compared to a normally trained 8-bit DNN, we can reduce energy consumption on CIFAR-10 by 20%. Higher energy savings of, e.g., 30%, are possible at the cost of 2.5% accuracy, even for 4-bit DNNs.
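A minimal Python sketch of two ingredients named in this abstract, symmetric fixed-point quantization with weight clipping and random bit error injection, may help make them concrete. It assumes per-tensor quantization to signed `bits`-bit integers stored in two's complement, with every stored bit flipped independently with probability `p`. The function names are hypothetical and this is not the authors' implementation. Random bit error training would apply such injection to the weights during training; only the injection itself is shown.

```python
import random
from typing import List, Tuple


def quantize(weights: List[float], bits: int, w_max: float) -> Tuple[List[int], float]:
    """Clip weights to [-w_max, w_max], then map them to signed integers.

    The integer range is symmetric: [-(2**(bits-1) - 1), 2**(bits-1) - 1].
    Returns the integers and the scale needed to dequantize them.
    """
    q_max = 2 ** (bits - 1) - 1
    scale = w_max / q_max
    clipped = [max(-w_max, min(w_max, w)) for w in weights]  # weight clipping
    return [round(w / scale) for w in clipped], scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map integers back to real-valued weights."""
    return [v * scale for v in q]


def to_unsigned(v: int, bits: int) -> int:
    """Two's complement bit pattern of a signed integer."""
    return v & ((1 << bits) - 1)


def to_signed(u: int, bits: int) -> int:
    """Inverse of to_unsigned."""
    return u - (1 << bits) if u >= (1 << (bits - 1)) else u


def inject_bit_errors(q: List[int], bits: int, p: float, rng: random.Random) -> List[int]:
    """Flip each stored bit of every integer independently with probability p."""
    noisy = []
    for v in q:
        u = to_unsigned(v, bits)
        for b in range(bits):
            if rng.random() < p:
                u ^= 1 << b  # simulated memory bit error
        noisy.append(to_signed(u, bits))
    return noisy


if __name__ == "__main__":
    rng = random.Random(0)
    q, scale = quantize([0.3, -1.7, 0.05, 0.9], bits=8, w_max=1.0)
    noisy = dequantize(inject_bit_errors(q, bits=8, p=0.01, rng=rng), scale)
    print(q, noisy)
```

In this sketch, flipping a single bit changes a dequantized weight by at most `2 ** (bits - 1) * scale`, and `scale` is proportional to the clipping bound `w_max`. A tighter clipping bound therefore shrinks the worst-case perturbation a bit error can cause, which gives one intuition for why clipping helps.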
Spectral Distribution Aware Image Generation
S. Jung and M. Keuper
Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
Text-image synergy for multimodal retrieval and annotation
S. N. Chowdhury
PhD Thesis, Universität des Saarlandes, 2021
Abstract
Text and images are the two most common data modalities found on the Internet. Understanding the synergy between text and images, that is, seamlessly analyzing information from both modalities, may be trivial for humans but is challenging for software systems. In this dissertation, we study problems where deciphering text-image synergy is crucial for finding solutions. We propose methods and ideas that establish semantic connections between text and images in multimodal content, and empirically show their effectiveness in four interconnected problems: Image Retrieval, Image Tag Refinement, Image-Text Alignment, and Image Captioning. Our promising results and observations open up interesting directions for future research on text-image data understanding.
Binaural SoundNet: Predicting Semantics, Depth and Motion with Binaural Sounds
D. Dai, A. B. Vasudevan, J. Matas and L. Van Gool
Technical Report, 2021
(arXiv: 2109.02763)
Abstract
Humans can robustly recognize and localize objects by using visual and/or auditory cues. While machines are already able to do the same with visual data, less work has been done with sounds. This work develops an approach for scene understanding purely based on binaural sounds. The considered tasks include predicting the semantic masks of sound-making objects, the motion of sound-making objects, and the depth map of the scene. To this end, we propose a novel sensor setup and record a new audio-visual dataset of street scenes with eight professional binaural microphones and a 360-degree camera. The co-existence of visual and audio cues is leveraged for supervision transfer. In particular, we employ a cross-modal distillation framework that consists of multiple vision teacher methods and a sound student method; the student method is trained to generate the same results as the teacher methods do. This way, the auditory system can be trained without using human annotations. To further boost the performance, we propose another novel auxiliary task, coined Spatial Sound Super-Resolution, to increase the directional resolution of sounds. We then formulate the four tasks into one end-to-end trainable multi-tasking network aiming to boost the overall performance. Experimental results show that 1) our method achieves good results for all four tasks, 2) the four tasks are mutually beneficial, as training them together achieves the best performance, 3) the number and orientation of microphones are both important, and 4) features learned from the standard spectrogram and features obtained by the classic signal processing pipeline are complementary for auditory perception tasks. The data and code are released.
Improving Semi-Supervised and Domain-Adaptive Semantic Segmentation with Self-Supervised Depth Estimation
L. Hoyer, D. Dai, Q. Wang, Y. Chen and L. Van Gool
Technical Report, 2021
(arXiv: 2108.12545)
Abstract
Training deep networks for semantic segmentation requires large amounts of labeled training data, which presents a major challenge in practice, as labeling segmentation masks is a highly labor-intensive process. To address this issue, we present a framework for semi-supervised and domain-adaptive semantic segmentation, which is enhanced by self-supervised monocular depth estimation (SDE) trained only on unlabeled image sequences. In particular, we utilize SDE as an auxiliary task comprehensively across the entire learning framework: First, we automatically select the most useful samples to be annotated for semantic segmentation based on the correlation of sample diversity and difficulty between SDE and semantic segmentation. Second, we implement a strong data augmentation by mixing images and labels using the geometry of the scene. Third, we transfer knowledge from features learned during SDE to semantic segmentation by means of transfer and multi-task learning. Fourth, we exploit additional labeled synthetic data with Cross-Domain DepthMix and Matching Geometry Sampling to align synthetic and real data. We validate the proposed model on the Cityscapes dataset, where all four contributions demonstrate significant performance gains, and we achieve state-of-the-art results for semi-supervised semantic segmentation as well as for semi-supervised domain adaptation. In particular, with only 1/30 of the Cityscapes labels, our method achieves 92% of the fully-supervised baseline performance and even 97% when exploiting additional data from GTA. The source code is available at https://github.com/lhoyer/improving_segmentation_with_selfsupervised_depth.
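The abstract describes the augmentation only as mixing images and labels using the geometry of the scene. A minimal Python sketch of one way to do that, assumed here for illustration rather than taken from the paper, is a per-pixel depth test: each output pixel and its label come from whichever of the two samples is nearer to the camera at that location, so closer structures occlude farther ones. Images, labels and depth maps are plain nested lists of equal size, and `depth_mix` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def depth_mix(
    img_a: Sequence[Sequence], lbl_a: Sequence[Sequence], dep_a: Sequence[Sequence[float]],
    img_b: Sequence[Sequence], lbl_b: Sequence[Sequence], dep_b: Sequence[Sequence[float]],
) -> Tuple[List[list], List[list]]:
    """Mix two samples pixel by pixel with a depth test (hypothetical helper).

    Where sample A is at least as close as sample B, the image value and the
    label are copied from A; otherwise both are copied from B. Using one
    decision for image and label keeps the mixed label aligned with the image.
    """
    mixed_img, mixed_lbl = [], []
    for y in range(len(dep_a)):
        row_img, row_lbl = [], []
        for x in range(len(dep_a[y])):
            take_a = dep_a[y][x] <= dep_b[y][x]  # smaller depth means nearer
            row_img.append(img_a[y][x] if take_a else img_b[y][x])
            row_lbl.append(lbl_a[y][x] if take_a else lbl_b[y][x])
        mixed_img.append(row_img)
        mixed_lbl.append(row_lbl)
    return mixed_img, mixed_lbl
```

Nested lists keep the sketch dependency-free; a practical version would operate on image tensors with depth predicted by the self-supervised depth network.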
Adversarial Content Manipulation for Analyzing and Improving Model Robustness
R. Shetty
PhD Thesis, Universität des Saarlandes, 2021
Learnable Online Graph Representations for 3D Multi-Object Tracking
J.-N. Zaech, D. Dai, A. Liniger, M. Danelljan and L. Van Gool
Technical Report, 2021
(arXiv: 2104.11747)
Abstract
Tracking of objects in 3D is a fundamental task in computer vision that finds use in a wide range of applications such as autonomous driving, robotics or augmented reality. Most recent approaches for 3D multi-object tracking (MOT) from LiDAR use object dynamics together with a set of handcrafted features to match detections of objects. However, manually designing such features and heuristics is cumbersome and often leads to suboptimal performance. In this work, we instead strive towards a unified, learning-based approach to the 3D MOT problem. We design a graph structure to jointly process detection and track states in an online manner. To this end, we employ a Neural Message Passing network for data association that is fully trainable. Our approach provides a natural way for track initialization and handling of false positive detections, while significantly improving track stability. We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
2020
Implicit Feature Networks for Texture Completion from Partial 3D Data
J. Chibane and G. Pons-Moll
Computer Vision -- ECCV Workshops 2020, 2020
Synthetic Convolutional Features for Improved Semantic Segmentation
Y. He, B. Schiele and M. Fritz
Computer Vision -- ECCV Workshops 2020, 2020
Adversarial Training Against Location-Optimized Adversarial Patches
S. Rao, D. Stutz and B. Schiele
Computer Vision -- ECCV Workshops 2020, 2020
SHARP 2020: The 1st Shape Recovery from Partial Textured 3D Scans Challenge Results
A. Saint, A. Kacem, K. Cherenkova, K. Papadopoulos, J. Chibane, G. Pons-Moll, G. Gusev, D. Fofi, D. Aouada and B. Ottersten
Computer Vision -- ECCV Workshops 2020, 2020
Body Shape Privacy in Images: Understanding Privacy and Preventing Automatic Shape Extraction
H. Sattar, K. Krombholz, G. Pons-Moll and M. Fritz
Computer Vision -- ECCV Workshops 2020, 2020
Abstract
Modern approaches to pose and body shape estimation have recently achieved strong performance even under challenging real-world conditions. Even from a single image of a clothed person, a realistic-looking body shape can be inferred that captures a user's weight group and body shape type well. This opens up a whole spectrum of applications, in particular in fashion, where virtual try-on and recommendation systems can make use of these new and automated cues. However, a realistic depiction of the undressed body is regarded as highly private, and most people might therefore not consent to it. Hence, we ask whether the automatic extraction of such information can be effectively evaded. While adversarial perturbations have been shown to be effective for manipulating the output of machine learning models (in particular, end-to-end deep learning approaches), state-of-the-art shape estimation methods are composed of multiple stages. We perform the first investigation of different strategies that can be used to effectively manipulate the automatic shape estimation while preserving the overall appearance of the original image.
Haar Wavelet based Block Autoregressive Flows for Trajectories
A. Bhattacharyya, C.-N. Straehle, M. Fritz and B. Schiele
Pattern Recognition (GCPR 2020), 2020
Analyzing the Dependency of ConvNets on Spatial Information
Y. Fan, Y. Xian, M. M. Losch and B. Schiele
Pattern Recognition (GCPR 2020), 2020
Long-Term Anticipation of Activities with Cycle Consistency
Y. A. Farha, Q. Ke, B. Schiele and J. Gall
Pattern Recognition (GCPR 2020), 2020
On the Lifted Multicut Polytope for Trees
J.-H. Lange and B. Andres
Pattern Recognition (GCPR 2020), 2020
Semantic Bottlenecks: Quantifying & Improving Inspectability of Deep Representations
M. Losch, M. Fritz and B. Schiele
Pattern Recognition (GCPR 2020), 2020
Long-Tailed Recognition Using Class-Balanced Experts
S. Sharma, N. Yu, M. Fritz and B. Schiele
Pattern Recognition (GCPR 2020), 2020