2021
Real-time Deep Dynamic Characters
M. Habermann, L. Liu, W. Xu, M. Zollhöfer, G. Pons-Moll and C. Theobalt
ACM Transactions on Graphics (Proc. ACM SIGGRAPH 2021), Volume 40, Number 4, 2021
Fine-Grained Zero-Shot Learning with DNA as Side Information
S. Badirli, Z. Akata, G. Mohler, C. Picard and M. M. Dundar
Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021
RMM: Reinforced Memory Management for Class-Incremental Learning
Y. Liu, B. Schiele and Q. Sun
Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021
Shape your Space: A Gaussian Mixture Regularization Approach to Deterministic Autoencoders
A. Saseendran, K. Skubch, S. Falkner and M. Keuper
Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021
mDALU: Multi-Source Domain Adaptation and Label Unification with Partial Datasets
R. Gong, D. Dai, Y. Chen, W. Li and L. Van Gool
ICCV 2021, IEEE/CVF International Conference on Computer Vision, 2021
Fog Simulation on Real LiDAR Point Clouds for 3D Object Detection in Adverse Weather
M. Hahner, C. Sakaridis, D. Dai and L. Van Gool
ICCV 2021, IEEE/CVF International Conference on Computer Vision, 2021
Making Higher Order MOT Scalable: An Efficient Approximate Solver for Lifted Disjoint Paths
A. Horňáková, T. Kaiser, P. Swoboda, M. Rolinek, B. Rosenhahn and R. Henschel
ICCV 2021, IEEE/CVF International Conference on Computer Vision, 2021
e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
M. Kayser, O.-M. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata and T. Lukasiewicz
ICCV 2021, IEEE/CVF International Conference on Computer Vision, 2021
Keep CALM and Improve Visual Feature Attribution
J. M. Kim, J. Choe, Z. Akata and S. J. Oh
ICCV 2021, IEEE/CVF International Conference on Computer Vision, 2021
Generalized and Incremental Few-Shot Learning by Explicit Learning and Calibration without Forgetting
A. Kukleva, H. Kuehne and B. Schiele
ICCV 2021, IEEE/CVF International Conference on Computer Vision, 2021
Seeking Similarities over Differences: Similarity-based Domain Alignment for Adaptive Object Detection
F. Rezaeianaran, R. Shetty, R. Aljundi, D. O. Reino, S. Zhang and B. Schiele
ICCV 2021, IEEE/CVF International Conference on Computer Vision, 2021
ACDC: The Adverse Conditions Dataset with Correspondences for Semantic Driving Scene Understanding
C. Sakaridis, D. Dai and L. Van Gool
ICCV 2021, IEEE/CVF International Conference on Computer Vision, 2021
Relating Adversarially Robust Generalization to Flat Minima
D. Stutz, M. Hein and B. Schiele
ICCV 2021, IEEE/CVF International Conference on Computer Vision, 2021
Task Switching Network for Multi-task Learning
G. Sun, T. Probst, D. P. Paudel, N. Popovic, M. Kanakis, J. Patel, D. Dai and L. Van Gool
ICCV 2021, IEEE/CVF International Conference on Computer Vision, 2021
Neural-GIF: Neural Generalized Implicit Functions for Animating People in Clothing
G. Tiwari, N. Sarafianos, T. Tung and G. Pons-Moll
ICCV 2021, IEEE/CVF International Conference on Computer Vision, 2021
Domain Adaptive Semantic Segmentation with Self-Supervised Depth Estimation
Q. Wang, D. Dai, L. Hoyer, L. Van Gool and O. Fink
ICCV 2021, IEEE/CVF International Conference on Computer Vision, 2021
Artificial Fingerprinting for Generative Models: Rooting Deepfake Attribution in Training Data
N. Yu, V. Skripniuk, S. Abdelnabi and M. Fritz
ICCV 2021, IEEE/CVF International Conference on Computer Vision, 2021
Dual Contrastive Loss and Attention for GANs
N. Yu, G. Liu, A. Dundar, A. Tao, B. Catanzaro, L. Davis and M. Fritz
ICCV 2021, IEEE/CVF International Conference on Computer Vision, 2021
End-to-End Urban Driving by Imitating a Reinforcement Learning Coach
Z. Zhang, A. Liniger, D. Dai, F. Yu and L. Van Gool
ICCV 2021, IEEE/CVF International Conference on Computer Vision, 2021
Learning Decision Trees Recurrently Through Communication
S. Alaniz, D. Marcos, B. Schiele and Z. Akata
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021
Euro-PVI: Pedestrian Vehicle Interactions in Dense Urban Centers
A. Bhattacharyya, D. O. Reino, M. Fritz and B. Schiele
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021
Convolutional Dynamic Alignment Networks for Interpretable Classifications
M. D. Böhle, M. Fritz and B. Schiele
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021
Distilling Audio-Visual Knowledge by Compositional Contrastive Learning
Y. Chen, Y. Xian, A. S. Koepke and Z. Akata
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021
Stereo Radiance Fields (SRF): Learning View Synthesis from Sparse Views of Novel Scenes
J. Chibane, A. Bansal, V. Lazova and G. Pons-Moll
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021
Learning Spatially-Variant MAP Models for Non-blind Image Deblurring
J. Dong, S. Roth and B. Schiele
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021
Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-localization in Large Scenes from Body-Mounted Sensors
V. Guzov, A. Mir, T. Sattler and G. Pons-Moll
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021
Adaptive Aggregation Networks for Class-Incremental Learning
Y. Liu, B. Schiele and Q. Sun
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021
Open World Compositional Zero-Shot Learning
M. Mancini, M. F. Naeem, Y. Xian and Z. Akata
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021
Learning Graph Embeddings for Compositional Zero-shot Learning
M. F. Naeem, Y. Xian, F. Tombari and Z. Akata
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021
D-NeRF: Neural Radiance Fields for Dynamic Scenes
A. Pumarola, E. Corona, G. Pons-Moll and F. Moreno-Noguer
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021
Future Moment Assessment for Action Query
Q. Ke, M. Fritz and B. Schiele
IEEE Winter Conference on Applications of Computer Vision (WACV 2021), 2021
Joint Visual-Temporal Embedding for Unsupervised Learning of Actions in Untrimmed Sequences
R. G. VidalMata, W. J. Scheirer, A. Kukleva, D. Cox and H. Kuehne
IEEE Winter Conference on Applications of Computer Vision (WACV 2021), 2021
You Only Need Adversarial Supervision for Semantic Image Synthesis
E. Schönfeld, V. Sushko, D. Zhang, J. Gall, B. Schiele and A. Khoreva
International Conference on Learning Representations (ICLR 2021), 2021
Semantic Bottlenecks: Quantifying and Improving Inspectability of Deep Representations
M. Losch, M. Fritz and B. Schiele
International Journal of Computer Vision, Volume 129, 2021
SampleFix: Learning to Correct Programs by Sampling Diverse Fixes
H. Hajipour, A. Bhattacharyya, C.-A. Staicu and M. Fritz
Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2021), 2021
Internalized Biases in Fréchet Inception Distance
S. Jung and M. Keuper
NeurIPS 2021 Workshop on Distribution Shifts: Connecting Methods and Applications (NeurIPS 2021 Workshop DistShift), 2021
Efficient Message Passing for 0–1 ILPs with Binary Decision Diagrams
J.-H. Lange and P. Swoboda
Proceedings of the 38th International Conference on Machine Learning (ICML 2021), 2021
Bit Error Robustness for Energy-Efficient DNN Accelerators
D. Stutz, N. Chandramoorthy, M. Hein and B. Schiele
Proceedings of the 4th MLSys Conference, 2021
Abstract
Deep neural network (DNN) accelerators have received considerable attention in recent years due to the energy they save compared to mainstream hardware. Low-voltage operation of DNN accelerators allows energy consumption to be reduced significantly further; however, it causes bit-level failures in the memory storing the quantized DNN weights. In this paper, we show that a combination of robust fixed-point quantization, weight clipping, and random bit error training (RandBET) significantly improves robustness against random bit errors in (quantized) DNN weights. This leads to high energy savings from both low-voltage operation and low-precision quantization. Our approach generalizes across operating voltages and accelerators, as demonstrated on bit errors from profiled SRAM arrays. We also discuss why weight clipping alone is already a quite effective way to achieve robustness against bit errors. Moreover, we specifically discuss the involved trade-offs regarding accuracy, robustness and precision: without losing more than 1% in accuracy compared to a normally trained 8-bit DNN, we can reduce energy consumption on CIFAR-10 by 20%. Higher energy savings of, e.g., 30% are possible at the cost of 2.5% accuracy, even for 4-bit DNNs.
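A minimal sketch of the kind of random bit error injection this training scheme relies on, assuming PyTorch, a symmetric 8-bit fixed-point quantizer, and an i.i.d. bit-flip model; the paper's exact quantizer, clipping schedule, and profiled SRAM error distributions are not reproduced here.

```python
import torch

def quantize(w: torch.Tensor, w_max: float):
    """Symmetric 8-bit fixed-point quantization; w_max acts as the clipping range."""
    scale = w_max / 127.0
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

def inject_bit_errors(q: torch.Tensor, p: float) -> torch.Tensor:
    """Flip each bit of every stored 8-bit weight independently with probability p."""
    raw = q.view(torch.uint8)                      # reinterpret the bytes bitwise
    mask = torch.zeros_like(raw)
    for b in range(8):
        flip = (torch.rand_like(q, dtype=torch.float32) < p).to(torch.uint8)
        mask |= flip << b
    return (raw ^ mask).view(torch.int8)

# In RandBET-style training, the forward pass would use the dequantized,
# corrupted weights (inject_bit_errors(q, p).float() * scale) so the
# network learns to tolerate the errors.
```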
A Closer Look at Self-training for Zero-Label Semantic Segmentation
G. Pastore, F. Cermelli, Y. Xian, M. Mancini, Z. Akata and B. Caputo
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPR 2021), 2021
InfoScrub: Towards Attribute Privacy by Targeted Obfuscation
H.-P. Wang, T. Orekondy and M. Fritz
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPR 2021), 2021
Beyond the Spectrum: Detecting Deepfakes via Re-Synthesis
Y. He, N. Yu, M. Keuper and M. Fritz
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI 2021), 2021
FastDOG: Fast Discrete Optimization on GPU
A. Abbas and P. Swoboda
Technical Report (arXiv: 2111.10270), 2021
Abstract
We present a massively parallel Lagrange decomposition method for solving 0-1 integer linear programs occurring in structured prediction. We propose a new iterative update scheme for solving the Lagrangean dual and a perturbation technique for decoding primal solutions. For representing subproblems we follow Lange et al. (2021) and use binary decision diagrams (BDDs). Our primal and dual algorithms require little synchronization between subproblems and optimization over BDDs needs only elementary operations without complicated control flow. This allows us to exploit the parallelism offered by GPUs for all components of our method. We present experimental results on combinatorial problems from MAP inference for Markov Random Fields, quadratic assignment and cell tracking for developmental biology. Our highly parallel GPU implementation improves upon the running times of the algorithms from Lange et al. (2021) by up to an order of magnitude. In particular, we come close to or outperform some state-of-the-art specialized heuristics while being problem agnostic.
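To make the Lagrange decomposition setting concrete, here is a toy illustration on a three-variable 0-1 ILP with two coupled subproblems, using plain subgradient updates and brute-force minimization; the BDD-based subproblem representation and the GPU update scheme of the paper are deliberately not reproduced.

```python
import itertools
import numpy as np

c = np.array([1.0, -2.0, 0.5])            # objective: min c @ x over x in {0,1}^3
sub_a = lambda x: x[0] + x[1] <= 1        # constraint handled by subproblem A
sub_b = lambda x: x[1] + x[2] >= 1        # constraint handled by subproblem B

def argmin_sub(cost, feasible):
    """Brute-force minimizer of cost @ x over feasible x in {0,1}^3."""
    best = min((x for x in itertools.product((0, 1), repeat=3) if feasible(x)),
               key=lambda x: float(cost @ np.array(x)))
    return np.array(best, dtype=float)

# Each subproblem holds a copy of x and half the objective; the multipliers
# lam price disagreement between the copies while the Lagrangean dual is maximized.
lam = np.zeros(3)
for it in range(200):
    x_a = argmin_sub(c / 2 + lam, sub_a)
    x_b = argmin_sub(c / 2 - lam, sub_b)
    lam += (x_a - x_b) / (1 + it)          # diminishing subgradient step
print("x_A:", x_a, "x_B:", x_b)            # copies typically agree near the optimum
```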
Long-term future prediction under uncertainty and multi-modality
A. Bhattacharyya
PhD Thesis, Universität des Saarlandes, 2021
Where and When: Space-Time Attention for Audio-Visual Explanations
Y. Chen, T. Hummel, A. S. Koepke and Z. Akata
Technical Report (arXiv: 2105.01517), 2021
Abstract
Explaining the decision of a multi-modal decision-maker requires determining the evidence from both modalities. Recent advances in XAI provide explanations for models trained on still images. However, when it comes to modeling multiple sensory modalities in a dynamic world, it remains underexplored how to demystify the dynamics of a complex multi-modal model. In this work, we take a crucial step forward and explore learnable explanations for audio-visual recognition. Specifically, we propose a novel space-time attention network that uncovers the synergistic dynamics of audio and visual data over both space and time. Our model is capable of predicting audio-visual video events, while justifying its decision by localizing where the relevant visual cues appear and when the predicted sounds occur in videos. We benchmark our model on three audio-visual video event datasets, comparing extensively to multiple recent multi-modal representation learners and intrinsic explanation models. Experimental results demonstrate the clearly superior performance of our model over existing methods on audio-visual video event recognition. Moreover, we conduct an in-depth study to analyze the explainability of our model based on robustness analysis via perturbation tests and pointing games using human annotations.
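As a rough illustration of the mechanism, the hypothetical module below lets an audio feature attend over spatio-temporal visual tokens, so the resulting attention map indicates where (H, W) and when (T) the visual evidence lies; all dimensions and layer choices are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SpaceTimeAttention(nn.Module):
    """An audio query attends over the T*H*W visual tokens of a video clip."""

    def __init__(self, d_audio: int, d_visual: int, d_attn: int = 256):
        super().__init__()
        self.q = nn.Linear(d_audio, d_attn)
        self.k = nn.Linear(d_visual, d_attn)
        self.v = nn.Linear(d_visual, d_attn)

    def forward(self, audio, visual):
        # audio: (B, d_audio); visual: (B, T, H, W, d_visual)
        B, T, H, W, D = visual.shape
        tokens = visual.reshape(B, T * H * W, D)
        q = self.q(audio).unsqueeze(1)                    # (B, 1, d_attn)
        k, v = self.k(tokens), self.v(tokens)             # (B, THW, d_attn)
        attn = torch.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)
        context = (attn @ v).squeeze(1)                   # fused audio-visual feature
        return context, attn.reshape(B, T, H, W)          # map localizes where/when
```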
TADA: Taxonomy Adaptive Domain Adaptation
R. Gong, M. Danelljan, D. Dai, W. Wang, D. P. Paudel, A. Chhatkuli, F. Yu and L. Van Gool
Technical Report (arXiv: 2109.04813), 2021
Abstract
Traditional domain adaptation addresses the task of adapting a model to a novel target domain under limited or no additional supervision. While tackling the input domain gap, the standard domain adaptation settings assume no domain change in the output space. In semantic prediction tasks, different datasets are often labeled according to different semantic taxonomies. In many real-world settings, the target domain task requires a different taxonomy than the one imposed by the source domain. We therefore introduce the more general taxonomy adaptive domain adaptation (TADA) problem, allowing for inconsistent taxonomies between the two domains. We further propose an approach that jointly addresses image-level and label-level domain adaptation. On the label level, we employ a bilateral mixed sampling strategy to augment the target domain and a relabelling method to unify and align the label spaces. We address the image-level domain gap by proposing an uncertainty-rectified contrastive learning method, leading to more domain-invariant and class-discriminative features. We extensively evaluate the effectiveness of our framework under different TADA settings: open taxonomy, coarse-to-fine taxonomy, and partially-overlapping taxonomy. Our framework outperforms the previous state of the art by a large margin, while being capable of adapting to target taxonomies.
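For intuition on the label-level augmentation, below is a generic class-mask mixing function in the spirit of the bilateral mixed sampling mentioned in the abstract; the function name, its signature, and the omission of the relabelling and contrastive components are all simplifications.

```python
import numpy as np

def class_mix(img_a, lbl_a, img_b, lbl_b, classes):
    """Paste the pixels of the given classes from (img_a, lbl_a) onto (img_b, lbl_b).

    img_*: (H, W, 3) arrays; lbl_*: (H, W) integer label maps.
    """
    mask = np.isin(lbl_a, classes)                    # boolean paste mask
    mixed_img = np.where(mask[..., None], img_a, img_b)
    mixed_lbl = np.where(mask, lbl_a, lbl_b)
    return mixed_img, mixed_lbl
```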
Learning Graph Embeddings for Open World Compositional Zero-Shot Learning
M. Mancini, M. F. Naeem, Y. Xian and Z. Akata
Technical Report (arXiv: 2105.01017), 2021
Abstract
Compositional Zero-Shot Learning (CZSL) aims to recognize unseen compositions of state and object visual primitives seen during training. A problem with standard CZSL is the assumption of knowing which unseen compositions will be available at test time. In this work, we overcome this assumption by operating in the open-world setting, where no limit is imposed on the compositional space at test time and the search space contains a large number of unseen compositions. To address this problem, we propose a new approach, Compositional Cosine Graph Embeddings (Co-CGE), based on two principles. First, Co-CGE models the dependency between states, objects and their compositions through a graph convolutional neural network. The graph propagates information from seen to unseen concepts, improving their representations. Second, since not all unseen compositions are equally feasible, and less feasible ones may damage the learned representations, Co-CGE estimates a feasibility score for each unseen composition, using the scores as margins in a cosine similarity-based loss and as weights in the adjacency matrix of the graphs. Experiments show that our approach achieves state-of-the-art performance in standard CZSL while outperforming previous methods in the open-world scenario.
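A hedged sketch of one ingredient, namely feasibility scores acting as margins in a cosine similarity-based classification loss; the shapes, the margin placement, and the scale factor are illustrative assumptions rather than the exact Co-CGE formulation.

```python
import torch
import torch.nn.functional as F

def cosine_margin_loss(img_emb, comp_emb, labels, feasibility, scale=20.0):
    """img_emb: (B, d) image features; comp_emb: (C, d) composition embeddings;
    feasibility: (C,) scores in [0, 1]; labels: (B,) ground-truth composition ids."""
    sims = F.normalize(img_emb, dim=-1) @ F.normalize(comp_emb, dim=-1).t()  # (B, C)
    margins = 1.0 - feasibility           # less feasible compositions are pushed away
    logits = scale * (sims - margins.unsqueeze(0))
    return F.cross_entropy(logits, labels)
```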
From Pixels to People
M. Omran
PhD Thesis, Universität des Saarlandes, 2021
Abstract
Humans are at the centre of a significant amount of research in computer vision. Endowing machines with the ability to perceive people from visual data is an immense scientific challenge with a high degree of direct practical relevance. Success in automatic perception can be measured at different levels of abstraction, and this will depend on which intelligent behaviour we are trying to replicate: the ability to localise persons in an image or in the environment, understanding how persons are moving at the skeleton and at the surface level, interpreting their interactions with the environment including with other people, and perhaps even anticipating future actions. In this thesis we tackle different sub-problems of the broad research area referred to as "looking at people", aiming to perceive humans in images at different levels of granularity.

We start with bounding box-level pedestrian detection: We present a retrospective analysis of methods published in the decade preceding our work, identifying various strands of research that have advanced the state of the art. With quantitative experiments, we demonstrate the critical role of developing better feature representations and having the right training distribution. We then contribute two methods based on the insights derived from our analysis: one that combines the strongest aspects of past detectors and another that focuses purely on learning representations. The latter method outperforms more complicated approaches, especially those based on hand-crafted features. We conclude our work on pedestrian detection with a forward-looking analysis that maps out potential avenues for future research.

We then turn to pixel-level methods: Perceiving humans requires us to both separate them precisely from the background and identify their surroundings. To this end, we introduce Cityscapes, a large-scale dataset for street scene understanding. This has since established itself as a go-to benchmark for segmentation and detection. We additionally develop methods that relax the requirement for expensive pixel-level annotations, focusing on the task of boundary detection, i.e. identifying the outlines of relevant objects and surfaces. Next, we make the jump from pixels to 3D surfaces, from localising and labelling to fine-grained spatial understanding. We contribute a method for recovering 3D human shape and pose, which marries the advantages of learning-based and model-based approaches.

We conclude the thesis with a detailed discussion of benchmarking practices in computer vision. Among other things, we argue that the design of future datasets should be driven by the general goal of combinatorial robustness besides task-specific considerations.
Adversarial Content Manipulation for Analyzing and Improving Model Robustness
R. Shetty
PhD Thesis, Universität des Saarlandes, 2021
Adjoint Rigid Transform Network: Task-conditioned Alignment of 3D Shapes
K. Zhou, B. L. Bhatnagar, B. Schiele and G. Pons-Moll
Technical Report (arXiv: 2102.01161), 2021
Abstract
Most learning methods for 3D data (point clouds, meshes) suffer significant performance drops when the data is not carefully aligned to a canonical orientation. Aligning real-world 3D data collected from different sources is non-trivial and requires manual intervention. In this paper, we propose the Adjoint Rigid Transform (ART) Network, a neural module which can be integrated with a variety of 3D networks to significantly boost their performance. ART learns to rotate input shapes to a learned canonical orientation, which is crucial for many tasks such as shape reconstruction, interpolation, non-rigid registration, and latent disentanglement. ART achieves this with self-supervision and a rotation equivariance constraint on predicted rotations. The remarkable result is that with only self-supervision, ART facilitates learning a unique canonical orientation for both rigid and non-rigid shapes, which leads to a notable boost in performance on the aforementioned tasks. We will release our code and pre-trained models for further research.
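A minimal sketch of a canonicalization module in this spirit: a permutation-invariant encoder predicts a rotation (using the common 6D parametrization) that maps the input point cloud to a learned canonical frame, trained with a self-supervised equivariance term; the encoder and the loss are assumptions for illustration, not ART's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def rot6d_to_matrix(r6):
    """Map a 6D rotation representation (B, 6) to rotation matrices (B, 3, 3)."""
    a, b = r6[:, :3], r6[:, 3:]
    x = F.normalize(a, dim=-1)
    y = F.normalize(b - (x * b).sum(-1, keepdim=True) * x, dim=-1)
    return torch.stack((x, y, torch.linalg.cross(x, y)), dim=-1)

class Canonicalizer(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 6))

    def forward(self, pts):                    # pts: (B, N, 3)
        r6 = self.enc(pts).max(dim=1).values   # permutation-invariant pooling
        R = rot6d_to_matrix(r6)
        return pts @ R, R                      # shape in the canonical frame

def equivariance_loss(model, pts, R_rand):
    """The canonical output should not depend on the input's orientation."""
    canon_a, _ = model(pts)
    canon_b, _ = model(pts @ R_rand)
    return ((canon_a - canon_b) ** 2).mean()
```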
2020
Body Shape Privacy in Images: Understanding Privacy and Preventing Automatic Shape Extraction
H. Sattar, K. Krombholz, G. Pons-Moll and M. Fritz
Computer Vision -- ECCV Workshops 2020, 2020
Abstract
Modern approaches to pose and body shape estimation have recently achieved strong performance even under challenging real-world conditions. Even from a single image of a clothed person, a realistic-looking body shape can be inferred that captures a user's weight group and body shape type well. This opens up a whole spectrum of applications -- in particular in fashion -- where virtual try-on and recommendation systems can make use of these new and automated cues. However, a realistic depiction of the undressed body is regarded as highly private and therefore might not be consented to by most people. Hence, we ask whether the automatic extraction of such information can be effectively evaded. While adversarial perturbations have been shown to be effective for manipulating the output of machine learning models -- in particular, end-to-end deep learning approaches -- state-of-the-art shape estimation methods are composed of multiple stages. We perform the first investigation of different strategies that can be used to effectively manipulate the automatic shape estimation while preserving the overall appearance of the original image.
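For context, a generic single-step gradient perturbation (FGSM-style) of the kind the abstract alludes to for end-to-end models; the paper's actual subject, multi-stage shape estimation pipelines, is precisely the setting this single-model sketch does not cover.

```python
import torch

def fgsm_perturb(model, x, target, loss_fn, eps=2 / 255):
    """One-step L_inf perturbation of an input image batch x in [0, 1]."""
    x = x.detach().clone().requires_grad_(True)
    loss_fn(model(x), target).backward()
    # move the input in the direction that increases the loss, then re-clip
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()
```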