D2
Computer Vision and Machine Learning

Anna Khoreva (Post-Doc)

Publications

2023
Intra-Source Style Augmentation for Improved Domain Generalization
Y. Li, D. Zhang, M. Keuper and A. Khoreva
2023 IEEE Winter Conference on Applications of Computer Vision (WACV 2023), 2023
2022
OASIS: Only Adversarial Supervision for Semantic Image Synthesis
V. Sushko, E. Schönfeld, D. Zhang, J. Gall, B. Schiele and A. Khoreva
International Journal of Computer Vision, Volume 130, 2022
Discovering Class-Specific GAN Controls for Semantic Image Synthesis
E. Schönfeld, J. Borges, V. Sushko, B. Schiele and A. Khoreva
Technical Report, 2022
(arXiv: 2212.01455)
Abstract
Prior work has extensively studied the latent space structure of GANs for unconditional image synthesis, enabling global editing of generated images by the unsupervised discovery of interpretable latent directions. However, the discovery of latent directions for conditional GANs for semantic image synthesis (SIS) has remained unexplored. In this work, we specifically focus on addressing this gap. We propose a novel optimization method for finding spatially disentangled class-specific directions in the latent space of pretrained SIS models. We show that the latent directions found by our method can effectively control the local appearance of semantic classes, e.g., changing their internal structure, texture or color independently from each other. Visual inspection and quantitative evaluation of the discovered GAN controls on various datasets demonstrate that our method discovers a diverse set of unique and semantically meaningful latent directions for class-specific edits.
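To make the class-specific editing concrete, below is a minimal sketch (not the authors' code) of applying a discovered latent direction only at pixels belonging to one semantic class. The generator, the direction vector and the class id are hypothetical placeholders; the per-pixel 3D noise tensor mimics the OASIS-style conditioning used by the SIS models above.

```python
# Minimal sketch: shift the latent code only where the label map equals
# the target class, leaving all other classes untouched. A real model
# would then decode the edited noise via G(z, label_map).
import numpy as np

def edit_class(z, label_map, direction, class_id, strength=2.0):
    """Apply a class-specific latent direction.

    z:         (C, H, W) per-pixel latent noise
    label_map: (H, W) integer semantic labels
    direction: (C,) discovered latent direction for the class
    """
    mask = (label_map == class_id)                      # (H, W) boolean
    z_edit = z.copy()
    z_edit[:, mask] += strength * direction[:, None]    # spatially local edit
    return z_edit

# Toy usage with random data.
rng = np.random.default_rng(0)
C, H, W = 64, 32, 32
z = rng.standard_normal((C, H, W))
labels = rng.integers(0, 5, size=(H, W))
d = rng.standard_normal(C)
d /= np.linalg.norm(d)                                  # unit-norm direction
z_edited = edit_class(z, labels, d, class_id=3)
assert np.allclose(z_edited[:, labels != 3], z[:, labels != 3])  # other classes unchanged
```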
2021
You Only Need Adversarial Supervision for Semantic Image Synthesis
E. Schönfeld, V. Sushko, D. Zhang, J. Gall, B. Schiele and A. Khoreva
International Conference on Learning Representations (ICLR 2021), 2021
2020
A U-Net Based Discriminator for Generative Adversarial Networks
E. Schönfeld, B. Schiele and A. Khoreva
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020
2019
Lucid Data Dreaming for Video Object Segmentation
A. Khoreva, R. Benenson, E. Ilg, T. Brox and B. Schiele
International Journal of Computer Vision, Volume 127, Number 9, 2019
2018
Video Object Segmentation with Language Referring Expressions
A. Khoreva, A. Rohrbach and B. Schiele
Computer Vision - ACCV 2018, 2018
Learning to Refine Human Pose Estimation
M. Fieraru, A. Khoreva, L. Pishchulin and B. Schiele
IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2018), 2018
2017
Learning Video Object Segmentation from Static Images
A. Khoreva, F. Perazzi, R. Benenson, B. Schiele and A. Sorkine-Hornung
30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017
Simple Does It: Weakly Supervised Instance and Semantic Segmentation
A. Khoreva, R. Benenson, J. Hosang, M. Hein and B. Schiele
30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017
Exploiting Saliency for Object Segmentation from Image Level Labels
S. J. Oh, R. Benenson, A. Khoreva, Z. Akata, M. Fritz and B. Schiele
30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017
Lucid Data Dreaming for Object Tracking
A. Khoreva, R. Benenson, E. Ilg, T. Brox and B. Schiele
DAVIS Challenge on Video Object Segmentation 2017, 2017
Learning to Segment in Images and Videos with Different Forms of Supervision
A. Khoreva
PhD Thesis, Universität des Saarlandes, 2017
Abstract
Much progress has been made in image and video segmentation over the last years. To a large extent, the success can be attributed to strong appearance models learned entirely from data, in particular using deep learning methods. However, to perform best, these methods require large representative training datasets with expensive pixel-level annotations, which in the case of videos are prohibitive to obtain. There is therefore a need to relax this constraint and to consider alternative forms of supervision that are easier and cheaper to collect. In this thesis, we aim to develop algorithms for learning to segment in images and videos with different levels of supervision.
First, we develop approaches for training convolutional networks with weaker forms of supervision, such as bounding boxes or image labels, for object boundary estimation and semantic/instance labelling tasks. We propose to generate pixel-level approximate ground truth from these weaker forms of annotation to train a network, which allows us to achieve high-quality results comparable to full supervision without any modifications of the network architecture or the training procedure.
Second, we address the problem of the excessive computational and memory costs inherent to solving video segmentation via graphs. We propose approaches that improve the runtime and memory efficiency, as well as the output segmentation quality, by learning the best representation of the graph from the available training data. In particular, we contribute methods for learning must-link constraints, the topology and edge weights of the graph, as well as for enhancing the graph nodes (superpixels) themselves.
Third, we tackle the task of pixel-level object tracking and address the problem of the limited amount of densely annotated video data for training convolutional networks. We introduce an architecture that allows training with static images only, and propose an elaborate data synthesis scheme that creates a large number of training examples close to the target domain from the given first-frame mask. With the proposed techniques we show that densely annotated consecutive video data is not necessary to achieve high-quality, temporally coherent video segmentation results.
In summary, this thesis advances the state of the art in weakly supervised image segmentation, graph-based video segmentation and pixel-level object tracking, and contributes new ways of training convolutional networks with a limited amount of pixel-level annotated training data.
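As an illustration of the first contribution, turning box annotations into approximate pixel-level ground truth, here is a minimal sketch using OpenCV's GrabCut. The thesis builds on refined variants of this idea; the file name and box coordinates are placeholders.

```python
# Minimal sketch: approximate a foreground mask from a bounding box with
# GrabCut, yielding pixel-level pseudo ground truth for network training.
import cv2
import numpy as np

def box_to_mask(image, box, iters=5):
    """image: HxWx3 uint8 BGR image; box: (x, y, w, h) around the object."""
    mask = np.zeros(image.shape[:2], np.uint8)
    bgd = np.zeros((1, 65), np.float64)
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(image, mask, box, bgd, fgd, iters, cv2.GC_INIT_WITH_RECT)
    # Pixels marked (probably) foreground become the approximate ground truth.
    fg = (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)
    return fg.astype(np.uint8)

image = cv2.imread("frame.jpg")                       # placeholder path
approx_gt = box_to_mask(image, box=(50, 40, 120, 160))  # placeholder box
```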
Lucid Data Dreaming for Multiple Object Tracking
A. Khoreva, R. Benenson, E. Ilg, T. Brox and B. Schiele
Technical Report, 2017
(arXiv: 1703.09554)
Abstract
Convolutional networks reach top quality in pixel-level object tracking but require a large amount of training data (1k ~ 10k) to deliver such results. We propose a new training strategy which achieves state-of-the-art results across three evaluation datasets while using 20x ~ 100x less annotated data than competing methods. Instead of using large training sets in the hope of generalizing across domains, we generate in-domain training data using the provided annotation on the first frame of each video to synthesize ("lucid dream") plausible future video frames. In-domain per-video training data allows us to train high-quality appearance- and motion-based models, as well as to tune the post-processing stage. This approach makes it possible to reach competitive results even when training from only a single annotated frame, without ImageNet pre-training. Our results indicate that using a larger training set is not automatically better, and that for the tracking task a smaller training set that is closer to the target domain is more effective. This changes the mindset regarding how many training samples and how much general "objectness" knowledge are required for the object tracking task.
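A minimal sketch of the lucid-dreaming idea follows: re-composite the first-frame object over an inpainted background under random transforms to synthesize in-domain training pairs. The real pipeline also models illumination changes, non-rigid deformation and two-frame motion; this toy version only does affine jitter, and the file names are placeholders.

```python
# Minimal sketch: synthesize (image, mask) training pairs from a single
# annotated frame by warping the object over a roughly inpainted background.
import cv2
import numpy as np

def dream_pair(frame, mask, rng, max_shift=20, max_angle=15):
    """frame: HxWx3 uint8; mask: HxW uint8 (nonzero = object)."""
    h, w = mask.shape
    background = cv2.inpaint(frame, mask, 5, cv2.INPAINT_TELEA)
    angle = rng.uniform(-max_angle, max_angle)
    tx, ty = rng.uniform(-max_shift, max_shift, size=2)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    M[:, 2] += (tx, ty)
    warped_obj = cv2.warpAffine(frame, M, (w, h))
    warped_mask = cv2.warpAffine(mask, M, (w, h), flags=cv2.INTER_NEAREST)
    out = np.where(warped_mask[..., None] > 0, warped_obj, background)
    return out, warped_mask

frame = cv2.imread("first_frame.jpg")                      # placeholder
mask = cv2.imread("first_mask.png", cv2.IMREAD_GRAYSCALE)  # placeholder, 0/255
rng = np.random.default_rng(0)
pairs = [dream_pair(frame, mask, rng) for _ in range(100)]  # in-domain training set
```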
2016
Weakly Supervised Object Boundaries
A. Khoreva, R. Benenson, M. Omran, M. Hein and B. Schiele
29th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), 2016
Abstract
State-of-the-art learning-based boundary detection methods require extensive training data. Since labelling object boundaries is one of the most expensive types of annotation, there is a need to relax the requirement for carefully annotated images, both to make training more affordable and to extend the amount of training data. In this paper we propose a technique to generate weakly supervised annotations and show that bounding box annotations alone suffice to reach high-quality object boundaries without using any object-specific boundary annotations. With the proposed weak supervision techniques we achieve top performance on the object boundary detection task, outperforming the current fully supervised state-of-the-art methods by a large margin.
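A minimal sketch of one way such weak boundary annotations can be derived: given approximate object masks (e.g., obtained from boxes as in the GrabCut example above), the boundary target is simply the morphological gradient of each mask. This illustrates the general recipe, not the paper's exact pipeline.

```python
# Minimal sketch: turn an approximate binary object mask into a binary
# boundary map via the morphological gradient (dilation minus erosion).
import cv2
import numpy as np

def mask_to_boundary(mask, thickness=1):
    """mask: HxW uint8 with values in {0, 1}; returns a boundary map."""
    k = 2 * thickness + 1
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (k, k))
    dilated = cv2.dilate(mask, kernel)
    eroded = cv2.erode(mask, kernel)
    return ((dilated - eroded) > 0).astype(np.uint8)  # 1 on the object contour

# Toy usage: a filled rectangle yields its outline as the boundary target.
toy = np.zeros((64, 64), np.uint8)
cv2.rectangle(toy, (16, 16), (48, 48), 1, -1)
boundary = mask_to_boundary(toy)
```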
Improved Image Boundaries for Better Video Segmentation
A. Khoreva, R. Benenson, F. Galasso, M. Hein and B. Schiele
Computer Vision -- ECCV 2016 Workshops, 2016
Abstract
Graph-based video segmentation methods rely on superpixels as a starting point. While most previous work has focused on the construction of the graph edges and weights, as well as on solving the graph partitioning problem, this paper focuses on better superpixels for video segmentation. We demonstrate through a comparative analysis that superpixels extracted from boundaries perform best, and show that boundary estimation can be significantly improved via image and time domain cues. With superpixels generated from our better boundaries we observe consistent improvements for two video segmentation methods on two different datasets.
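A minimal sketch of boundary-aligned superpixels: grow watershed regions on an edge-strength map so that superpixel borders snap to image boundaries. scikit-image's Sobel edges stand in for the learned boundary estimator discussed in the paper, and a built-in sample image is used.

```python
# Minimal sketch: seed a regular grid of markers, then flood a watershed
# on the edge map so region borders align with image boundaries.
import numpy as np
from skimage import color, data, filters, segmentation, util

image = util.img_as_float(data.astronaut())
gray = color.rgb2gray(image)
edges = filters.sobel(gray)               # stand-in for a learned boundary map

step = 20                                  # grid spacing controls superpixel size
markers = np.zeros_like(gray, dtype=np.int32)
rows, cols = np.mgrid[step // 2:gray.shape[0]:step, step // 2:gray.shape[1]:step]
markers[rows, cols] = np.arange(1, rows.size + 1).reshape(rows.shape)

superpixels = segmentation.watershed(edges, markers)
print("number of superpixels:", superpixels.max())
```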
2015
Classifier Based Graph Construction for Video Segmentation
A. Khoreva, F. Galasso, M. Hein and B. Schiele
IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), 2015
2014
Learning Must-Link Constraints for Video Segmentation Based on Spectral Clustering
A. Khoreva, F. Galasso, M. Hein and B. Schiele
Pattern Recognition (GCPR 2014), 2014