Anna Khoreva (Post-Doc)

Personal Information
Research Interests
- Computer Vision
- Machine Learning
Education
- Ph.D. student, Computer Science, Max-Planck-Institut für Informatik, Saarland Informatics Campus, Germany (2014 - present)
- M.Sc., Visual Computing, Saarland University, Saarland Informatics Campus, Germany, 2012-2014
- Dipl.-Math., Applied Mathematics, Ulyanovsk State Technical University, Ulyanovsk, Russia, 2004-2009
Research Projects
- Learning Must-Link Constraints for Video Segmentation
- Classifier Based Graph Construction for Video Segmentation
- Improved Image Boundaries for Better Video Segmentation
- Weakly Supervised Object Boundaries
- Simple Does It: Weakly Supervised Instance and Semantic Segmentation
- Exploiting Saliency for Segmenting Objects from Image-Level Labels
- Learning Video Object Segmentation from Static Images
- Lucid Data Dreaming for Object Tracking
- Video Object Segmentation with Language Referring Expressions
Other
See my Google Scholar profile.
Publications
2020
2019
Lucid Data Dreaming for Video Object Segmentation
A. Khoreva, R. Benenson, E. Ilg, T. Brox and B. Schiele
International Journal of Computer Vision, Volume 127, Number 9, 2019
2018
Learning to Refine Human Pose Estimation
M. Fieraru, A. Khoreva, L. Pishchulin and B. Schiele
IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2018), 2018
2017
Lucid Data Dreaming for Object Tracking
A. Khoreva, R. Benenson, E. Ilg, T. Brox and B. Schiele
DAVIS Challenge on Video Object Segmentation 2017, 2017
Lucid Data Dreaming for Multiple Object Tracking
A. Khoreva, R. Benenson, E. Ilg, T. Brox and B. Schiele
Technical Report, 2017
(arXiv: 1703.09554)
Abstract
Convolutional networks reach top quality in pixel-level object tracking but
require a large amount of training data (1k ~ 10k) to deliver such results. We
propose a new training strategy which achieves state-of-the-art results across
three evaluation datasets while using 20x ~ 100x less annotated data than
competing methods. Instead of using large training sets hoping to generalize
across domains, we generate in-domain training data using the provided
annotation on the first frame of each video to synthesize ("lucid dream")
plausible future video frames. In-domain per-video training data allows us to
train high quality appearance- and motion-based models, as well as tune the
post-processing stage. This approach makes it possible to reach competitive results even
when training from only a single annotated frame, without ImageNet
pre-training. Our results indicate that using a larger training set is not
automatically better, and that for the tracking task a smaller training set
that is closer to the target domain is more effective. This changes the mindset
regarding how many training samples and general "objectness" knowledge are
required for the object tracking task.
Learning to Segment in Images and Videos with Different Forms of Supervision
A. Khoreva
PhD Thesis, Universität des Saarlandes, 2017
Abstract
Much progress has been made in image and video segmentation
over the last years. To a large extent, the success can be attributed to
the strong appearance models completely learned from data, in particular
using deep learning methods. However, to perform best, these methods require
large representative datasets for training with expensive pixel-level
annotations, which in the case of videos are prohibitively costly to obtain. Therefore,
there is a need to relax this constraint and to consider alternative forms
of supervision, which are easier and cheaper to collect. In this thesis,
we aim to develop algorithms for learning to segment in images and videos
with different levels of supervision.
First, we develop approaches for training convolutional networks with weaker
forms of supervision, such as bounding boxes or image labels, for object
boundary estimation and semantic/instance labelling tasks. We propose to
generate pixel-level approximate groundtruth from these weaker forms of
annotations to train a network, which makes it possible to achieve high-quality
results comparable to the full supervision quality without any
modifications of the network architecture or the training procedure.
Second, we address the problem of the excessive computational and memory
costs inherent to solving video segmentation via graphs. We propose
approaches to improve the runtime and memory efficiency as well as the
output segmentation quality by learning from the available training data
the best representation of the graph. In particular, we contribute methods for
learning must-link constraints, the topology and edge weights of the graph,
as well as for enhancing the graph nodes (superpixels) themselves.
Third, we tackle the task of pixel-level object tracking and address the
problem of the limited amount of densely annotated video data for training
convolutional networks. We introduce an architecture which allows training
with static images only and propose an elaborate data synthesis scheme
which creates a large number of training examples close to the target
domain from the given first frame mask. With the proposed techniques we
show that densely annotated consecutive video data is not necessary to
achieve high-quality temporally coherent video segmentation results.
In summary, this thesis advances the state of the art in weakly supervised
image segmentation, graph-based video segmentation and pixel-level object
tracking and contributes new ways of training convolutional
networks with a limited amount of pixel-level annotated training data.
2016
Weakly Supervised Object Boundaries
A. Khoreva, R. Benenson, M. Omran, M. Hein and B. Schiele
29th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), 2016
Abstract
State-of-the-art learning based boundary detection methods require extensive
training data. Since labelling object boundaries is one of the most expensive
types of annotations, there is a need to relax the requirement to carefully
annotate images, both to make training more affordable and to extend the
amount of training data. In this paper we propose a technique to generate
weakly supervised annotations and show that bounding box annotations alone
suffice to reach high-quality object boundaries without using any
object-specific boundary annotations. With the proposed weak supervision
techniques we achieve the top performance on the object boundary detection
task, outperforming by a large margin the current fully supervised
state-of-the-art methods.
Improved Image Boundaries for Better Video Segmentation
A. Khoreva, R. Benenson, F. Galasso, M. Hein and B. Schiele
Computer Vision -- ECCV 2016 Workshops, 2016
Abstract
Graph-based video segmentation methods rely on superpixels as a starting point.
While most previous work has focused on the construction of the graph edges and
weights as well as solving the graph partitioning problem, this paper focuses
on better superpixels for video segmentation. We demonstrate by a comparative
analysis that superpixels extracted from boundaries perform best, and show that
boundary estimation can be significantly improved via image and time domain
cues. With superpixels generated from our better boundaries we observe
consistent improvement for two video segmentation methods on two different
datasets.
2015
2014