2017
Gaze Embeddings for Zero-Shot Image Classification
N. Karessli, Z. Akata, B. Schiele and A. Bulling
30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017
(Accepted/in press)
A Dataset and Exploration of Models for Understanding Video Data through Fill-in-the-blank Question-answering
T. Maharaj, N. Ballas, A. Rohrbach, A. Courville and C. Pal
30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017
(Accepted/in press)
Generating Descriptions with Grounded and Co-Referenced People
A. Rohrbach, M. Rohrbach, S. Tang, S. J. Oh and B. Schiele
30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017
(Accepted/in press)
Zero-shot learning - The Good, the Bad and the Ugly
Y. Xian, B. Schiele and Z. Akata
30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017
(Accepted/in press)
Noticeable or Distractive? A Design Space for Gaze-Contingent User Interface Notifications
M. Klauck, Y. Sugano and A. Bulling
CHI 2017 Extended Abstracts, 2017
(Accepted/in press)
Visual Stability Prediction for Robotic Manipulation
W. Li, A. Leonardis and M. Fritz
IEEE International Conference on Robotics and Automation (ICRA 2017), 2017
(Accepted/in press)
MARCOnI - ConvNet-Based MARker-Less Motion Capture in Outdoor and Indoor Scenes
A. Elhayek, E. de Aguiar, A. Jain, J. Thompson, L. Pishchulin, M. Andriluka, C. Bregler, B. Schiele and C. Theobalt
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 39, Number 3, 2017
Expanded Parts Model for Semantic Description of Humans in Still Images
G. Sharma, F. Jurie and C. Schmid
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 39, Number 1, 2017
Movie Description
A. Rohrbach, A. Torabi, M. Rohrbach, N. Tandon, C. Pal, H. Larochelle, A. Courville and B. Schiele
International Journal of Computer Vision, First Online, 2017
Abstract
Audio Description (AD) provides linguistic descriptions of movies and allows visually impaired people to follow a movie along with their peers. Such descriptions are by design mainly visual and thus naturally form an interesting data source for computer vision and computational linguistics. In this work, we propose a novel dataset that contains transcribed ADs temporally aligned to full-length movies. In addition, we collected and aligned movie scripts used in prior work and compare the two sources of descriptions. In total, the Large Scale Movie Description Challenge (LSMDC) contains a parallel corpus of 118,114 sentences and video clips from 202 movies. First, we characterize the dataset by benchmarking different approaches for generating video descriptions. Comparing ADs to scripts, we find that ADs are indeed more visual and describe precisely what is shown rather than what should happen according to the scripts created prior to movie production. Furthermore, we present and compare the results of several teams who participated in a challenge organized in the context of the workshop "Describing and Understanding Video & The Large Scale Movie Description Challenge (LSMDC)" at ICCV 2015.
Online Growing Neural Gas for Anomaly Detection in Changing Surveillance Scenes
Q. Sun, H. Liu and T. Harada
Pattern Recognition, Volume 64, 2017
Look Together: Using Gaze for Assisting Co-located Collaborative Search
Y. Zhang, K. Pfeuffer, M. K. Chong, J. Alexander, A. Bulling and H. Gellersen
Personal and Ubiquitous Computing, Volume 21, Number 1, 2017
Efficiently Summarising Event Sequences with Rich Interleaving Patterns
A. Bhattacharyya and J. Vreeken
Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017), 2017
(Accepted/in press)
Exploiting Saliency for Object Segmentation from Image Level Labels
S. J. Oh, R. Benenson, A. Khoreva, Z. Akata, M. Fritz and B. Schiele
Technical Report, 2017
(arXiv: 1701.08261)
Abstract
There have been remarkable improvements in the semantic labelling task in recent years. However, state-of-the-art methods rely on large-scale pixel-level annotations. This paper studies the problem of training a pixel-wise semantic labeller network from image-level annotations of the object classes present. Recently, it has been shown that high-quality seeds indicating discriminative object regions can be obtained from image-level labels. Without additional information, obtaining the full extent of the object is an inherently ill-posed problem due to co-occurrences. We propose using a saliency model as additional information, thereby exploiting prior knowledge on the object extent and image statistics. We show how to combine both information sources in order to recover 80% of the fully supervised performance, which is the new state of the art in weakly supervised training for pixel-wise semantic labelling.
Efficient Algorithms for Moral Lineage Tracing
M. Rempfler, J.-H. Lange, F. Jug, C. Blasse, E. W. Myers, B. H. Menze and B. Andres
Technical Report, 2017
(arXiv: 1702.04111)
Abstract
Lineage tracing, the joint segmentation and tracking of living cells as they move and divide in a sequence of light microscopy images, is a challenging task. Jug et al. have proposed a mathematical abstraction of this task, the moral lineage tracing problem (MLTP), whose feasible solutions define a segmentation of every image and a lineage forest of cells. Their branch-and-cut algorithm, however, is prone to many cuts and slow convergence for large instances. To address this problem, we make three contributions. First, we improve the branch-and-cut algorithm by separating tighter cutting planes. Second, we define two primal feasible local search algorithms for the MLTP. Third, we show in experiments that our algorithms considerably decrease the runtime on the problem instances of Jug et al. and find solutions on larger instances in reasonable time.