Overview

Starting from a single monocular image of multiple individuals, a sparse set of body part detection candidates is computed (I). To incorporate various types of interactions between body parts within and across human bodies, a densely connected graph is constructed over these candidates (II). Multi-person pose estimation is then cast as an integer linear program (ILP). Its solution simultaneously partitions the part detection candidates into person clusters and labels each detection with one of the part classes (III), thereby producing a joint pose estimate for multiple people (IV).
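To make the joint "partition and label" idea concrete, the toy sketch below exhaustively searches all labelings and person groupings for a handful of detections. It is an illustration only, not the DeeperCut implementation: the real system solves an ILP rather than enumerating, and the log-odds cost form, the function names (`logit_cost`, `solve_toy`, `toy_pair_prob`), and all probabilities here are assumptions made up for the example.

```python
import itertools
import math


def logit_cost(p):
    # Assumed cost form: negative (a reward) when p > 0.5, positive otherwise.
    return math.log((1.0 - p) / p)


def solve_toy(unary_prob, pair_prob, n_classes):
    """Brute-force the best labels and person clusters for a few detections.

    unary_prob[d][c]: assumed probability that detection d is part class c.
    pair_prob(d, e, c, f): assumed probability that detection d (class c) and
    detection e (class f) belong to the same person.
    """
    n = len(unary_prob)
    label_options = [None] + list(range(n_classes))  # None = detection suppressed
    best_cost, best_labels, best_people = math.inf, None, None
    for labels in itertools.product(label_options, repeat=n):
        active = [d for d in range(n) if labels[d] is not None]
        unary = sum(logit_cost(unary_prob[d][labels[d]]) for d in active)
        # Try every assignment of the active detections to person ids.
        for ids in itertools.product(range(len(active)), repeat=len(active)):
            person = dict(zip(active, ids))
            # Pairwise costs only count for detections placed in the same person.
            pairwise = sum(
                logit_cost(pair_prob(d, e, labels[d], labels[e]))
                for d, e in itertools.combinations(active, 2)
                if person[d] == person[e]
            )
            cost = unary + pairwise
            if cost < best_cost:
                groups = {}
                for d in active:
                    groups.setdefault(person[d], []).append(d)
                best_cost = cost
                best_labels = list(labels)
                best_people = sorted(tuple(g) for g in groups.values())
    return best_cost, best_labels, best_people


# Hypothetical scene: detections 0 and 1 lie close together, as do 2 and 3.
NEAR = {(0, 1), (2, 3)}


def toy_pair_prob(d, e, c, f):
    # Nearby detections of different part classes likely share a person.
    return 0.9 if (d, e) in NEAR and c != f else 0.1


# Made-up detector scores for two part classes (0 and 1).
UNARY = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.2, 0.8]]

cost, labels, people = solve_toy(UNARY, toy_pair_prob, n_classes=2)
print(labels, people)
```

Exhaustive search grows exponentially with the number of detections, which is why a dedicated ILP formulation and an efficient optimization strategy matter in practice; the sketch only shows what a solution encodes, namely one part label per kept detection plus a grouping of detections into people.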

DeeperCut improves over DeepCut on three fronts:

  • deeper ResNet architectures strengthen the body part detectors, yielding effective bottom-up proposals for body parts
  • novel image-conditioned pairwise terms make it possible to assemble the proposals into a variable number of consistent body part configurations
  • an incremental optimization strategy explores the search space more efficiently, leading both to better performance and to significant speed-up factors

Qualitative Results

Quantitative Results

DeeperCut significantly outperforms the best known multi-person pose estimation results and demonstrates competitive performance on the task of single-person pose estimation.

For results and comparisons, refer to the MPII Human Pose Dataset web page.