Typical approaches to articulated pose estimation combine spatial modelling of the human body with appearance modelling of body parts. In this work we push the state of the art in articulated pose estimation in three ways. First, we propose a model that incorporates higher-order part dependencies while remaining efficient. We achieve this by defining a conditional model in which all body parts are connected a priori, but which becomes a tractable tree-structured pictorial structures model once the image observations are available. Second, we explore various types of appearance representations, aiming to substantially improve the body part hypotheses, and augment the image-conditioned model with rich appearance representations. Third, we extend the image-conditioned model to a more flexible spatial model by introducing additional body parts located on the joints. These contributions result in a novel, powerful pose estimation model that significantly outperforms state-of-the-art methods on the "Leeds Sports Poses" and "Parse" benchmarks.
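The efficiency claim rests on the model being a tree once the image is observed. On a tree, the best-scoring pose can be found exactly by dynamic programming: max-sum message passing runs from the leaves to the root, followed by a backtracking pass. The following is a minimal sketch under several assumptions. Each part takes one of a small number of discrete states (for example, quantized position and rotation). Unary (appearance) scores are given as lists, and the pairwise (spatial) score is given as a callable. The function name `map_tree_ps` and its arguments are hypothetical and do not come from the authors' code.

```python
from typing import Callable, List, Sequence, Tuple


def map_tree_ps(
    parents: Sequence[int],
    unary: Sequence[Sequence[float]],
    pairwise: Callable[[int, int, int], float],
) -> Tuple[List[int], float]:
    """Exact MAP inference in a tree-structured pictorial structures model (sketch).

    parents[p]            -- parent of part p, or -1 for the single root part
    unary[p][s]           -- appearance score of part p in discrete state s (higher is better)
    pairwise(c, s_c, s_p) -- spatial score of child c in state s_c given its parent in state s_p
    Returns the highest-scoring state of every part and the total score.
    """
    n = len(parents)
    children: List[List[int]] = [[] for _ in range(n)]
    root = -1
    for p, par in enumerate(parents):
        if par < 0:
            root = p
        else:
            children[par].append(p)

    # Breadth-first order from the root: every parent precedes its children.
    order = [root]
    i = 0
    while i < len(order):
        order.extend(children[order[i]])
        i += 1

    # score[p][s]: best score of the subtree rooted at p when p is in state s.
    score = [[float(v) for v in u] for u in unary]
    # argbest[c][s_p]: best state of child c for each state s_p of its parent.
    argbest: List[List[int]] = [[] for _ in range(n)]

    # Leaves-to-root pass: reversed BFS order visits every child before its parent.
    for c in reversed(order):
        par = parents[c]
        if par < 0:
            continue
        for sp in range(len(unary[par])):
            best_sc = max(range(len(unary[c])),
                          key=lambda sc: score[c][sc] + pairwise(c, sc, sp))
            argbest[c].append(best_sc)
            score[par][sp] += score[c][best_sc] + pairwise(c, best_sc, sp)

    # Root-to-leaves pass: read off the optimal state of each part.
    states = [0] * n
    states[root] = max(range(len(unary[root])), key=lambda s: score[root][s])
    for p in order[1:]:
        states[p] = argbest[p][states[parents[p]]]
    return states, score[root][states[root]]
```

The sketch enumerates all state pairs per edge, which costs O(P·S²) for P parts and S states. Practical pictorial structures implementations avoid this cost for Gaussian-shaped pairwise terms by using generalized distance transforms.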
An overview of our model is shown below. The starting point for our method is a standard Pictorial Structures (PS) model, which combines unimodal part appearance with a generic kinematic tree. To incorporate higher-order information between body parts, we use a mid-level representation based on poselet detectors (I). Using this representation, we directly predict a kinematic tree specific to each image (II) and also predict the position and rotation of each body part (III). To obtain a strong local multi-modal appearance model, we use rotation-dependent part detectors based on Deformable Part Models (DPM), each having multiple components that include root and part templates. Both the mid-level representation and the local appearance model are implemented for a flexible body model consisting of 22 body parts.
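Steps (II) and (III) make the spatial terms depend on the image. The following sketch shows one plausible way to do this. It is an assumption, not the authors' exact formulation. First, training poses are clustered by the relative configuration of a part pair. Then, for a test image, a classifier on the poselet responses picks one cluster. Finally, the pair is scored with that cluster's Gaussian over the offset (dx, dy, dtheta). All names (`OffsetGaussian`, `predict_cluster`, `conditioned_pairwise`), the linear classifier, and the toy cluster values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

Config = Tuple[float, float, float]  # (x, y, theta) of a body part; theta in radians


@dataclass
class OffsetGaussian:
    """Diagonal Gaussian over the child-minus-parent offset (dx, dy, dtheta)."""
    mean: Config
    var: Config


def predict_cluster(features: Sequence[float],
                    weights: Sequence[Sequence[float]],
                    biases: Sequence[float]) -> int:
    """Hypothetical linear classifier on poselet responses: argmax_k (w_k . f + b_k)."""
    scores = [sum(w * f for w, f in zip(w_k, features)) + b_k
              for w_k, b_k in zip(weights, biases)]
    return max(range(len(scores)), key=scores.__getitem__)


def wrap_angle(a: float) -> float:
    """Map an angle to [-pi, pi) so rotations near +/-pi compare correctly."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def conditioned_pairwise(child: Config, parent: Config, g: OffsetGaussian) -> float:
    """Gaussian log-score (up to a constant) of the child's pose relative to its parent."""
    d = (child[0] - parent[0] - g.mean[0],
         child[1] - parent[1] - g.mean[1],
         wrap_angle(child[2] - parent[2] - g.mean[2]))
    return -0.5 * sum(di * di / vi for di, vi in zip(d, g.var))


# Toy clusters for one part pair (hypothetical values, e.g. lower arm hanging vs. raised).
clusters = [OffsetGaussian((0.0, 30.0, 0.0), (25.0, 25.0, 0.1)),
            OffsetGaussian((30.0, 0.0, 1.57), (25.0, 25.0, 0.1))]

if __name__ == "__main__":
    # Pick the pair's spatial model from toy poselet responses, then score a configuration.
    k = predict_cluster([0.2, 0.9], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
    print(k, conditioned_pairwise((30.0, 0.0, 1.57), (0.0, 0.0, 0.0), clusters[k]))
```

This design keeps inference tractable. The image only selects which Gaussian each edge uses, so the model that is actually optimized remains a tree with standard pairwise terms.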
Poselet Conditioned Pictorial Structures. L. Pishchulin, M. Andriluka, P. Gehler and B. Schiele. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2013.
Strong Appearance and Expressive Spatial Models for Human Pose Estimation. L. Pishchulin, M. Andriluka, P. Gehler and B. Schiele. IEEE International Conference on Computer Vision (ICCV), December 2013.