Exploring the Spatial Hierarchy of Mixture Models for Human Pose Estimation
Human pose estimation requires a versatile yet well-constrained
spatial model for grouping locally ambiguous parts together to produce
a globally consistent hypothesis. Previous works either use local
deformable models deviating from a certain template, or use a global
mixture representation in the pose space. In this paper, we propose a
new hierarchical spatial model that can capture an exponential number
of poses with a compact mixture representation on each part. Using
latent nodes, it can represent high-order spatial relationships among parts
with exact inference. Unlike recent hierarchical models that associate
each latent node with a mixture of appearance templates (like HoG),
we use the hierarchical structure as a pure spatial prior, avoiding the large
and often confounding appearance space. We verify the effectiveness of
this model in three ways. First, samples representing human-like poses
can be drawn from our model, showing its ability to capture high-order
dependencies among parts. Second, our model reconstructs unseen poses
more accurately than a nearest-neighbor pose representation.
Finally, our model achieves state-of-the-art performance on three challenging
datasets, and substantially outperforms recent hierarchical models.
Publications
"Exploring the Spatial Hierarchy of Mixture Models for Human Pose Estimation"
Yuandong Tian, C. Lawrence Zitnick, and Srinivasa G. Narasimhan
Proc. of the European Conference on Computer Vision (ECCV),
Oct. 2012.
[PDF]
We have built a hierarchical model for human pose estimation that encodes high-order relationships among parts.
The graphical model
The undirected graphical model associated with the hierarchical model. Each node consists of a position variable p_j that identifies where the part is, and a type variable z_j. The type variable characterizes what the part looks like and how its child parts are arranged spatially.
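To make the node structure concrete, here is a minimal Python sketch of a tree whose nodes carry a position p_j and a type z_j. The names `Node`, `deformation_cost`, `total_cost`, and `expected_offsets` are hypothetical, and the quadratic penalty is an assumption chosen for illustration; it is not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    position: tuple[float, float]  # p_j: where the part is
    part_type: int                 # z_j: discrete type of the part
    children: list["Node"] = field(default_factory=list)


def deformation_cost(parent, child, expected_offsets):
    # Assumption: the parent's type selects the expected child offset,
    # and deviations from it are penalized quadratically.
    ex, ey = expected_offsets[(child.name, parent.part_type)]
    dx = child.position[0] - parent.position[0] - ex
    dy = child.position[1] - parent.position[1] - ey
    return dx * dx + dy * dy


def total_cost(node, expected_offsets):
    # Sum the parent-child deformation costs over the whole subtree.
    cost = 0.0
    for child in node.children:
        cost += deformation_cost(node, child, expected_offsets)
        cost += total_cost(child, expected_offsets)
    return cost
```

In this sketch, changing a parent's type changes where its children are expected to be, which is one way to read "how its child parts are arranged spatially".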
Type Compatibility
Using the hierarchical model, it is possible to capture the compatibility between the parent and child types, so that only reasonable configurations receive a high score, while the appearance of parts is shared to reduce the number of parameters in the model.
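The following Python sketch illustrates how parent-child type compatibility can be combined with exact inference on a tree, using max-sum dynamic programming over the type variables only. Positions are left out for brevity. The function name `best_types`, the table layout, and the scores in the usage note are assumptions made for illustration, not the paper's implementation.

```python
def best_types(tree, root, unary, compat):
    """Max-sum dynamic programming over type variables on a tree.

    tree:   dict mapping a node to its list of children
    unary:  dict mapping a node to a list of per-type scores
    compat: dict mapping (parent, child) to a table compat[zp][zc]
    Returns (best total score, dict mapping each node to its type).
    """
    argbest = {}  # argbest[child][zp] = best child type given parent type zp

    def subtree_scores(node):
        # Best score of the subtree rooted at `node`, for each type of `node`.
        scores = list(unary[node])
        for child in tree.get(node, []):
            child_scores = subtree_scores(child)
            table = compat[(node, child)]
            argbest[child] = []
            for zp in range(len(scores)):
                options = [table[zp][zc] + child_scores[zc]
                           for zc in range(len(child_scores))]
                best_zc = max(range(len(options)), key=options.__getitem__)
                argbest[child].append(best_zc)
                scores[zp] += options[best_zc]
        return scores

    root_scores = subtree_scores(root)
    best_root = max(range(len(root_scores)), key=root_scores.__getitem__)

    # Backtrack from the root to recover the best type of every node.
    assignment = {root: best_root}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in tree.get(node, []):
            assignment[child] = argbest[child][assignment[node]]
            stack.append(child)
    return root_scores[best_root], assignment
```

For example, with `tree = {"torso": ["arm"]}`, unary scores `{"torso": [0.0, 0.0], "arm": [1.0, 0.0]}`, and a compatibility table `[[0.0, -5.0], [-5.0, 2.0]]`, the arm alone would prefer type 0, but the compatibility term makes the joint assignment torso = 1, arm = 1 the highest scoring one.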
Samples from the hierarchical model
From this model, it is possible to sample reasonable human
poses. Compared to samples from previous approaches, ours look
more natural.
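As a rough illustration of drawing poses from a tree-structured model, the Python sketch below performs top-down (ancestral) sampling. It is a simplification under stated assumptions: it takes normalized conditional tables over child types given the parent type, plus a per-type mean offset with Gaussian noise, whereas the model on this page is undirected. The names `sample_pose`, `root_prior`, `cond`, and `offsets` are hypothetical.

```python
import random


def sample_pose(tree, root, root_prior, cond, offsets, noise=1.0, rng=None):
    """Sample types and positions top-down from a tree.

    tree:       dict mapping a node to its list of children
    root_prior: list of probabilities over the root's types
    cond:       dict mapping (parent, child) to cond[zp] = child type weights
    offsets:    dict mapping (parent, child) to offsets[zp] = (dx, dy)
    noise:      standard deviation of the Gaussian position jitter
    """
    rng = rng or random.Random()
    types = {root: rng.choices(range(len(root_prior)), weights=root_prior)[0]}
    positions = {root: (0.0, 0.0)}  # the root is placed at the origin
    stack = [root]
    while stack:
        node = stack.pop()
        zp = types[node]
        px, py = positions[node]
        for child in tree.get(node, []):
            # Sample the child's type conditioned on the parent's type.
            weights = cond[(node, child)][zp]
            types[child] = rng.choices(range(len(weights)), weights=weights)[0]
            # Place the child at the type-dependent offset, plus jitter.
            ox, oy = offsets[(node, child)][zp]
            positions[child] = (px + ox + rng.gauss(0, noise),
                                py + oy + rng.gauss(0, noise))
            stack.append(child)
    return types, positions
```

Passing a seeded `random.Random` instance as `rng` makes the samples reproducible.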
Results
PCP performance on three benchmark datasets
Our method achieves state-of-the-art performance on three benchmark datasets (the PARSE dataset, the Leeds Sports dataset and the UIUC People dataset) in terms of PCP, the percentage of correctly detected parts.
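A minimal Python sketch of a PCP computation follows. This page defines PCP only as the percentage of correctly detected parts; the criterion used here (both predicted endpoints within half the ground-truth part length of their true positions) is a commonly used convention and an assumption in this sketch. Evaluation protocols differ between papers, so numbers from this function are not necessarily comparable to the reported results.

```python
import math


def pcp(predicted, ground_truth, threshold=0.5):
    """Percentage of correct parts.

    predicted, ground_truth: equal-length lists of parts, each given as
    a pair of endpoints ((x1, y1), (x2, y2)).
    """
    correct = 0
    for (pa, pb), (ga, gb) in zip(predicted, ground_truth):
        length = math.dist(ga, gb)  # ground-truth part length
        # Assumed criterion: both endpoints within threshold * length.
        if (math.dist(pa, ga) <= threshold * length
                and math.dist(pb, gb) <= threshold * length):
            correct += 1
    return 100.0 * correct / len(ground_truth)
```

With a ground-truth part of length 10, a prediction whose endpoints are each off by 1 pixel counts as correct, while one with an endpoint off by 6 pixels does not.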
Sample Pose Estimation Results
Sample pose estimation results on the PARSE and Leeds Sports datasets.