A Hierarchical Representation for Future Action Prediction

Tian Lan, Tsung-Chieh Chen, and Silvio Savarese


Abstract

We consider the problem of inferring the future actions of people from a still image or a short video clip. Predicting future actions before they are actually executed is a critical ingredient for interacting effectively with other humans on a daily basis. However, the challenges are two-fold: first, we need to capture the subtle details inherent in human movements that may imply a future action; second, in social settings predictions usually have to be made as quickly as possible, when only limited prior observations are available. In this paper, we propose hierarchical movemes, a new representation that describes human movements at multiple levels of granularity, ranging from atomic movements (e.g., an open arm) to coarser movements that cover a larger temporal extent. We develop a max-margin learning framework for future action prediction that integrates a collection of moveme detectors in a hierarchical way. We validate our method on two publicly available datasets and show that it achieves very promising performance.
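To make the idea concrete, below is a minimal sketch of how detector responses at several temporal granularities might be combined into a single linear compatibility score per candidate action, in the spirit of a max-margin model. This is an assumption-laden illustration, not the authors' implementation: the names (moveme_scores, weights, action_labels) and the max-pooling step are hypothetical choices.

    import numpy as np

    def predict_future_action(moveme_scores, weights, action_labels):
        """Score candidate future actions with a linear model over moveme responses.

        moveme_scores: dict mapping a hierarchy level (e.g. 'atomic', 'coarse')
                       to an array of shape (num_frames, num_detectors) holding
                       detector responses on the observed frames.
        weights:       dict mapping (action, level) to a weight vector of length
                       num_detectors, e.g. trained with an SVM-style objective.
        action_labels: list of candidate future actions to score.
        """
        scores = {}
        for action in action_labels:
            total = 0.0
            for level, responses in moveme_scores.items():
                # Max-pool each detector's response over the observed frames
                # at this granularity, then apply the learned linear weights.
                pooled = np.asarray(responses).max(axis=0)
                total += float(np.dot(weights[(action, level)], pooled))
            scores[action] = total
        # Predict the candidate action with the highest compatibility score.
        return max(scores, key=scores.get), scores

The point of the sketch is that each level of the hierarchy contributes its own learned weights, so fine-grained atomic movements and coarser movements covering a larger temporal extent are weighted independently for every candidate action.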


Supplementary Video


Publication

  • Tian Lan, Tsung-Chieh Chen, and Silvio Savarese. A Hierarchical Representation for Future Action Prediction. In European Conference on Computer Vision (ECCV), 2014. pdf, bibtex


Last modified: 07/12/2014.