Procedure Planning in Instructional Videos

Abstract

In this paper, we study the problem of procedure planning in instructional videos, which can be seen as the first step towards enabling autonomous agents to plan for real-life tasks in everyday settings. The key technical challenge of planning in instructional videos is that the state and action spaces are underconstrained. We address this challenge by proposing Dual Dynamics Networks (DDN), a framework that explicitly leverages the constraints imposed by the conjugate relationships between states and actions in a learned plannable latent space. We evaluate our method on large-scale real-world instructional videos. Our experiments show that DDN learns plannable representations without explicit supervision and leads to stronger generalization compared to existing planning approaches and neural network policies.

Publication
European Conference on Computer Vision (ECCV), 2020
Date