Procedure Planning in Instructional Videos

Chien-Yi Chang, De-An Huang, Danfei Xu, Ehsan Adeli, Li Fei-Fei, Juan Carlos Niebles

Aug 10, 2019

arXiv

Abstract

In this paper, we study the problem of procedure planning in instructional videos, which can be seen as the first step towards enabling autonomous agents to plan for real-life tasks in everyday settings. The key technical challenge of planning in instructional videos is that the state and action spaces are underconstrained. We address this challenge by proposing Dual Dynamics Networks (DDN), a framework that explicitly leverages the constraints imposed by the conjugate relationships between states and actions in a learned plannable latent space. We evaluate our method on large-scale real-world instructional videos. Our experiments show that DDN learns plannable representations without explicit supervision and leads to stronger generalization compared to existing planning approaches and neural network policies.

Type

Conference paper

Publication

European Conference on Computer Vision (ECCV), 2020

Date

August 2019