Danfei Xu

I am a Ph.D. student in CS at Stanford University. My advisors are Fei-Fei Li and Silvio Savarese, who co-lead the Stanford Vision and Learning Lab. I work at the intersection of robot learning and computer vision. I'm also co-instructing Stanford's CS231n course on Convolutional Neural Networks for Visual Recognition.

Prior to joining Stanford, I received my B.S. from Columbia University (2015). I've spent time at DeepMind UK (2019), ZOOX (2017), Autodesk Research (2016), CMU RI (2014), and Columbia Robotics Lab (2013-2015).

I'm currently on the faculty job market (2020-2021)!

Email  /  Google Scholar  /  CV (Jan 2021)  /  Github  /  Twitter

Research

My research tackles long-standing problems in robotics, such as learning from demonstrations and task planning, by drawing on approaches from robotics, computer vision, and structured learning. The key research question that I seek to answer is: how can we enable robot learners to achieve compositional generalization using structured computations and representations? Some examples are generalizable visual imitation learning using neural program induction and neural graph inference, compositional planning with neural-symbolic planners, and learning abstract planning representations.

I also lead a line of research in structured scene understanding in 2D and 3D. Examples are scene graph generation, 3D reconstructions from monocular views, and 2D-3D sensor fusion in detection and tracking.

Demos

Learning to Generalize Across Long-Horizon Tasks from Human Demonstrations (2020)

6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints (2020)
Preprints
Deep Affordance Foresight: Planning Through What Can Be Done in the Future
Danfei Xu, Ajay Mandlekar, Roberto Martin-Martin, Yuke Zhu, Silvio Savarese, Li Fei-Fei
(Long version) In submission
(Short version) Oral Presentation, NeurIPS Workshop on Object Representations for Learning and Reasoning, 2020

We extend the classical definition of affordance to enable generalizable long-horizon planning.

Generalization Through Hand-Eye Coordination: An Action Space for Learning Spatially-Invariant Visuomotor Control
Chen Wang*, Rui Wang*, Ajay Mandlekar, Li Fei-Fei, Silvio Savarese, Danfei Xu
In Submission

A learnable action space that recovers human hand-eye coordination behaviors by learning from human demonstrations.

Publications
Positive-Unlabeled Reward Learning
Danfei Xu, Misha Denil
(Long version) CoRL 2020
(Short version) Late-Breaking Paper, NeurIPS Deep Reinforcement Learning Workshop 2019

[Video]

An algorithmic framework that simultaneously addresses the reward delusion problem in supervised reward learning and the overfitting discriminator problem in adversarial imitation learning.

Procedure Planning in Instructional Videos
Chien-Yi Chang, De-An Huang, Danfei Xu, Ehsan Adeli, Li Fei-Fei, Juan Carlos Niebles
ECCV, 2020

Learning to plan from instructional videos.

Learning to Generalize Across Long-Horizon Tasks from Human Demonstrations
Ajay Mandlekar*, Danfei Xu*, Roberto Martin-Martin, Silvio Savarese, Li Fei-Fei
RSS, 2020

[website] [video] [blog post]

Learning visuomotor policies that can generalize across long-horizon tasks by modeling latent compositional structures.

6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints
Chen Wang, Roberto Martin-Martin, Danfei Xu, Jun Lv, Cewu Lu, Li Fei-Fei, Silvio Savarese, Yuke Zhu
ICRA, 2020

[website] [video] [code]

Real-time category-level 6D object tracking from RGB-D data.

Regression Planning Networks
Danfei Xu, Roberto Martin-Martin, De-An Huang, Yuke Zhu, Silvio Savarese, Li Fei-Fei
NeurIPS, 2019

[code] [poster]

A flexible neural network architecture for learning to plan from video demonstrations.

Continuous Relaxation of Symbolic Planner for One-Shot Imitation Learning
De-An Huang, Danfei Xu, Yuke Zhu, Silvio Savarese, Li Fei-Fei, Juan Carlos Niebles
IROS, 2019

[blog post]

One-shot imitation learning via hybrid neural-symbolic planning.

Situational Fusion of Visual Representation for Visual Navigation
William B. Shen, Danfei Xu, Yuke Zhu, Leonidas Guibas, Li Fei-Fei, Silvio Savarese
ICCV, 2019

Learning generalizable navigation policies from mid-level visual representations.

DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion
Chen Wang, Danfei Xu, Yuke Zhu, Roberto Martin-Martin, Cewu Lu, Li Fei-Fei, Silvio Savarese
CVPR, 2019

[website] [video] [code]

Dense RGB-depth sensor fusion for 6D object pose estimation.

Neural Task Graphs: Generalizing to Unseen Tasks from a Single Video Demonstration
De-An Huang*, Suraj Nair*, Danfei Xu*, Yuke Zhu, Animesh Garg, Li Fei-Fei, Silvio Savarese, Juan Carlos Niebles
CVPR, 2019 (Oral)

[blog post]

Generating executable task graphs from video demonstrations.

Neural Task Programming: Learning to Generalize Across Hierarchical Tasks
Danfei Xu*, Suraj Nair*, Yuke Zhu, Julian Gao, Animesh Garg, Li Fei-Fei, Silvio Savarese
ICRA, 2018

[website] [video] [Two Minute Papers] [blog post]

Neural Task Programming (NTP) is a meta-learning framework that learns to generate robot-executable neural programs from task demonstration videos.

PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation
Danfei Xu, Ashesh Jain, Dragomir Anguelov
CVPR, 2018

End-to-end 3D bounding box estimation via sensor fusion.

Scene Graph Generation by Iterative Message Passing
Danfei Xu, Yuke Zhu, Christopher B. Choy, Li Fei-Fei
CVPR, 2017

[website] [code]

We propose an end-to-end model that jointly infers object category, location, and relationships. The model learns to iteratively improve its prediction by passing messages on a scene graph.

3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction
Christopher B. Choy, Danfei Xu*, JunYoung Gwak*, Silvio Savarese
ECCV, 2016

[website] [code]

We propose an end-to-end 3D reconstruction model that unifies single- and multi-view reconstruction.

Model-Driven Feed-Forward Prediction for Manipulation of Deformable Objects
Yinxiao Li, Yan Wang, Yonghao Yue, Danfei Xu, Michael Case, Shih-Fu Chang, Eitan Grinspun, Peter K. Allen
IEEE TASE, 2016

[website]

Deformable object manipulation with an application to personal assistive robots.

This is the journal paper of our "laundry robot" series:
ICRA 2015
IROS 2015
ICRA 2016

Topometric localization on a road network
Danfei Xu, Hernan Badino, Daniel Huber
IROS, 2015

Vision-based localization on a probabilistic road network.

Tactile identification of objects using Bayesian exploration
Danfei Xu, Gerald E. Loeb, Jeremy Fishel
ICRA, 2013

Object classification using multi-modal tactile sensing.

Teaching
  • [2020] Stanford CS 231n instructor
  • [2019] Stanford CS 231n teaching assistant & lecturer
  • [2018] Stanford CS 231n teaching assistant
  • [2018] Stanford CS 231a teaching assistant
Other Services
  • Reviewer: CVPR, ICCV, ECCV, IROS, ICRA, CoRL, AAAI, IJRR, TPAMI, RA-L, NeurIPS, ICLR, ICML
