Danfei Xu

[New] I will start as an Assistant Professor at the School of Interactive Computing at Georgia Tech in Fall 2022 and will be hiring in the upcoming 2021/2022 cycle. Reach out if you are interested in working with me!

I recently graduated with a Ph.D. in CS from Stanford and will soon start as a Research Scientist at NVIDIA Research. I was fortunate to be advised by Fei-Fei Li and Silvio Savarese, who co-lead the Stanford Vision and Learning Lab. I work at the intersection of robotics and machine learning.

Prior to joining Stanford, I received my B.S. from Columbia University (SEAS'15). I've spent time at DeepMind UK (2019), ZOOX (2017), Autodesk Research (2016), CMU RI (2014), and Columbia Robotics Lab (2013-2015).

Email  /  Google Scholar  /  CV (May 2021)  /  GitHub  /  Twitter

Research

My research is in Robot Learning, which comprises problems in Robotics, Machine Learning & Computer Vision. My research goal is to build machines that have human-like abilities to generalize to new tasks and environments. My Ph.D. work sought to endow robots with compositional generalization capabilities and solve long-horizon manipulation tasks in complex, ecological environments. Some examples are: generalizable visual imitation learning using neural program induction and neural graph inference, compositional planning with neural-symbolic planners, learning abstract planning spaces, and discovering compositional representations from unstructured demonstrations.

I also lead a line of research in structured scene understanding in 2D and 3D. Examples are scene graph generation, 3D reconstructions from monocular views, and 2D-3D sensor fusion in detection and tracking.

News
  • [Sep 2021] Our RoboMimic work will appear as an oral presentation at CoRL 2021 (top 6% of submissions).
  • [Sep 2021] Our new work on training robot collaborators from demonstrations was accepted at CoRL 2021.
  • [Aug 2021] We released a large-scale study on learning skills from human demos (code+dataset).
  • [Aug 2021] New work on learning human-robot collaboration policies from demonstrations.
  • [July 2021] Generalization Through Hand-Eye Coordination accepted to IROS 2021.
  • [April 2021] We are organizing an ICCV workshop on Structural and Compositional Learning on 3D Data.
  • [April 2021] I will be co-instructing the Stanford CS231n course in Spring 2021.
  • [Mar 2021] Deep Affordance Foresight accepted at ICRA 2021.
  • [Feb 2021] Invited talk at MIT Vision Seminar.
  • [Dec 2020] Invited talk at DeepMind.
  • [Dec 2020] Invited talk at Cornell Robotics Seminar.
Demos

Learning to Generalize Across Long-Horizon Tasks from Human Demonstrations (2020)

6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints (2020)
Preprints
Human-in-the-Loop Imitation Learning using Remote Teleoperation
Ajay Mandlekar, Danfei Xu, Roberto Martin-Martin, Yuke Zhu, Li Fei-Fei, Silvio Savarese
In Submission

Human-in-the-loop learning for complex manipulation tasks.

Publications
What Matters in Learning from Offline Human Demonstrations for Robot Manipulation
Ajay Mandlekar, Danfei Xu, Josiah Wong, Soroush Nasiriany, Chen Wang, Rohun Kulkarni, Li Fei-Fei, Silvio Savarese, Yuke Zhu, Roberto Martin-Martin
CoRL 2021 (oral, to appear)

[code+dataset][website][blogpost]

A large-scale study on learning manipulation skills from human demonstrations.

Co-GAIL: Learning Diverse Strategies for Human-Robot Collaboration
Chen Wang, Claudia D'Arpino, Danfei Xu, Li Fei-Fei, Karen Liu, Silvio Savarese
CoRL 2021 (to appear)

Learning human-robot collaboration policies from human-human collaboration demonstrations.

Generalization Through Hand-Eye Coordination: An Action Space for Learning Spatially-Invariant Visuomotor Control
Chen Wang*, Rui Wang*, Ajay Mandlekar, Li Fei-Fei, Silvio Savarese, Danfei Xu
IROS 2021

A learnable action space that recovers human hand-eye coordination behaviors from human demonstrations.

Deep Affordance Foresight: Planning Through What Can Be Done in the Future
Danfei Xu, Ajay Mandlekar, Roberto Martin-Martin, Yuke Zhu, Silvio Savarese, Li Fei-Fei
(Long version) ICRA 2021
(Short version) Oral Presentation, NeurIPS Workshop on Object Representations for Learning and Reasoning, 2020

We extend the classical definition of affordance to enable generalizable long-horizon planning.

Positive-Unlabeled Reward Learning
Danfei Xu, Misha Denil
(Long version) CoRL 2020
(Short version) Late-Breaking Paper, NeurIPS Deep Reinforcement Learning Workshop 2019

[Video]

An algorithmic framework that simultaneously addresses the reward delusion problem in supervised reward learning and the overfitting discriminator problem in adversarial imitation learning.

Procedure Planning in Instructional Videos
Chien-Yi Chang, De-An Huang, Danfei Xu, Ehsan Adeli, Li Fei-Fei, Juan Carlos Niebles
ECCV, 2020

Learning to plan from instructional videos.

Learning to Generalize Across Long-Horizon Tasks from Human Demonstrations
Ajay Mandlekar*, Danfei Xu*, Roberto Martin-Martin, Silvio Savarese, Li Fei-Fei
RSS, 2020

[website] [video] [blog post]

Learning visuomotor policies that can generalize across long-horizon tasks by modeling latent compositional structures.

6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints
Chen Wang, Roberto Martin-Martin, Danfei Xu, Jun Lv, Cewu Lu, Li Fei-Fei, Silvio Savarese, Yuke Zhu
ICRA, 2020

[website] [video] [code]

Real-time category-level 6D object tracking from RGB-D data.

Regression Planning Networks
Danfei Xu, Roberto Martin-Martin, De-An Huang, Yuke Zhu, Silvio Savarese, Li Fei-Fei
NeurIPS, 2019

[code] [poster]

A flexible neural network architecture for learning to plan from video demonstrations.

Continuous Relaxation of Symbolic Planner for One-Shot Imitation Learning
De-An Huang, Danfei Xu, Yuke Zhu, Silvio Savarese, Li Fei-Fei, Juan Carlos Niebles
IROS, 2019

[blog post]

One-shot imitation learning via hybrid neural-symbolic planning.

Situational Fusion of Visual Representation for Visual Navigation
William B. Shen, Danfei Xu, Yuke Zhu, Leonidas Guibas, Li Fei-Fei, Silvio Savarese
ICCV, 2019

Learning generalizable navigation policies from mid-level visual representations.

DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion
Chen Wang, Danfei Xu, Yuke Zhu, Roberto Martin-Martin, Cewu Lu, Li Fei-Fei, Silvio Savarese
CVPR, 2019

[website] [video] [code]

Dense RGB-depth sensor fusion for 6D object pose estimation.

Neural Task Graphs: Generalizing to Unseen Tasks from a Single Video Demonstration
De-An Huang*, Suraj Nair*, Danfei Xu*, Yuke Zhu, Animesh Garg, Li Fei-Fei, Silvio Savarese, Juan Carlos Niebles
CVPR, 2019 (Oral)

[blog post]

Generating executable task graphs from video demonstrations.

Neural Task Programming: Learning to Generalize Across Hierarchical Tasks
Danfei Xu*, Suraj Nair*, Yuke Zhu, Julian Gao, Animesh Garg, Li Fei-Fei, Silvio Savarese
ICRA, 2018

[website] [video] [Two Minute Papers] [blog post]

Neural Task Programming (NTP) is a meta-learning framework that learns to generate robot-executable neural programs from a task demonstration video.

PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation
Danfei Xu, Ashesh Jain, Dragomir Anguelov
CVPR, 2018

End-to-end 3D bounding box estimation via sensor fusion.

Scene Graph Generation by Iterative Message Passing
Danfei Xu, Yuke Zhu, Christopher B. Choy, Li Fei-Fei
CVPR, 2017

[website] [code]

We propose an end-to-end model that jointly infers object category, location, and relationships. The model learns to iteratively improve its prediction by passing messages on a scene graph.

3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction
Christopher B. Choy, Danfei Xu*, JunYoung Gwak*, Silvio Savarese
ECCV, 2016

[website] [code]

We propose an end-to-end 3D reconstruction model that unifies single- and multi-view reconstruction.

Model-Driven Feed-Forward Prediction for Manipulation of Deformable Objects
Yinxiao Li, Yan Wang, Yonghao Yue, Danfei Xu, Michael Case, Shih-Fu Chang, Eitan Grinspun, Peter K. Allen
IEEE TASE, 2016

[website]

Deformable object manipulation with an application to personal assistive robots.

This is the journal paper of our "laundry robot" series:
ICRA 2015
IROS 2015
ICRA 2016

Topometric localization on a road network
Danfei Xu, Hernan Badino, Daniel Huber
IROS, 2015

Vision-based localization on a probabilistic road network.

Tactile identification of objects using Bayesian exploration
Danfei Xu, Gerald E. Loeb, Jeremy Fishel
ICRA, 2013

Object classification using multi-modal tactile sensing.

Teaching
  • [2021] Stanford CS 231n instructor
  • [2020] Stanford CS 231n instructor
  • [2019] Stanford CS 231n teaching assistant & lecturer
  • [2018] Stanford CS 231n teaching assistant
  • [2018] Stanford CS 231a teaching assistant
Other Services
  • Reviewer: CVPR, ICCV, ECCV, IROS, ICRA, RSS, CoRL, T-RO, AAAI, IJRR, TPAMI, RA-L, NeurIPS, ICLR, ICML
