Research
The goal of my research is to enable robots that can operate in unstructured, real-world environments. Towards this goal, I study how robots can generalize effectively across tasks, objects, and environments by learning from large datasets. Specifically, my research focuses on methods for:
- Scalably [1, 2] and safely [3] collecting real-world robot datasets.
- Self-supervised learning of visual models from offline data [1,2,3] and using them for robotic manipulation [4,5,6,7].
- Leveraging human video datasets [1,3] and natural language annotations [2,3] to enable better robot learning.
Recent Talk (April 2022 @ Nuro)
Publications & Preprints
R3M: A Universal Visual Representation for Robot Manipulation
Suraj Nair, Aravind Rajeswaran, Vikash Kumar, Chelsea Finn, Abhinav Gupta
arXiv preprint, 2022
project page /
code
We pre-train a generalizable visual representation on diverse human videos and language, and show it enables far more efficient learning across a wide range of robotic manipulation tasks.
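As a rough illustration of the recipe (not the released R3M code: the encoder, feature dimension, and action head below are placeholder assumptions), a frozen pre-trained visual encoder can serve as the perception backbone for a small behavior-cloning policy:

```python
# Sketch only: a frozen, pre-trained visual encoder used as the perception
# backbone for a small behavior-cloning policy head.
import torch
import torch.nn as nn
from torchvision import models

# Stand-in for a pre-trained representation; R3M itself is trained on diverse
# human videos with time-contrastive and video-language alignment objectives.
encoder = models.resnet50(weights="IMAGENET1K_V1")
encoder.fc = nn.Identity()                  # expose 2048-d features
encoder.requires_grad_(False).eval()        # keep the representation frozen

policy_head = nn.Sequential(                # maps frozen features to actions
    nn.Linear(2048, 256), nn.ReLU(),
    nn.Linear(256, 7),                      # e.g. a 7-DoF arm action (assumed)
)

images = torch.randn(8, 3, 224, 224)        # a batch of camera observations
with torch.no_grad():
    features = encoder(images)              # (8, 2048) frozen features
actions = policy_head(features)             # (8, 7) predicted actions
```
Only the small policy head is trained on robot demonstrations; the representation stays fixed.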
Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation
Suraj Nair, Eric Mitchell, Kevin Chen, Brian Ichter, Silvio Savarese, Chelsea Finn
Conference on Robot Learning (CoRL), 2021
project page /
code
We learn language-conditioned visuomotor skills on real robots from entirely offline, pre-collected datasets and crowdsourced language annotation.
Example-Driven Model-Based Reinforcement Learning for Solving Long-Horizon Visuomotor Tasks
Bohan Wu, Suraj Nair, Li Fei-Fei*, Chelsea Finn*
Conference on Robot Learning (CoRL), 2021
project page
EMBR is a model-based RL algorithm that learns visuomotor skills and their groundings, which can then be sequenced with symbolic planners to complete long-horizon, multi-stage manipulation tasks on real robots.
FitVid: Overfitting in Pixel-Level Video Prediction
Mohammad Babaeizadeh, Mohammad Taghi Saffar, Suraj Nair, Sergey Levine, Chelsea Finn, Dumitru Erhan
arXiv preprint, 2021
project page /
code
We propose a variational video prediction model that is capable of severely overfitting to common video prediction benchmarks while having a parameter count similar to that of current state-of-the-art models.
Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human Videos
Annie S. Chen, Suraj Nair, Chelsea Finn
Robotics: Science and Systems (RSS), 2021
ICLR Workshop on Self-Supervised Reinforcement Learning, 2021 (Oral)
project page
We propose a technique for learning multi-task reward functions from a small amount of robot data and large amounts of in-the-wild human videos. By leveraging diverse human data, the learned reward function is able to generalize to new environments and tasks.
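One way such a reward could be instantiated is as a discriminator that scores whether a robot clip and a human video depict the same task; the sketch below is illustrative (architecture, clip shapes, and names are assumptions, not the paper's exact model):

```python
# Sketch: a same-task discriminator over (robot clip, human clip) pairs whose
# score can later be used as a reward for planning or RL.
import torch
import torch.nn as nn

class SameTaskDiscriminator(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        # Placeholder clip encoder; any video encoder could be substituted.
        self.video_encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.classifier = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, robot_clip, human_clip):
        z_robot = self.video_encoder(robot_clip)
        z_human = self.video_encoder(human_clip)
        return self.classifier(torch.cat([z_robot, z_human], dim=-1))  # same-task logit

disc = SameTaskDiscriminator()
robot_clip = torch.randn(4, 8, 3, 64, 64)     # (batch, frames, C, H, W), assumed sizes
human_clip = torch.randn(4, 8, 3, 64, 64)
reward = torch.sigmoid(disc(robot_clip, human_clip))  # higher = more likely same task
```
Training pairs matching clips against mismatched ones; at test time, the score of a candidate robot behavior against a human demonstration of the task acts as the reward.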
Greedy Hierarchical Variational Autoencoders for Large-Scale Video Prediction
Bohan Wu, Suraj Nair, Roberto Martin-Martin, Li Fei-Fei*, Chelsea Finn*
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021
project page
We propose a technique for video prediction which trains a hierarchy of action-conditioned VAEs in a greedy fashion, enabling efficient training of large video prediction models.
Model-Based Visual Planning with Self-Supervised Functional Distances
Stephen Tian, Suraj Nair, Frederik Ebert, Sudeep Dasari, Benjamin Eysenbach, Chelsea Finn, Sergey Levine
International Conference on Learning Representations (ICLR), 2021 (Spotlight)
project page
We propose a method for offline model-based RL that learns a video prediction model and a Q-function-based distance metric, and uses them to accomplish visually specified goals.
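At a glance, the planning loop looks roughly like the sketch below (the prediction model, distance function, and toy vector states are placeholders; the actual method plans over predicted video frames):

```python
# Sketch: rank sampled action sequences by a learned distance to the goal,
# evaluated on the model's predicted outcome of each sequence.
import numpy as np

def plan(obs, goal, predict, distance, horizon=5, n_samples=100, action_dim=4):
    """predict(obs, actions) -> predicted outcome (placeholder for a video model);
    distance(obs, goal) -> learned functional distance (placeholder)."""
    candidates = np.random.uniform(-1, 1, size=(n_samples, horizon, action_dim))
    costs = np.array([distance(predict(obs, a), goal) for a in candidates])
    return candidates[np.argmin(costs)]      # action sequence with the lowest cost

# Toy stand-ins so the sketch runs; both are learned from offline data in practice.
predict = lambda obs, actions: obs + actions.sum(axis=0)
distance = lambda o, g: float(np.linalg.norm(o - g))
best_actions = plan(np.zeros(4), np.ones(4), predict, distance)
```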
Batch Exploration with Examples for Scalable Robotic Reinforcement Learning
Annie S. Chen*, HyunJi Nam*, Suraj Nair*, Chelsea Finn
Robotics and Automation Letters (RA-L) and International Conference on Robotics and Automation (ICRA), 2021
project page /
code
We propose a framework for leveraging weak human supervision to enable better robotic exploration. Using just a few minutes of human supervision, the robot autonomously collects high-quality data, providing better data for downstream offline RL.
Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones
Brijen Thananjeyan*, Ashwin Balakrishna*, Suraj Nair, Michael Luo, Krishnan Srinivasan, Minho Hwang, Joseph E. Gonzalez, Julian Ibarz, Chelsea Finn, Ken Goldberg
Robotics and Automation Letters (RA-L) and International Conference on Robotics and Automation (ICRA), 2021
project page
An algorithm for safe reinforcement learning that uses a set of offline data to learn about constraints before policy learning, and a pair of policies that separate the often conflicting objectives of task-directed exploration and constraint satisfaction, enabling it to learn contact-rich and visuomotor control tasks.
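The core control rule can be sketched as follows (the function names, threshold, and toy policies are illustrative, not the authors' implementation):

```python
# Sketch of the Recovery RL action-selection rule: if the learned safety critic
# deems the task policy's action too risky, execute the recovery policy instead.
import numpy as np

def select_action(state, task_policy, recovery_policy, q_risk, eps_risk=0.3):
    """q_risk(state, action) estimates the chance of a future constraint
    violation; eps_risk is the risk threshold (names are assumptions)."""
    action = task_policy(state)
    if q_risk(state, action) > eps_risk:
        action = recovery_policy(state)      # fall back to constraint satisfaction
    return action

# Toy stand-ins so the rule can be executed end to end.
task_policy = lambda s: np.array([1.0, 0.0])
recovery_policy = lambda s: np.array([-1.0, 0.0])
q_risk = lambda s, a: 0.9 if a[0] > 0 else 0.1
print(select_action(np.zeros(2), task_policy, recovery_policy, q_risk))  # recovery action
```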
Goal-Aware Prediction: Learning to Model what Matters
Suraj Nair, Silvio Savarese, Chelsea Finn
International Conference on Machine Learning (ICML), 2020
project page /
code
We explore learning visual dynamics models that are conditioned on goals and learn to model only goal-relevant quantities.
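A minimal sketch of the idea, assuming a simple goal-conditioned latent dynamics model trained against a goal-relative target (this simplification is not the exact GAP architecture):

```python
# Sketch: encode observation and goal together, roll the latent forward with the
# action, and train the decoder to predict only the goal-relative error.
import torch
import torch.nn as nn

class GoalConditionedDynamics(nn.Module):
    def __init__(self, obs_dim=32, act_dim=4, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(2 * obs_dim, hidden), nn.ReLU())
        self.dynamics = nn.Sequential(nn.Linear(hidden + act_dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, obs_dim)

    def forward(self, obs, goal, action):
        z = self.encoder(torch.cat([obs, goal], dim=-1))
        z_next = self.dynamics(torch.cat([z, action], dim=-1))
        return self.decoder(z_next)          # prediction in goal-relative terms

model = GoalConditionedDynamics()
obs, goal, action = torch.randn(16, 32), torch.randn(16, 32), torch.randn(16, 4)
next_obs = torch.randn(16, 32)
# Training target: the residual between goal and next observation (an assumption
# made for this sketch), so the model only needs to capture goal-relevant change.
loss = nn.functional.mse_loss(model(obs, goal, action), goal - next_obs)
loss.backward()
```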
Hierarchical Foresight: Self-Supervised Learning of Long-Horizon Tasks via Visual Subgoal Generation
Suraj Nair, Chelsea Finn
International Conference on Learning Representations (ICLR), 2020
project page /
code
We study how robots can learn long-horizon, vision-based tasks in self-supervised settings. Our approach, hierarchical visual foresight, can optimize for a sequence of subgoals that break the task down into easy-to-complete segments.
Time Reversal as Self-Supervision
Suraj Nair, Mohammad Babaeizadeh, Chelsea Finn, Sergey Levine, Vikash Kumar
International Conference on Robotics and Automation (ICRA), 2020
project page /
press
We propose a technique that uses time-reversal to learn goals and provide a high level plan to reach them. In particular, our approach explores outward from a set of goal states, "unsolving" a task, which then enables solving the task from new initializations at test time.
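The self-supervision step amounts to reversing the explored trajectories, roughly as in this toy sketch (states here are just integers; in the paper they are images):

```python
# Sketch: trajectories collected by exploring outward from goal states are
# reversed, so each one reads as a path that arrives at the goal and can be
# used as supervision for a high-level plan back to it.
def reverse_trajectories(outward_trajectories):
    """outward_trajectories: observation sequences that start at a goal state
    and wander away from it ("unsolving" the task)."""
    return [list(reversed(traj)) for traj in outward_trajectories]

outward = [[0, 1, 2, 3], [0, 4, 5]]          # 0 marks the goal region (toy example)
plans = reverse_trajectories(outward)        # [[3, 2, 1, 0], [5, 4, 0]]
```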
Causal Induction from Visual Observations for Goal-Directed Tasks
Suraj Nair, Yuke Zhu, Silvio Savarese, Li Fei-Fei
NeurIPS Workshop on Causal Machine Learning, 2019
project page /
code
We explore how to effectively predict causal graphs from a small set of visual observations, and how to incorporate the learned graphs into downstream goal-conditioned policy learning.
RoboNet: Large-Scale Multi-Robot Learning
Sudeep Dasari, Frederik Ebert, Stephen Tian, Suraj Nair, Bernadette Bucher, Karl Schmeckpeper, Siddharth Singh, Sergey Levine, Chelsea Finn
Conference on Robot Learning (CoRL), 2019
project page /
code /
press
We collect a dataset of robotic experience across 4 institutions and 7 robots, and demonstrate that robot learning algorithms leveraging this data can adapt to new environments faster than training from scratch.
Neural Task Graphs: Generalizing to Unseen Tasks from a Single Video Demonstration
De-An Huang*, Suraj Nair*, Danfei Xu*, Yuke Zhu, Animesh Garg, Li Fei-Fei, Silvio Savarese, Juan Carlos Niebles
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019 (Oral)
NTG learns to produce a task graph from a single video demonstration of an unseen task, and leverages it for one-shot imitation learning.
Neural Task Programming: Learning to Generalize Across Hierarchical Tasks
Danfei Xu*, Suraj Nair*, Yuke Zhu, Julian Gao, Animesh Garg, Li Fei-Fei, Silvio Savarese
International Conference on Robotics and Automation (ICRA), 2018
project page /
code /
Two Minute Papers
Neural Task Programming (NTP) is a meta-learning framework that learns to generate robot-executable neural programs from a task demonstration video.
Reliable Real-Time Seismic Signal/Noise Discrimination With Machine Learning
Men-Andrin Meier, Zachary E Ross, Anshul Ramachandran, Ashwin Balakrishna, Suraj Nair, Peter Kundzicz, Zefeng Li, Jennifer Andrews, Egill Hauksson, Yisong Yue
Journal of Geophysical Research: Solid Earth, 2019
Efficient real-time discrimination of local earthquake signals from impulsive noise signals for earthquake early warning (EEW) alerts.
Annotated Reconstruction of 3D Spaces Using Drones
Suraj Nair, Anshul Ramachandran, Peter Kundzicz
MIT Undergraduate Research in Technology Conference (URTC), 2017 (Best Paper Presentation)
We reconstruct 3D voxel representations of a scene with object labels from RGB images captured by a drone, and use them for exploratory motion planning.
Teaching Assistant: Stanford CS 330 [2019, 2020], Deep Multi-Task and Meta Learning
Teaching Assistant: Caltech CS/EE 155 [2017], Machine Learning/Data Mining
Teaching Assistant: Caltech CS 121 [2016], Introduction to Relational Databases
Reviewer: NeurIPS, ICML, ICLR, CVPR, ICCV, CoRL, ICRA, IROS