Suraj Nair

I work at Physical Intelligence building brains for robots.

I completed my Ph.D. in Computer Science at the Stanford AI Lab, where I was co-advised by Professors Chelsea Finn and Silvio Savarese. My Ph.D. thesis was on Scaling Deep Robotic Learning to Broad Real-World Data. Before that, I completed my Bachelor's in Computer Science at the California Institute of Technology (Caltech), where I worked with Yisong Yue on multi-agent reinforcement learning. I have also spent time at the Toyota Research Institute, Facebook AI Research, Google Brain, and GE.

Email  /  CV  /  Research Statement (May 2022)  /  Google Scholar  /  Twitter  /  LinkedIn

Research

The goal of my research is to enable robots that can operate in unstructured, real-world environments. Toward this goal, I study how robots can generalize effectively across tasks, objects, and environments by learning from large datasets. Specifically, my research focuses on methods for:

  • Scalably [1, 2] and safely [3] collecting real-world robot datasets.
  • Self-supervised learning of visual models from offline data [1,2,3] and using them for robotic manipulation [4,5,6,7].
  • Leveraging human video datasets [1,3] and natural language annotations [2,3] to enable better robot learning.

Recent Talks (April 2022 @ Nuro)
Publications & Preprints (Highlighted Papers)
Language-Driven Representation Learning for Robotics
Siddharth Karamcheti, Suraj Nair, Annie S. Chen, Thomas Kollar, Chelsea Finn, Dorsa Sadigh, Percy Liang
Robotics: Science and Systems (RSS), 2023
Best Paper Award Finalist
project page / code

We present Voltron, a multi-modal foundation model for robotics trained on human videos and language to produce reusable representations and rewards. We train a single model with many downstream capabilities, from features for control to expression grounding and reward/intent inference.

Behavior Retrieval: Few-Shot Imitation Learning by Querying Unlabeled Datasets
Maximilian Du, Suraj Nair, Dorsa Sadigh, Chelsea Finn
Robotics: Science and Systems (RSS), 2023
project page / code

Often, robot data isn't shared across projects. We present a new way to use data from past projects to improve downstream learning. We use a learned model to select relevant data from a large dataset of robot interactions; the selected data augments a small set of task demonstrations used by a behavior cloning algorithm, making learning more efficient.

R3M: A Universal Visual Representation for Robot Manipulation
Suraj Nair, Aravind Rajeswaran, Vikash Kumar, Chelsea Finn, Abhinav Gupta
Conference on Robot Learning (CoRL), 2022
ICRA Scaling Robot Learning Workshop, 2022 (Best Paper Award)
project page / code

We pre-train a generalizable visual representation on diverse human videos and language, and show it enables far more efficient learning across a wide range of robotic manipulation tasks.

Play it by Ear: Learning Skills amidst Occlusion through Audio-Visual Imitation Learning
Maximilian Du*, Olivia Y. Lee*, Suraj Nair, and Chelsea Finn
Robotics: Science and Systems (RSS), 2022
project page / code / press

We propose a method that enables robots to tackle challenging, visually occluded manipulation tasks (like extracting keys from a bag) via end-to-end interactive imitation learning from vision and sound.

Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation
Suraj Nair, Eric Mitchell, Kevin Chen, Brian Ichter, Silvio Savarese, Chelsea Finn
Conference on Robot Learning (CoRL), 2021
project page / code

We learn language-conditioned visuomotor skills on real robots from entirely offline, pre-collected datasets and crowdsourced language annotation.

Example-Driven Model-Based Reinforcement Learning for Solving Long-Horizon Visuomotor Tasks
Bohan Wu, Suraj Nair, Li Fei-Fei*, Chelsea Finn*
Conference on Robot Learning (CoRL), 2021
project page

EMBR is a model-based RL algorithm that learns visuomotor skills and their groundings, which can then be sequenced with symbolic planners to complete long-horizon, multi-stage manipulation tasks on real robots.

FitVid: Overfitting in Pixel-Level Video Prediction
Mohammad Babaeizadeh, Mohammad Taghi Saffar, Suraj Nair, Sergey Levine, Chelsea Finn, Dumitru Erhan
arXiv Preprint, 2021
project page / code

We propose a variational video prediction model that is capable of severe overfitting on common video prediction benchmarks while having a parameter count similar to that of current SOTA models.

Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human Videos
Annie S. Chen, Suraj Nair, Chelsea Finn
Robotics: Science and Systems (RSS), 2021
ICLR Workshop on Self-Supervised Reinforcement Learning, 2021 (Oral)
project page

We propose a technique for learning multi-task reward functions from a small amount of robot data and large amounts of in-the-wild human videos. By leveraging diverse human data, the learned reward function is able to generalize to new environments and tasks.

Greedy Hierarchical Variational Autoencoders for Large-Scale Video Prediction
Bohan Wu, Suraj Nair, Roberto Martin-Martin, Li Fei-Fei*, Chelsea Finn*
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021
project page

We propose a technique for video prediction which trains a hierarchy of action-conditioned VAEs in a greedy fashion, enabling efficient training of large video prediction models.

Model-Based Visual Planning with Self-Supervised Functional Distances
Stephen Tian, Suraj Nair, Frederik Ebert, Sudeep Dasari, Benjamin Eysenbach, Chelsea Finn, Sergey Levine
International Conference on Learning Representations (ICLR), 2021 (Spotlight)
project page

We propose a method for offline model-based RL that learns a video prediction model and a Q-function-based distance metric, and uses them to accomplish visually specified goals.

Batch Exploration with Examples for Scalable Robotic Reinforcement Learning
Annie S. Chen*, HyunJi Nam*, Suraj Nair*, Chelsea Finn
Robotics and Automation Letters (RA-L) and International Conference on Robotics and Automation (ICRA), 2021
project page / code

We propose a framework for leveraging weak human supervision to enable better robotic exploration. Using just a few minutes of human supervision, the robot collects high-quality data while unsupervised, providing better data for downstream offline RL.

Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones
Brijen Thananjeyan*, Ashwin Balakrishna*, Suraj Nair, Michael Luo, Krishnan Srinivasan, Minho Hwang, Joseph E. Gonzalez, Julian Ibarz, Chelsea Finn, Ken Goldberg
Robotics and Automation Letters (RA-L) and International Conference on Robotics and Automation (ICRA), 2021
project page

An algorithm for safe reinforcement learning that uses a set of offline data to learn about constraints before policy learning, together with a pair of policies that separate the often-conflicting objectives of task-directed exploration and constraint satisfaction, to learn contact-rich and visuomotor control tasks.

Goal-Aware Prediction: Learning to Model what Matters
Suraj Nair, Silvio Savarese, Chelsea Finn
International Conference on Machine Learning (ICML), 2020
project page / code

We explore learning visual dynamics models that are conditioned on goals and learn to model only goal-relevant quantities.

Hierarchical Foresight: Self-Supervised Learning of Long-Horizon Tasks via Visual Subgoal Generation
Suraj Nair, Chelsea Finn
International Conference on Learning Representations (ICLR), 2020
project page / code

We study how to learn long-horizon, vision-based tasks in self-supervised settings. Our approach, hierarchical visual foresight, can optimize for a sequence of subgoals that break the task down into easy-to-complete subsegments.

Time Reversal as Self-Supervision
Suraj Nair, Mohammad Babaeizadeh, Chelsea Finn, Sergey Levine, Vikash Kumar
International Conference on Robotics and Automation (ICRA), 2020
project page / press

We propose a technique that uses time reversal to learn goals and provide a high-level plan to reach them. In particular, our approach explores outward from a set of goal states, "unsolving" a task, which then enables solving the task from new initializations at test time.

Causal Induction from Visual Observations for Goal-Directed Tasks
Suraj Nair, Yuke Zhu, Silvio Savarese, Li Fei-Fei
Workshop on Causal Machine Learning NeurIPS, 2019
project page / code

We explore how to effectively predict causal graphs from a small set of visual observations, and how to incorporate the learned graphs into downstream goal-conditioned policy learning.

RoboNet: Large-Scale Multi-Robot Learning
Sudeep Dasari, Frederik Ebert, Stephen Tian, Suraj Nair, Bernadette Bucher, Karl Schmeckpeper, Siddharth Singh, Sergey Levine, Chelsea Finn
Conference on Robot Learning (CoRL), 2019
project page / code / press

We collect a dataset of robotic experience across 4 institutions and 7 robots, and demonstrate that robot learning algorithms leveraging this data can adapt to new environments faster than training from scratch.

Neural Task Graphs: Generalizing to Unseen Tasks from a Single Video Demonstration
De-An Huang*, Suraj Nair*, Danfei Xu*, Yuke Zhu, Animesh Garg, Li Fei-Fei, Silvio Savarese, Juan Carlos Niebles
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019 (Oral)

NTG learns to produce a task graph from a single video demonstration of an unseen task, and leverages it for one-shot imitation learning.

Neural Task Programming: Learning to Generalize Across Hierarchical Tasks
Danfei Xu*, Suraj Nair*, Yuke Zhu, Julian Gao, Animesh Garg, Li Fei-Fei, Silvio Savarese
International Conference on Robotics and Automation (ICRA), 2018
project page / code / Two Minute Papers

Neural Task Programming (NTP) is a meta-learning framework that learns to generate robot-executable neural programs from a task demonstration video.

Reliable Real-Time Seismic Signal/Noise Discrimination With Machine Learning
Men-Andrin Meier, Zachary E Ross, Anshul Ramachandran, Ashwin Balakrishna, Suraj Nair, Peter Kundzicz, Zefeng Li, Jennifer Andrews, Egill Hauksson, Yisong Yue
Journal of Geophysical Research: Solid Earth, 2019

Efficient prediction of real local earthquake signals from impulsive signals for earthquake early warning (EEW) alerts.

Annotated Reconstruction of 3D Spaces Using Drones
Suraj Nair, Anshul Ramachandran, Peter Kundzicz
MIT Undergraduate Research in Technology Conference (URTC), 2017 (Best Paper Presentation)

We reconstruct 3D voxel representations of a scene, with object labels, from RGB images captured by a drone, and use them for exploratory motion planning.

Teaching
Teaching Assistant: Stanford CS 330 [2019, 2020], Deep Multi-Task and Meta Learning
Teaching Assistant: Caltech CS/EE 155 [2017], Machine Learning/Data Mining
Teaching Assistant: Caltech CS 121 [2016], Introduction to Relational Databases
Service
Reviewer: NeurIPS, ICML, ICLR, CVPR, ICCV, CoRL, ICRA, IROS