Research
The goal of my research is to enable robots that can operate in unstructured, real-world environments. Towards this goal, I study how robots can generalize effectively across tasks, objects, and environments by learning from large datasets. Specifically, my research focuses on methods for:
- Scalably [1, 2] and safely [3] collecting real-world robot datasets.
- Self-supervised learning of visual models from offline data [1,2,3] and using them for robotic manipulation [4,5,6,7].
- Leveraging human video datasets [1,3] and natural language annotations [2,3] to enable better robot learning.
Recent Talk (April 2022 @ Nuro)
Publications & Preprints
R3M: A Universal Visual Representation for Robot Manipulation
Suraj Nair, Aravind Rajeswaran, Vikash Kumar, Chelsea Finn, Abhinav Gupta
arXiv preprint, 2022
project page /
code
We pre-train a generalizable visual representation on diverse human videos and language, and show it enables far more efficient learning across a wide range of robotic manipulation tasks.
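As a rough illustration of the recipe (not the released R3M code: the encoder, feature dimension, and action head below are placeholder assumptions), a frozen pre-trained visual encoder can serve as the perception backbone for a small behavior-cloning policy:

```python
# Sketch only: a frozen, pre-trained visual encoder used as the perception
# backbone for a small behavior-cloning policy head.
import torch
import torch.nn as nn
from torchvision import models

# Stand-in for a pre-trained representation; R3M itself is trained on diverse
# human videos with time-contrastive and video-language alignment objectives.
encoder = models.resnet50(weights="IMAGENET1K_V1")
encoder.fc = nn.Identity()                  # expose 2048-d features
encoder.requires_grad_(False).eval()        # keep the representation frozen

policy_head = nn.Sequential(                # maps frozen features to actions
    nn.Linear(2048, 256), nn.ReLU(),
    nn.Linear(256, 7),                      # e.g. a 7-DoF arm action (assumed)
)

images = torch.randn(8, 3, 224, 224)        # a batch of camera observations
with torch.no_grad():
    features = encoder(images)              # (8, 2048) frozen features
actions = policy_head(features)             # (8, 7) predicted actions
```
Only the small policy head is trained on robot demonstrations; the representation stays fixed.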
Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation
Suraj Nair, Eric Mitchell, Kevin Chen, Brian Ichter, Silvio Savarese, Chelsea Finn
Conference on Robot Learning (CoRL), 2021
project page /
code
We learn language-conditioned visuomotor skills on real robots from entirely offline, pre-collected datasets and crowdsourced language annotation.
Example-Driven Model-Based Reinforcement Learning for Solving Long-Horizon Visuomotor Tasks
Bohan Wu, Suraj Nair, Li Fei-Fei*, Chelsea Finn*
Conference on Robot Learning (CoRL), 2021
project page
EMBR is a model-based RL algorithm that learns visuomotor skills and their groundings, which can then be sequenced with symbolic planners to complete long-horizon, multi-stage manipulation tasks on real robots.
FitVid: Overfitting in Pixel-Level Video Prediction
Mohammad Babaeizadeh, Mohammad Taghi Saffar, Suraj Nair, Sergey Levine, Chelsea Finn, Dumitru Erhan
arXiv preprint, 2021
project page /
code
We propose a variational video prediction model that is capable of severely overfitting to common video prediction benchmarks while having a parameter count similar to that of current state-of-the-art models.
Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human Videos
Annie S. Chen, Suraj Nair, Chelsea Finn
Robotics: Science and Systems (RSS), 2021
ICLR Workshop on Self-Supervised Reinforcement Learning, 2021 (Oral)
project page
We propose a technique for learning multi-task reward functions from a small amount of robot data and large amounts of in-the-wild human videos. By leveraging diverse human data, the learned reward function is able to generalize to new environments and tasks.
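One way such a reward could be instantiated is as a discriminator that scores whether a robot clip and a human video depict the same task; the sketch below is illustrative (architecture, clip shapes, and names are assumptions, not the paper's exact model):

```python
# Sketch: a same-task discriminator over (robot clip, human clip) pairs whose
# score can later be used as a reward for planning or RL.
import torch
import torch.nn as nn

class SameTaskDiscriminator(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        # Placeholder clip encoder; any video encoder could be substituted.
        self.video_encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.classifier = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, robot_clip, human_clip):
        z_robot = self.video_encoder(robot_clip)
        z_human = self.video_encoder(human_clip)
        return self.classifier(torch.cat([z_robot, z_human], dim=-1))  # same-task logit

disc = SameTaskDiscriminator()
robot_clip = torch.randn(4, 8, 3, 64, 64)     # (batch, frames, C, H, W), assumed sizes
human_clip = torch.randn(4, 8, 3, 64, 64)
reward = torch.sigmoid(disc(robot_clip, human_clip))  # higher = more likely same task
```
Training pairs matching clips against mismatched ones; at test time, the score of a candidate robot behavior against a human demonstration of the task acts as the reward.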
Greedy Hierarchical Variational Autoencoders for Large-Scale Video Prediction
Bohan Wu, Suraj Nair, Roberto Martin-Martin, Li Fei-Fei*, Chelsea Finn*
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021
project page
We propose a technique for video prediction which trains a hierarchy of action-conditioned VAEs in a greedy fashion, enabling efficient training of large video prediction models.
Model-Based Visual Planning with Self-Supervised Functional Distances
Stephen Tian, Suraj Nair, Frederik Ebert, Sudeep Dasari, Benjamin Eysenbach, Chelsea Finn, Sergey Levine
International Conference on Learning Representations (ICLR), 2021 (Spotlight)
project page
We propose a method for offline model-based RL that learns a video prediction model and a Q-function-based distance metric, and uses them to accomplish visually specified goals.
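At a glance, the planning loop looks roughly like the sketch below (the prediction model, distance function, and toy vector states are placeholders; the actual method plans over predicted video frames):

```python
# Sketch: rank sampled action sequences by a learned distance to the goal,
# evaluated on the model's predicted outcome of each sequence.
import numpy as np

def plan(obs, goal, predict, distance, horizon=5, n_samples=100, action_dim=4):
    """predict(obs, actions) -> predicted outcome (placeholder for a video model);
    distance(obs, goal) -> learned functional distance (placeholder)."""
    candidates = np.random.uniform(-1, 1, size=(n_samples, horizon, action_dim))
    costs = np.array([distance(predict(obs, a), goal) for a in candidates])
    return candidates[np.argmin(costs)]      # action sequence with the lowest cost

# Toy stand-ins so the sketch runs; both are learned from offline data in practice.
predict = lambda obs, actions: obs + actions.sum(axis=0)
distance = lambda o, g: float(np.linalg.norm(o - g))
best_actions = plan(np.zeros(4), np.ones(4), predict, distance)
```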
Batch Exploration with Examples for Scalable Robotic Reinforcement Learning
Annie S. Chen*, HyunJi Nam*, Suraj Nair*, Chelsea Finn
Robotics and Automation Letters (RA-L) and International Conference on Robotics and Automation (ICRA), 2021
project page /
code
We propose a framework for leveraging weak human supervision to enable better robotic exploration. Using just a few minutes of human supervision, the robot autonomously collects high-quality data, providing better data for downstream offline RL.
Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones
Brijen Thananjeyan*, Ashwin Balakrishna*, Suraj Nair, Michael Luo, Krishnan Srinivasan, Minho Hwang, Joseph E. Gonzalez, Julian Ibarz, Chelsea Finn, Ken Goldberg
Robotics and Automation Letters (RA-L) and International Conference on Robotics and Automation (ICRA), 2021
project page
An algorithm for safe reinforcement learning that uses a set of offline data to learn about constraints before policy learning, and a pair of policies that separate the often conflicting objectives of task-directed exploration and constraint satisfaction, enabling it to learn contact-rich and visuomotor control tasks.
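The core control rule can be sketched as follows (the function names, threshold, and toy policies are illustrative, not the authors' implementation):

```python
# Sketch of the Recovery RL action-selection rule: if the learned safety critic
# deems the task policy's action too risky, execute the recovery policy instead.
import numpy as np

def select_action(state, task_policy, recovery_policy, q_risk, eps_risk=0.3):
    """q_risk(state, action) estimates the chance of a future constraint
    violation; eps_risk is the risk threshold (names are assumptions)."""
    action = task_policy(state)
    if q_risk(state, action) > eps_risk:
        action = recovery_policy(state)      # fall back to constraint satisfaction
    return action

# Toy stand-ins so the rule can be executed end to end.
task_policy = lambda s: np.array([1.0, 0.0])
recovery_policy = lambda s: np.array([-1.0, 0.0])
q_risk = lambda s, a: 0.9 if a[0] > 0 else 0.1
print(select_action(np.zeros(2), task_policy, recovery_policy, q_risk))  # recovery action
```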
Goal-Aware Prediction: Learning to Model what Matters
Suraj Nair, Silvio Savarese, Chelsea Finn
International Conference on Machine Learning (ICML), 2020
project page /
code
We explore learning visual dynamics models that are conditioned on goals and learn to model only goal-relevant quantities.
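A minimal sketch of the idea, assuming a simple goal-conditioned latent dynamics model trained against a goal-relative target (this simplification is not the exact GAP architecture):

```python
# Sketch: encode observation and goal together, roll the latent forward with the
# action, and train the decoder to predict only the goal-relative error.
import torch
import torch.nn as nn

class GoalConditionedDynamics(nn.Module):
    def __init__(self, obs_dim=32, act_dim=4, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(2 * obs_dim, hidden), nn.ReLU())
        self.dynamics = nn.Sequential(nn.Linear(hidden + act_dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, obs_dim)

    def forward(self, obs, goal, action):
        z = self.encoder(torch.cat([obs, goal], dim=-1))
        z_next = self.dynamics(torch.cat([z, action], dim=-1))
        return self.decoder(z_next)          # prediction in goal-relative terms

model = GoalConditionedDynamics()
obs, goal, action = torch.randn(16, 32), torch.randn(16, 32), torch.randn(16, 4)
next_obs = torch.randn(16, 32)
# Training target: the residual between goal and next observation (an assumption
# made for this sketch), so the model only needs to capture goal-relevant change.
loss = nn.functional.mse_loss(model(obs, goal, action), goal - next_obs)
loss.backward()
```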
Hierarchical Foresight: Self-Supervised Learning of Long-Horizon Tasks via Visual Subgoal Generation
Suraj Nair, Chelsea Finn
International Conference on Learning Representations (ICLR), 2020
project page /
code
We study how robots can learn long-horizon, vision-based tasks in self-supervised settings. Our approach, hierarchical visual foresight, can optimize for a sequence of subgoals that break the task down into easy-to-complete segments.
Time Reversal as Self-Supervision
Suraj Nair, Mohammad Babaeizadeh, Chelsea Finn, Sergey Levine, Vikash Kumar
International Conference on Robotics and Automation (ICRA), 2020
project page /
press
We propose a technique that uses time-reversal to learn goals and provide a high level plan to reach them. In particular, our approach explores outward from a set of goal states, "unsolving" a task, which then enables solving the task from new initializations at test time.
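The self-supervision step amounts to reversing the explored trajectories, roughly as in this toy sketch (states here are just integers; in the paper they are images):

```python
# Sketch: trajectories collected by exploring outward from goal states are
# reversed, so each one reads as a path that arrives at the goal and can be
# used as supervision for a high-level plan back to it.
def reverse_trajectories(outward_trajectories):
    """outward_trajectories: observation sequences that start at a goal state
    and wander away from it ("unsolving" the task)."""
    return [list(reversed(traj)) for traj in outward_trajectories]

outward = [[0, 1, 2, 3], [0, 4, 5]]          # 0 marks the goal region (toy example)
plans = reverse_trajectories(outward)        # [[3, 2, 1, 0], [5, 4, 0]]
```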
Causal Induction from Visual Observations for Goal-Directed Tasks
Suraj Nair, Yuke Zhu, Silvio Savarese, Li Fei-Fei
NeurIPS Workshop on Causal Machine Learning, 2019
project page /
code
We explore how to effectively predict causal graphs from a small set of visual observations, and how to incorporate the learned graphs into downstream goal-conditioned policy learning.
RoboNet: Large-Scale Multi-Robot Learning
Sudeep Dasari, Frederik Ebert, Stephen Tian, Suraj Nair, Bernadette Bucher, Karl Schmeckpeper, Siddharth Singh, Sergey Levine, Chelsea Finn
Conference on Robot Learning (CoRL), 2019
project page /
code /
press
We collect a dataset of robotic experience across 4 institutions and 7 robots, and demonstrate that robot learning algorithms leveraging this data can adapt to new environments faster than training from scratch.
Neural Task Graphs: Generalizing to Unseen Tasks from a Single Video Demonstration
De-An Huang*, Suraj Nair*, Danfei Xu*, Yuke Zhu, Animesh Garg, Li Fei-Fei, Silvio Savarese, Juan Carlos Niebles
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019 (Oral)
NTG learns to produce a task graph from a single video demonstration of an unseen task, and leverages it for one-shot imitation learning.
Neural Task Programming: Learning to Generalize Across Hierarchical Tasks
Danfei Xu*, Suraj Nair*, Yuke Zhu, Julian Gao, Animesh Garg, Li Fei-Fei, Silvio Savarese
International Conference on Robotics and Automation (ICRA), 2018
project page /
code /
Two Minute Papers
Neural Task Programming (NTP) is a meta-learning framework that learns to generate robot-executable neural programs from a task demonstration video.
Reliable Real-Time Seismic Signal/Noise Discrimination With Machine Learning
Men-Andrin Meier, Zachary E Ross, Anshul Ramachandran, Ashwin Balakrishna, Suraj Nair, Peter Kundzicz, Zefeng Li, Jennifer Andrews, Egill Hauksson, Yisong Yue
Journal of Geophysical Research: Solid Earth, 2019
Efficient real-time discrimination of local earthquake signals from impulsive noise signals for earthquake early warning (EEW) alerts.
Annotated Reconstruction of 3D Spaces Using Drones
Suraj Nair, Anshul Ramachandran, Peter Kundzicz
MIT Undergraduate Research in Technology Conference (URTC), 2017 (Best Paper Presentation)
We reconstruct 3D voxel representations of a scene with object labels from RGB images captured by a drone, and use them for exploratory motion planning.
Teaching Assistant: Stanford CS 330 [2019, 2020], Deep Multi-Task and Meta Learning
Teaching Assistant: Caltech CS/EE 155 [2017], Machine Learning/Data Mining
Teaching Assistant: Caltech CS 121 [2016], Introduction to Relational Databases
Reviewer: NeurIPS, ICML, ICLR, CVPR, ICCV, CoRL, ICRA, IROS