Tianhe (Kevin) Yu
tianheyu at google dot com, tianheyu at cs dot stanford dot edu

I am a Research Scientist at Google Brain. I received my Ph.D. in Computer Science from Stanford University advised by Chelsea Finn. I obtained my bachelor's degree from UC Berkeley with highest honors in Computer Science, Applied Mathematics and Statistics.

Google Scholar  /  LinkedIn  /  Twitter  /  GitHub

Research

My research interests lie at the intersection of machine learning, perception, and control, specifically offline reinforcement learning (i.e., learning from a static dataset), multi-task learning, and meta-learning. Recently, I have been exploring how to leverage foundation models for decision-making problems.


Scaling Robot Learning with Semantically Imagined Experience
Tianhe Yu, Ted Xiao, Austin Stone, Jonathan Tompson, Anthony Brohan, Su Wang, Jaspiar Singh, Clayton Tan, Dee M, Jodilyn Peralta, Brian Ichter, Karol Hausman, Fei Xia
arXiv preprint
arXiv / website / video

We propose to scale up real-world robot learning without the burden of real-world data collection. We use state-of-the-art text-to-image diffusion models to perform aggressive data augmentation on top of our existing robotic manipulation datasets, inpainting unseen manipulation objects, backgrounds, and distractors with text guidance. Through extensive real-world experiments, we show that manipulation policies trained on data augmented this way can solve completely unseen tasks with new objects and behave more robustly in the presence of novel distractors.
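
As a rough illustration of the pipeline (not the actual implementation), the sketch below augments existing robot episodes by repainting image regions with a text-guided inpainting model; `inpaint` and `region_proposer` are hypothetical stand-ins for the diffusion model and the region-selection step, and the episode layout is assumed.

    def augment_dataset(episodes, region_proposer, inpaint, prompts):
        """Create semantically augmented copies of existing robot episodes."""
        augmented = []
        for episode in episodes:
            for prompt in prompts:                         # e.g. "a wooden table", "a metal sink"
                new_frames = []
                for frame in episode['frames']:
                    mask = region_proposer(frame, prompt)  # region to repaint: object, background, or distractor
                    new_frames.append(inpaint(frame, mask, prompt))
                # Actions and language instructions are reused; only the pixels change.
                augmented.append({**episode, 'frames': new_frames})
        return augmented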


PaLM-E: An Embodied Multimodal Language Model
Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, Pete Florence
arXiv preprint
arXiv / website / demo

We propose embodied language models to directly incorporate real-world continuous sensor modalities into language models and thereby establish the link between words and percepts. Our evaluations show that PaLM-E, a single large embodied multimodal model, can address a variety of embodied reasoning tasks, from a variety of observation modalities, on multiple embodiments, and further, exhibits positive transfer: the model benefits from diverse joint training across internet-scale language, vision, and visual-language domains.


RT-1: Robotics Transformer for Real-World Control at Scale
Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Tomas Jackson, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, Kuang-Huei Lee, Sergey Levine, Yao Lu, Utsav Malla, Deeksha Manjunath, Igor Mordatch, Ofir Nachum, Carolina Parada, Jodilyn Peralta, Emily Perez, Karl Pertsch, Jornell Quiambao, Kanishka Rao, Michael Ryoo, Grecia Salazar, Pannag Sanketi, Kevin Sayed, Jaspiar Singh, Sumedh Sontakke, Austin Stone, Clayton Tan, Huong Tran, Vincent Vanhoucke, Steve Vega, Quan Vuong, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Tianhe Yu, Brianna Zitkovich (alphabetical ordering)
arXiv preprint
arXiv / website / video / blogpost / code

We present a model class, dubbed Robotics Transformer, that exhibits promising scalable properties as a pre-trained model. We verify our conclusions in a comprehensive study of different model classes and their ability to generalize as a function of data size, model size, and data diversity, based on a large-scale data collection effort on real robots performing real-world tasks.


How to Leverage Unlabeled Data in Offline Reinforcement Learning
Tianhe Yu*, Aviral Kumar*, Yevgen Chebotar, Karol Hausman, Chelsea Finn, Sergey Levine
International Conference on Machine Learning (ICML), 2022
arXiv

Offline RL requires reward annotations for every transition, which may be costly, while collecting diverse unlabeled data is often comparatively inexpensive. How can we best leverage such unlabeled data in offline RL? We find that, perhaps surprisingly, a much simpler method that applies zero rewards to unlabeled data leads to effective data sharing, without learning any reward model at all. We provide extensive theoretical and empirical analysis illustrating how this approach trades off reward bias, sample complexity, and distributional shift, often leading to good results.
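
A minimal sketch of the zero-reward data-sharing idea, assuming each dataset is a dict of aligned numpy arrays (the layout is an assumption for illustration, not the paper's code):

    import numpy as np

    def merge_with_zero_rewards(labeled, unlabeled):
        """Relabel unlabeled transitions with reward 0 and pool them with labeled data."""
        pooled = {}
        for key in ('obs', 'actions', 'next_obs'):
            pooled[key] = np.concatenate([labeled[key], unlabeled[key]], axis=0)
        zero_rewards = np.zeros(len(unlabeled['obs']))    # the "use zero rewards" step
        pooled['rewards'] = np.concatenate([labeled['rewards'], zero_rewards], axis=0)
        return pooled

The pooled dataset can then be fed to any standard offline RL algorithm without ever training a reward model.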


Conservative Data Sharing for Multi-Task Offline Reinforcement Learning
Tianhe Yu*, Aviral Kumar*, Yevgen Chebotar, Karol Hausman, Sergey Levine, Chelsea Finn
Neural Information Processing Systems (NeurIPS), 2021
arXiv

We argue that a natural use case of offline RL is in settings where we can pool large amounts of data collected in various scenarios for solving different tasks. However, naively sharing data across all tasks in multi-task offline RL performs surprisingly poorly in practice due to exacerbated distributional shift. To this end, we develop a simple technique for data sharing in multi-task offline RL that routes data to a task only when doing so is expected to improve performance over using the task-specific data alone.
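
The sketch below illustrates one way to implement such a routing rule: a transition from another task is shared into the current task only if its conservative value estimate exceeds a percentile threshold computed on the task's own data. The conservative Q-function interface and the percentile cutoff are assumptions for illustration, not the paper's exact criterion.

    import numpy as np

    def route_shared_data(task_data, other_data, conservative_q, percentile=90):
        """Return the subset of `other_data` worth sharing into the current task."""
        own_values = conservative_q(task_data['obs'], task_data['actions'])
        threshold = np.percentile(own_values, percentile)

        shared_values = conservative_q(other_data['obs'], other_data['actions'])
        keep = shared_values >= threshold             # only share data expected to help
        return {k: v[keep] for k, v in other_data.items()}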


COMBO: Conservative Offline Model-Based Policy Optimization
Tianhe Yu*, Aviral Kumar*, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn
Neural Information Processing Systems (NeurIPS), 2021
arXiv

Model-based offline RL methods rely on explicit uncertainty quantification to incorporate pessimism, which can be difficult and unreliable with complex models. We overcome this limitation by developing a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-action tuples generated via rollouts under the learned model. We establish theoretical guarantees for COMBO and find that it consistently performs as well as or better than prior offline model-free and model-based methods on widely studied offline RL benchmarks, including image-based tasks.
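
As an illustration of the value regularizer (a sketch under assumed interfaces, not COMBO's full objective), the critic loss below pushes Q-values down on state-action tuples drawn from model rollouts and up on tuples from the offline dataset, in addition to a standard Bellman term:

    import torch

    def conservative_critic_loss(q_net, real_batch, model_batch, bellman_error, beta=1.0):
        # Q-values on tuples generated by rolling out the learned model
        # (potentially out of the support of the offline data).
        q_model = q_net(model_batch['obs'], model_batch['actions'])
        # Q-values on tuples that actually appear in the offline dataset.
        q_real = q_net(real_batch['obs'], real_batch['actions'])

        # Push down on model-generated tuples, push up on real data.
        conservative_term = q_model.mean() - q_real.mean()
        return bellman_error + beta * conservative_term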


Visual Adversarial Imitation Learning using Variational Models
Rafael Rafailov, Tianhe Yu, Aravind Rajeswaran, Chelsea Finn
Neural Information Processing Systems (NeurIPS), 2021
arXiv

We develop a variational model-based adversarial imitation learning (V-MAIL) algorithm. The model-based approach provides a strong signal for representation learning, improves sample efficiency, and stabilizes adversarial training.


Efficiently Identifying Task Groupings for Multi-Task Learning
Christopher Fifty, Ehsan Amid, Zhe Zhao, Tianhe Yu, Rohan Anil, Chelsea Finn
Neural Information Processing Systems (NeurIPS), 2021 (Spotlight)
arXiv / code

We find that in multi-task learning, naively training all tasks together in one model often degrades performance, while exhaustively searching through combinations of task groupings can be prohibitively expensive. In this paper, we suggest an approach for selecting which tasks should train together in multi-task learning models. Our method determines task groupings in a single training run by co-training all tasks together and quantifying the effect that one task's gradient update would have on another task's loss.
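
A toy sketch of that measurement (an inter-task affinity): take a lookahead gradient step on task i's loss and see how task j's loss changes. The function signatures below are assumptions for illustration.

    import torch

    def inter_task_affinity(shared_params, loss_i, loss_j, lr=1e-3):
        """Relative drop in task j's loss after a lookahead step on task i's loss.

        `shared_params` is a tensor with requires_grad=True; `loss_i` and `loss_j`
        map the shared parameters to scalar task losses.
        """
        loss_j_before = loss_j(shared_params)

        grad_i = torch.autograd.grad(loss_i(shared_params), shared_params)[0]
        lookahead = shared_params - lr * grad_i        # one-step lookahead update

        loss_j_after = loss_j(lookahead)
        return 1.0 - loss_j_after / loss_j_before      # > 0 means task i helps task j

Tasks with high mutual affinity can then be grouped into the same network.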


Offline Reinforcement Learning from Images with Latent Space Models
Rafael Rafailov*, Tianhe Yu*, Aravind Rajeswaran, Chelsea Finn
Learning for Decision Making and Control (L4DC), 2021 (Oral presentation)
arXiv / website

Model-based offline RL algorithms have achieved state-of-the-art results on state-based tasks and have strong theoretical guarantees. However, they rely crucially on the ability to quantify uncertainty in the model predictions, which is particularly challenging with image observations. To overcome this challenge, we propose to learn a latent-state dynamics model and represent the uncertainty in the latent space. We find that our algorithm significantly outperforms previous offline model-free RL methods as well as state-of-the-art online visual model-based RL methods on both simulated and real-world robotic control tasks.


Variable-Shot Adaptation for Online Meta-Learning
Tianhe Yu*, Xinyang Geng*, Chelsea Finn, Sergey Levine
preprint
arXiv

We extend previous meta-learning algorithms to handle the variable-shot settings that naturally arise in sequential learning: from many-shot learning at the start, to zero-shot learning towards the end. On sequential learning problems, we find that meta-learning solves the full task set with fewer overall labels and achieves greater cumulative performance, compared to standard supervised methods. These results suggest that meta-learning is an important ingredient for building learning systems that continuously learn and improve over a sequence of problems.


MOPO: Model-based Offline Policy Optimization
Tianhe Yu*, Garrett Thomas*, Lantao Yu, Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn†, Tengyu Ma†
Neural Information Processing Systems (NeurIPS), 2020
arXiv / code

We propose a new model-based offline RL algorithm that uses the uncertainty of the learned dynamics as a penalty on the reward function. We find that this algorithm outperforms both standard model-based RL methods and existing state-of-the-art model-free offline RL approaches on existing offline RL benchmarks, as well as challenging continuous control tasks that require generalizing from data collected for a different task.
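
A minimal sketch of the reward penalty, assuming an ensemble of learned dynamics models whose disagreement serves as the uncertainty estimate (the paper derives its penalty differently; the interface below is illustrative):

    import numpy as np

    def penalized_reward(reward, ensemble_next_state_preds, lam=1.0):
        """Subtract a dynamics-uncertainty penalty from the predicted reward.

        `ensemble_next_state_preds` has shape (num_models, state_dim): each
        ensemble member's next-state prediction for a single (s, a) pair.
        """
        # Disagreement across the ensemble as a proxy for model uncertainty.
        uncertainty = np.linalg.norm(ensemble_next_state_preds.std(axis=0))
        return reward - lam * uncertainty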


Gradient Surgery for Multi-Task Learning
Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Sergey Levine, Karol Hausman, Chelsea Finn
Neural Information Processing Systems (NeurIPS), 2020
arXiv / code

We identify a set of three conditions of the multi-task optimization landscape that cause detrimental gradient interference, and develop a simple yet general approach, projecting conflicting gradients (PCGrad), for avoiding such interference between task gradients.
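
A small sketch of the projection step (omitting the random task ordering used in the paper): whenever two task gradients conflict, i.e. have a negative inner product, one is projected onto the normal plane of the other before the gradients are summed.

    import numpy as np

    def pcgrad(task_grads):
        """Combine per-task gradients (flat numpy arrays) with conflict projection."""
        projected = [g.copy() for g in task_grads]
        for i, g_i in enumerate(projected):
            for j, g_j in enumerate(task_grads):
                if i == j:
                    continue
                dot = g_i @ g_j
                if dot < 0.0:                          # the two gradients conflict
                    g_i -= (dot / (g_j @ g_j)) * g_j   # remove the conflicting component
        return np.sum(projected, axis=0)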


Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning
Tianhe Yu*, Deirdre Quillen*, Zhanpeng He*, Ryan Julian, Karol Hausman, Chelsea Finn, Sergey Levine
Conference on Robot Learning (CoRL), 2019
arXiv / website / code

We propose an open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic manipulation tasks, with the aim of enabling the development of algorithms that generalize in order to accelerate the acquisition of entirely new, held-out tasks.


Meta-Inverse Reinforcement Learning with Probabilistic Context Variables
Lantao Yu*, Tianhe Yu*, Chelsea Finn, Stefano Ermon
Neural Information Processing Systems (NeurIPS), 2019
arXiv / website

We propose a deep latent variable model that is capable of learning rewards from unstructured, multi-task demonstration data and, critically, of using this experience to infer robust rewards for new, structurally similar tasks from a single demonstration.


One-Shot Hierarchical Imitation Learning of Compound Visuomotor Tasks
Tianhe Yu, Pieter Abbeel, Sergey Levine, Chelsea Finn
International Conference on Intelligent Robots and Systems (IROS), 2019
arXiv / video

We aim to learn multi-stage vision-based tasks on a real robot from a single video of a human performing the task. We propose a method that learns both how to learn primitive behaviors from video demonstrations and how to dynamically compose these behaviors to perform multi-stage tasks by "watching" a human demonstrator.


Unsupervised Visuomotor Control through Distributional Planning Networks
Tianhe Yu, Gleb Shevchuk, Dorsa Sadigh, Chelsea Finn
Robotics: Science and Systems (RSS), 2019
arXiv / website / code

We propose an approach to learning an unsupervised embedding space under which the robot can measure progress towards a goal for itself. Our method enables learning effective and control-centric representations that lead to more autonomous reinforcement learning algorithms.


One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning
Tianhe Yu*, Chelsea Finn*, Annie Xie, Sudeep Dasari, Tianhao Zhang, Pieter Abbeel, Sergey Levine
Robotics: Science and Systems (RSS), 2018
arXiv / blog post / video / code

We present an approach for one-shot learning from a video of a human by using human and robot demonstration data from a variety of previous tasks to build up prior knowledge through meta-learning. Then, combining this prior knowledge and only a single video demonstration from a human, the robot can perform the task that the human demonstrated.


One-Shot Visual Imitation Learning via Meta-Learning
Chelsea Finn*, Tianhe Yu*, Tianhao Zhang, Pieter Abbeel, Sergey Levine
Conference on Robot Learning (CoRL), 2017 (Long Talk)
Oral presentation at the NIPS 2017 Deep Reinforcement Learning Symposium
arXiv / video / talk / code

We present a meta-imitation learning method that enables a robot to learn to acquire new skills from just a single visual demonstration. Our method requires data from significantly fewer prior tasks for effective learning of new skills and can also learn from a raw video as the single demonstration, without access to trajectories of robot configurations such as joint angles.


Real-Time User-Guided Image Colorization with Learned Deep Priors
Richard Zhang*, Jun-Yan Zhu*, Phillip Isola, Xinyang Geng, Angela S. Lin, Tianhe Yu, Alexei A. Efros
ACM Transactions on Graphics (SIGGRAPH), 2017
arXiv / project website / video / slides / talk / code

We propose a deep learning approach for user-guided image colorization. Our system directly maps a grayscale image, along with sparse, local user "hints," to an output colorization with a deep convolutional neural network.
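
A toy sketch of the input interface (the tiny network is a placeholder, not the paper's architecture): the grayscale channel is stacked with sparse ab-color hint channels and a binary hint mask, and a CNN predicts the ab channels of the output.

    import torch
    import torch.nn as nn

    class UserGuidedColorizer(nn.Module):
        def __init__(self):
            super().__init__()
            # 1 grayscale channel + 2 hint (ab) channels + 1 hint mask = 4 input channels.
            self.net = nn.Sequential(
                nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 2, 3, padding=1),        # predicted ab channels
            )

        def forward(self, gray, hints, mask):
            x = torch.cat([gray, hints, mask], dim=1)  # (B, 4, H, W)
            return self.net(x)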


Generalizing Skills with Semi-Supervised Reinforcement Learning
Chelsea Finn, Tianhe Yu, Justin Fu, Pieter Abbeel, Sergey Levine
International Conference on Learning Representations (ICLR), 2017
arXiv / video / code

We formalize the problem of semi-supervised reinforcement learning (SSRL), where the reward signal in the real world is only available in a small set of environments, such as laboratories, and the robot needs to leverage experience from these instrumented environments to continue learning in places where a reward signal isn't available. We propose a simple algorithm for SSRL based on inverse reinforcement learning and show that it can improve performance in 'unlabeled' environments by using experience from both 'labeled' and 'unlabeled' environments.

Teaching

CS330: Deep Multi-Task and Meta Learning - Fall 2019
Teaching Assistant

