Tianhe (Kevin) Yu
tianheyu at cs dot stanford dot edu

I am a PhD student in Computer Science at Stanford University, advised by Chelsea Finn. I am part of the Stanford Artificial Intelligence Laboratory (SAIL).

Previously, I graduated from UC Berkeley with highest honors in Computer Science, Applied Mathematics and Statistics. As an undergraduate, I worked with Pieter Abbeel, Sergey Levine, and Alexei Efros in the Berkeley Artificial Intelligence Research (BAIR) Lab.

Google Scholar  /  LinkedIn  /  GitHub

News
Research

My research interests lie at the intersection of machine learning, perception, and control for robotics, specifically deep reinforcement learning, imitation learning and meta-learning.


COMBO: Conservative Offline Model-Based Policy Optimization
Tianhe Yu*, Aviral Kumar*, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn
arXiv preprint
arXiv

Model-based offline RL methods rely on explicit uncertainty quantification for incorporating pessimism, which can be difficult and unreliable with complex models. We overcome this limitation by developing a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-action tuples generated via rollouts under the learned model. We provide theoretical guarantees for COMBO and find that it consistently performs as well as or better than prior offline model-free and model-based methods on widely studied offline RL benchmarks, including image-based tasks.
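
The flavor of the critic update can be sketched as follows. This is a loose, simplified sketch rather than the paper's exact objective; the batch containers real and rollout, the Q-network interface, and the coefficient beta are illustrative assumptions.

import torch

def combo_critic_loss(q_net, real, rollout, beta=1.0):
    # `real` holds offline-dataset transitions; `rollout` holds transitions
    # generated by short rollouts under the learned dynamics model. Both are
    # assumed to carry tensors .s, .a and precomputed Bellman targets .target.
    q_real = q_net(real.s, real.a)
    q_roll = q_net(rollout.s, rollout.a)
    bellman = ((q_real - real.target) ** 2).mean() + ((q_roll - rollout.target) ** 2).mean()
    # Conservative regularizer: push Q down on (possibly out-of-support)
    # model-generated tuples and up on dataset tuples. No explicit
    # uncertainty estimate is required.
    conservative = q_roll.mean() - q_real.mean()
    return bellman + beta * conservative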


Offline Reinforcement Learning from Images with Latent Space Models
Rafael Rafailov*, Tianhe Yu*, Aravind Rajeswaran, Chelsea Finn
arXiv preprint
arXiv / website

Model-based offline RL algorithms have achieved state-of-the-art results in state-based tasks and have strong theoretical guarantees. However, they rely crucially on the ability to quantify uncertainty in the model predictions, which is particularly challenging with image observations. To overcome this challenge, we propose to learn a latent-state dynamics model and represent the uncertainty in the latent space. We find that our algorithm significantly outperforms previous offline model-free RL methods as well as state-of-the-art online visual model-based RL methods in both simulated and real-world robotic control tasks.


Variable-Shot Adaptation for Online Meta-Learning
Tianhe Yu*, Xinyang Geng*, Chelsea Finn, Sergey Levine
arXiv preprint
arXiv

We extend previous meta-learning algorithms to handle the variable-shot settings that naturally arise in sequential learning: from many-shot learning at the start, to zero-shot learning towards the end. On sequential learning problems, we find that meta-learning solves the full task set with fewer overall labels and achieves greater cumulative performance, compared to standard supervised methods. These results suggest that meta-learning is an important ingredient for building learning systems that continuously learn and improve over a sequence of problems.


MOPO: Model-based Offline Policy Optimization
Tianhe Yu*, Garrett Thomas*, Lantao Yu, Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn†, Tengyu Ma†
Neural Information Processing Systems (NeurIPS), 2020
arXiv / code

We propose a new model-based offline RL algorithm that penalizes the reward with an estimate of the learned dynamics model's uncertainty. We find that this algorithm outperforms both standard model-based RL methods and existing state-of-the-art model-free offline RL approaches on existing offline RL benchmarks, as well as on challenging continuous control tasks that require generalizing from data collected for a different task.
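
As a minimal sketch of the reward penalty (the function and tensor names below are illustrative, and the ensemble-based uncertainty heuristic is one practical choice among several):

import torch

def mopo_reward(reward_pred, next_state_std, lam=1.0):
    # reward_pred:    model-predicted rewards, shape (batch,)
    # next_state_std: predictive std of the next state from each member of a
    #                 dynamics ensemble, shape (ensemble, batch, state_dim)
    # Heuristic uncertainty u(s, a): the largest norm of the predicted std
    # across ensemble members (an assumption; other estimators also work).
    u = next_state_std.norm(dim=-1).max(dim=0).values
    return reward_pred - lam * u  # optimize the policy against this penalized reward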


Gradient Surgery for Multi-Task Learning
Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Sergey Levine, Karol Hausman, Chelsea Finn
Neural Information Processing Systems (NeurIPS), 2020
arXiv / code

We identify three conditions of the multi-task optimization landscape that cause detrimental gradient interference, and develop a simple yet general approach, projecting conflicting gradients (PCGrad), to avoid such interference between task gradients.
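
The core operation is simple to state: whenever two task gradients conflict (negative inner product), project one onto the normal plane of the other before combining them. A minimal PyTorch sketch, assuming per-task gradients have already been flattened into vectors:

import torch

def pcgrad(task_grads):
    # task_grads: list of flattened per-task gradient vectors (1-D tensors).
    adjusted = [g.clone() for g in task_grads]
    for i, g_i in enumerate(adjusted):
        # Visit the other tasks in random order.
        for j in torch.randperm(len(task_grads)).tolist():
            if j == i:
                continue
            g_j = task_grads[j]
            dot = torch.dot(g_i, g_j)
            if dot < 0:
                # Conflicting gradients: remove the component of g_i along g_j,
                # i.e. project g_i onto the normal plane of g_j.
                g_i -= (dot / g_j.dot(g_j)) * g_j
    return adjusted  # sum or average these to form the shared update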


Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning
Tianhe Yu*, Deirdre Quillen*, Zhanpeng He*, Ryan Julian, Karol Hausman, Chelsea Finn, Sergey Levine
Conference on Robot Learning (CoRL), 2019
arXiv / website / code

We propose an open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic manipulation tasks, with the aim of making it possible to develop algorithms that generalize to accelerate the acquisition of entirely new, held-out tasks.


Meta-Inverse Reinforcement Learning with Probabilistic Context Variables
Lantao Yu*, Tianhe Yu*, Chelsea Finn, Stefano Ermon
Neural Information Processing Systems (NeurIPS), 2019
arXiv / website

We propose a deep latent variable model that is capable of learning rewards from unstructured, multi-task demonstration data and, critically, of using this experience to infer robust rewards for new, structurally similar tasks from a single demonstration.


One-Shot Hierarchical Imitation Learning of Compound Visuomotor Tasks
Tianhe Yu, Pieter Abbeel, Sergey Levine, Chelsea Finn
International Conference on Intelligent Robots and Systems (IROS), 2019
arXiv / video

We aim to learn multi-stage vision-based tasks on a real robot from a single video of a human performing the task. We propose a method that learns both how to learn primitive behaviors from video demonstrations and how to dynamically compose these behaviors to perform multi-stage tasks by "watching" a human demonstrator.


Unsupervised Visuomotor Control through Distributional Planning Networks
Tianhe Yu, Gleb Shevchuk, Dorsa Sadigh, Chelsea Finn
Robotics: Science and Systems (RSS), 2019
arXiv / website / code

We propose an approach to learning an unsupervised embedding space under which the robot can measure progress towards a goal for itself. Our method enables learning effective and control-centric representations that lead to more autonomous reinforcement learning algorithms.


One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning
Tianhe Yu*, Chelsea Finn*, Annie Xie, Sudeep Dasari, Tianhao Zhang, Pieter Abbeel, Sergey Levine
Robotics: Science and Systems (RSS), 2018
arXiv / blog post / video / code

We present an approach for one-shot learning from a video of a human by using human and robot demonstration data from a variety of previous tasks to build up prior knowledge through meta-learning. Then, combining this prior knowledge and only a single video demonstration from a human, the robot can perform the task that the human demonstrated.


One-Shot Visual Imitation Learning via Meta-Learning
Chelsea Finn*, Tianhe Yu*, Tianhao Zhang, Pieter Abbeel, Sergey Levine
Conference on Robot Learning (CoRL), 2017 (Long Talk)
Oral presentation at the NIPS 2017 Deep Reinforcement Learning Symposium
arXiv / video / talk / code

We present a meta-imitation learning method that enables a robot to learn to acquire new skills from just a single visual demonstration. Our method requires data from significantly fewer prior tasks for effective learning of new skills, and can also learn from a raw video as the single demonstration, without access to trajectories of robot configurations such as joint angles.


Real-Time User-Guided Image Colorization with Learned Deep Priors
Richard Zhang*, Jun-Yan Zhu*, Phillip Isola, Xinyang Geng, Angela S. Lin, Tianhe Yu, Alexei A. Efros
ACM Transactions on Graphics (SIGGRAPH), 2017
arXiv / project website / video / slides / talk / code

We propose a deep learning approach for user-guided image colorization. Our system directly maps a grayscale image, along with sparse, local user "hints", to an output colorization using a deep convolutional neural network.


Generalizing Skills with Semi-Supervised Reinforcement Learning
Chelsea Finn, Tianhe Yu, Justin Fu, Pieter Abbeel, Sergey Levine
International Conference on Learning Representations (ICLR), 2017
arXiv / video / code

We formalize the problem of semi-supervised reinforcement learning (SSRL), where the reward signal in the real world is only available in a small set of environments, such as laboratories, and the robot needs to leverage experience in these instrumented environments to continue learning in places where the reward signal isn't available. We propose a simple algorithm for SSRL based on inverse reinforcement learning and show that it can improve performance in 'unlabeled' environments by using experience from both 'labeled' and 'unlabeled' environments.

Teaching

CS330: Deep Multi-Task and Meta Learning - Fall 2019
Teaching Assistant

