Tianhe (Kevin) Yu
tianheyu at cs dot stanford dot edu

I am a PhD student in CS at Stanford University, advised by Chelsea Finn. I am part of the Stanford Artificial Intelligence Laboratory (SAIL).

Previously, I graduated from UC Berkeley with highest honors in Computer Science, Applied Mathematics and Statistics. As an undergraduate, I worked with Pieter Abbeel, Sergey Levine, and Alexei Efros as a researcher in the Berkeley Artificial Intelligence Research (BAIR) Lab.

Google Scholar  /  LinkedIn  /  GitHub


My research interests lie at the intersection of machine learning, perception, and control for robotics, specifically deep reinforcement learning, imitation learning and meta-learning.

MOPO: Model-based Offline Policy Optimization
Tianhe Yu*, Garrett Thomas*, Lantao Yu, Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn†, Tengyu Ma†
Neural Information Processing Systems (NeurIPS), 2020
arXiv / code

We propose a new model-based offline RL algorithm that applies the uncertainty of the dynamics as a penalty to the reward function. We find that this algorithm outperforms both standard model-based RL methods and existing state-of-the-art model-free offline RL approaches on existing offline RL benchmarks, as well as challenging continuous control tasks that require generalizing from data collected for a different task.
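As a rough illustration of the reward penalty described above, the sketch below subtracts a scaled uncertainty estimate from a predicted reward. It is not the released MOPO code: the function names, the `penalty_coef` parameter, and the choice of ensemble disagreement (largest distance of any ensemble member's next-state prediction from the ensemble mean) as the uncertainty estimate are all assumptions made for this example.

```python
import math


def ensemble_disagreement(next_state_preds):
    """Hypothetical uncertainty estimate: the largest Euclidean distance
    between any ensemble member's prediction and the ensemble mean."""
    n = len(next_state_preds)
    dim = len(next_state_preds[0])
    mean = [sum(p[d] for p in next_state_preds) / n for d in range(dim)]
    return max(
        math.sqrt(sum((p[d] - mean[d]) ** 2 for d in range(dim)))
        for p in next_state_preds
    )


def penalized_reward(reward_pred, next_state_preds, penalty_coef=1.0):
    """Predicted reward minus penalty_coef times the uncertainty estimate."""
    return reward_pred - penalty_coef * ensemble_disagreement(next_state_preds)


# Two ensemble members that disagree lower the reward used for policy training.
r = penalized_reward(5.0, [[0.0, 0.0], [2.0, 0.0]], penalty_coef=2.0)
```

A policy trained on rewards penalized this way is discouraged from visiting state-action pairs where the learned dynamics models disagree, which is the intuition behind penalizing by model uncertainty.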

Gradient Surgery for Multi-Task Learning
Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Sergey Levine, Karol Hausman, Chelsea Finn
Neural Information Processing Systems (NeurIPS), 2020
arXiv / code

We identify a set of three conditions of the multi-task optimization landscape that cause detrimental gradient interference, and develop a simple yet general approach, projecting conflicting gradients (PCGrad), for avoiding such interference between task gradients.
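The projection step can be sketched in a few lines of plain Python. This is a minimal illustration, not the released code: gradients are flat lists of floats, and the helper names `dot` and `pcgrad` are made up for this example. When two task gradients have a negative inner product, one is projected onto the normal plane of the other; the projected gradients are then summed.

```python
import random


def dot(a, b):
    """Inner product of two flat gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def pcgrad(task_grads, rng=None):
    """Project away conflicting components of each task gradient,
    then sum the projected gradients into a single update direction."""
    rng = rng or random.Random(0)
    projected = []
    for i, g in enumerate(task_grads):
        g_i = list(g)
        others = [j for j in range(len(task_grads)) if j != i]
        rng.shuffle(others)  # visit the other tasks in random order
        for j in others:
            g_j = task_grads[j]
            inner = dot(g_i, g_j)
            if inner < 0:  # conflict: remove the component along g_j
                scale = inner / dot(g_j, g_j)
                g_i = [x - scale * y for x, y in zip(g_i, g_j)]
        projected.append(g_i)
    return [sum(column) for column in zip(*projected)]


# Two conflicting task gradients (negative inner product).
update = pcgrad([[1.0, 0.0], [-1.0, 1.0]])
```

Gradients that do not conflict are left unchanged, so with orthogonal or aligned task gradients the result is the ordinary summed gradient.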

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning
Tianhe Yu*, Deirdre Quillen*, Zhanpeng He*, Ryan Julian, Karol Hausman, Chelsea Finn, Sergey Levine
Conference on Robot Learning (CoRL), 2019
arXiv / website / code

We propose an open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic manipulation tasks, with the aim of making it possible to develop algorithms that generalize to accelerate the acquisition of entirely new, held-out tasks.

Meta-Inverse Reinforcement Learning with Probabilistic Context Variables
Lantao Yu*, Tianhe Yu*, Chelsea Finn, Stefano Ermon
Neural Information Processing Systems (NeurIPS), 2019
arXiv / website

We propose a deep latent variable model that is capable of learning rewards from unstructured, multi-task demonstration data, and critically, use this experience to infer robust rewards for new, structurally-similar tasks from a single demonstration.

One-Shot Hierarchical Imitation Learning of Compound Visuomotor Tasks
Tianhe Yu, Pieter Abbeel, Sergey Levine, Chelsea Finn
International Conference on Intelligent Robots and Systems (IROS), 2019
arXiv / video

We aim to learn multi-stage vision-based tasks on a real robot from a single video of a human performing the task. We propose a method that learns both how to learn primitive behaviors from video demonstrations and how to dynamically compose these behaviors to perform multi-stage tasks by "watching" a human demonstrator.

Unsupervised Visuomotor Control through Distributional Planning Networks
Tianhe Yu, Gleb Shevchuk, Dorsa Sadigh, Chelsea Finn
Robotics: Science and Systems (RSS), 2019
arXiv / website / code

We propose an approach to learning an unsupervised embedding space under which the robot can measure progress towards a goal for itself. Our method enables learning effective and control-centric representations that lead to more autonomous reinforcement learning algorithms.

One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning
Tianhe Yu*, Chelsea Finn*, Annie Xie, Sudeep Dasari, Tianhao Zhang, Pieter Abbeel, Sergey Levine
Robotics: Science and Systems (RSS), 2018
arXiv / blog post / video / code

We present an approach for one-shot learning from a video of a human by using human and robot demonstration data from a variety of previous tasks to build up prior knowledge through meta-learning. Then, combining this prior knowledge and only a single video demonstration from a human, the robot can perform the task that the human demonstrated.

One-Shot Visual Imitation Learning via Meta-Learning
Chelsea Finn*, Tianhe Yu*, Tianhao Zhang, Pieter Abbeel, Sergey Levine
Conference on Robot Learning (CoRL), 2017 (Long Talk)
Oral presentation at the NIPS 2017 Deep Reinforcement Learning Symposium
arXiv / video / talk / code

We present a meta-imitation learning method that enables a robot to learn to acquire new skills from just a single visual demonstration. Our method requires data from significantly fewer prior tasks for effective learning of new skills and can also learn from a raw video as the single demonstration, without access to trajectories of robot configurations such as joint angles.

Real-Time User-Guided Image Colorization with Learned Deep Priors
Richard Zhang*, Jun-Yan Zhu*, Phillip Isola, Xinyang Geng, Angela S. Lin, Tianhe Yu, Alexei A. Efros
ACM Transactions on Graphics (SIGGRAPH), 2017
arXiv / project website / video / slides / talk / code

We propose a deep learning approach for user-guided image colorization. Our system uses a deep convolutional neural network to directly map a grayscale image, along with sparse, local user "hints", to an output colorization.

Generalizing Skills with Semi-Supervised Reinforcement Learning
Chelsea Finn, Tianhe Yu, Justin Fu, Pieter Abbeel, Sergey Levine
International Conference on Learning Representations (ICLR), 2017
arXiv / video / code

We formalize the problem of semi-supervised reinforcement learning (SSRL), where the reward signal in the real world is only available in a small set of environments such as laboratories, and the robot needs to leverage experience in these instrumented environments to continue learning in places where the reward signal isn't available. We propose a simple algorithm for SSRL based on inverse reinforcement learning and show that it can improve performance in 'unlabeled' environments by using experience from both 'labeled' and 'unlabeled' environments.


CS330: Deep Multi-Task and Meta Learning - Fall 2019
Teaching Assistant