In this post, I introduce DoReMi, a novel algorithm that automatically determines how much of each data domain to use during training, resulting in 2.6x faster training on The Pile. An open-source PyTorch implementation is also available.
Joint work with Sewon Min. In this post, we present a Bayesian inference framework for in-context learning in large language models such as GPT-3 and provide empirical evidence for this framework, highlighting how in-context learning differs from traditional supervised learning.