In this post, I introduce DoReMi, a novel algorithm that automatically determines how much of each data domain to use during training, resulting in 2.6x faster training on The Pile. An open-source PyTorch implementation is also available.
Joint work with Sewon Min. In this post, we present a Bayesian inference framework for in-context learning in large language models such as GPT-3 and provide empirical evidence for this framework, highlighting how in-context learning differs from traditional supervised learning.