Title: Large Language Models: Will they keep getting bigger? And, how will we use them if they do?
Speaker: Luke Zettlemoyer, University of Washington
Date: October 21
Event link: https://stanford.zoom.us/j/99311855970?pwd=TVRUSHpESVExdU1Kdy9acjdWeVBiZz09 Password: 349895.
Abstract: The trend of building ever-larger language models has dominated much of NLP research over the last few years. In this talk, I will discuss our recent efforts to (at least partially) answer two key questions in this area: Will we be able to keep scaling? And how will we actually use the models if we do? I will cover our recent work on learning new types of sparse mixture-of-experts (MoE) models. Unlike model-parallel algorithms for training dense models, which are very difficult to scale further with existing hardware, our sparse approaches have significantly reduced cross-node communication costs and could provide the next big leap in performance, although finding a version that scales well in practice remains an open challenge. I will also present our recent work on prompting language models in ways that better control for surface-form variation, improving the performance of models that are so big we can only afford to run inference, with little to no task-specific fine-tuning. Finally, time permitting, I will discuss new forms of supervision for language model training, including learning from the hypertext structure of web pages to provide new signals for both training and prompting the model. Together, these methods represent our best guesses for how to keep the scaling trend alive as we move toward the next generation of NLP models. This talk describes work done at the University of Washington and Facebook, primarily led by Armen Aghajanyan, Ari Holtzman, Mike Lewis, Sewon Min, and Peter West.
Bio: Luke Zettlemoyer is a Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, and a Research Scientist at Facebook. His research focuses on empirical methods for natural language semantics, and involves designing machine learning algorithms, introducing new tasks and datasets, and, most recently, studying how best to develop self-supervision signals for pre-training. Honors include multiple paper awards, a PECASE award, and an Allen Distinguished Investigator Award. Luke received his PhD from MIT and was a postdoc at the University of Edinburgh.