I'm an associate professor in the Stanford AI Lab (SAIL), the Center for Research on Foundation Models (CRFM), and the Machine Learning Group (bio). Our lab works on the foundations of the next generation of AI systems.
- On the AI side, I am fascinated by how we can learn from increasingly weak forms of supervision, by the basis of new architectures, by the role of data, and by the mathematical foundations of such techniques.
- On the systems side, I am broadly interested in how machine learning is changing how we build software and hardware. I'm particularly excited when we can blend AI and systems, e.g., Snorkel, Overton (YouTube), SambaNova, or Together.
While we're very proud of our research ideas and their impact, the lab's real goal is to help students become professors, entrepreneurs, and researchers. To that end, over a dozen members of our group have started their own professorships. With students and collaborators, I've been fortunate enough to cofound a number of companies and a venture firm. For transparency, I try to list companies I advise or invest in here and our research sponsors here. My students (and others!) run the ML Sys Podcast.
- We're interested in improving the foundations of foundation models.
- A blog post on sequence length and more; see the blog for further details.
- FlashAttention is an IO-aware algorithm for exact attention. It is now widely used, including in MLPerf (see the MLPerf story on Tri!). Tri's Version 2.
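The core IO-aware trick, processing keys and values block by block with an online softmax so the full N x N score matrix is never materialized, can be sketched in plain NumPy. This is an illustrative toy only, not the actual fused CUDA kernel, and the function names are mine:

```python
import numpy as np

def naive_attention(Q, K, V):
    # Standard attention: materializes the full N x N score matrix.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=16):
    # Online-softmax tiling: visit K/V one block at a time, keeping only a
    # running row-max (m), a running normalizer (l), and an output
    # accumulator (O) per query. The N x N scores exist only one tile at a
    # time (on GPU, that tile lives in fast on-chip SRAM).
    N, d = Q.shape
    O = np.zeros((N, d))
    m = np.full(N, -np.inf)   # running max of scores seen so far
    l = np.zeros(N)           # running softmax denominator
    for j in range(0, N, block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = Q @ Kj.T / np.sqrt(d)              # scores for this tile only
        m_new = np.maximum(m, S.max(axis=-1))
        scale = np.exp(m - m_new)              # rescale earlier partial sums
        P = np.exp(S - m_new[:, None])
        l = l * scale + P.sum(axis=-1)
        O = O * scale[:, None] + P @ Vj
        m = m_new
    return O / l[:, None]
```

Both functions compute identical outputs; the tiled version just never holds more than one `N x block` slice of scores at once, which is the memory-traffic saving the "IO-aware" name refers to.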
- We continue to work on long sequences. See an explainer of a simplified version of S4 (S4 Explainer Blog). S4 can be computed both as a convolution and as an RNN, building on simple ideas from signal processing. It set state-of-the-art results on the Long Range Arena benchmark and was the first model to solve Path-X. An update on this line of work.
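The convolution/RNN duality can be shown with a toy dense linear state-space model. This is a sketch only: the real S4 uses a structured (HiPPO-based) parameterization and computes the kernel far more efficiently, and all names here are mine:

```python
import numpy as np

def ssm_recurrent(A, B, C, u):
    # Run the linear state-space model step by step, like an RNN:
    #   x_t = A x_{t-1} + B u_t,   y_t = C x_t   (x_{-1} = 0)
    x = np.zeros(A.shape[0])
    ys = []
    for u_t in u:
        x = A @ x + B * u_t
        ys.append(C @ x)
    return np.array(ys)

def ssm_convolution(A, B, C, u):
    # The same model unrolled as one long causal convolution with kernel
    #   K_k = C A^k B,  so  y_t = sum_{k <= t} K_k u_{t-k}.
    L = len(u)
    K = np.array([C @ np.linalg.matrix_power(A, k) @ B for k in range(L)])
    return np.convolve(K, u)[:L]   # keep the causal part
```

The two views compute the same outputs: the recurrence gives O(1)-state autoregressive inference, while the convolution view allows fast parallel training over the whole sequence.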
- We've been working on Hyena, which uses ideas from signal processing, and on its application to genomics with HyenaDNA.
- We've been looking at how foundation models can help us build software systems, most recently:
- Domino: debugging your data by discovering systematic errors with cross-modal embeddings.
- Wrangle your data, in which we show that few-shot models (not trained for data tasks) obtain state-of-the-art performance on cleaning, integration, and imputation benchmarks. github.
- Led by Simran: Can foundation models offer perfect secrecy? How do they compare to prior approaches like federated learning? github. We are also thinking about split QA with Meta.
- Some Talks and resources
- Some resources for a budding community in Data-Centric AI and a blog post about it.
- SIGMOD keynote on Data-Centric AI, Declarative ML, and Foundation Models in data: slides (YouTube)
- SIGMOD panel on Service, Science, and Startups changing research
- Software 2.0 Overview at HAI
- Thanks, NeurIPS! Our Test-of-Time Award talk for Hogwild! is on YouTube
- A quick video overview of our work on Hidden Stratification.
- A narrated version of Overton, our high-level framework for machine learning built at Apple (pptx|YouTube), and the paper.
- MLSys 2020 keynote talk (pdf|pptx) or WWW BIG. More articles are on the new group website; also see github.
A messy, incomplete log of old updates is here.