I'm an associate professor in the Stanford AI Lab (SAIL) affiliated with DAWN and the Statistical Machine Learning Group (bio). Our lab works on the foundations of the next generation of machine-learned systems. While we're very proud of our research ideas and their impact, the lab's real goal is to help amazing students become professors, entrepreneurs, and researchers. With my students and collaborators, I've been fortunate enough to cofound projects including Lattice and Inductiv (HoloClean), which are now both part of Apple, along with SambaNova and Snorkel. The honor that still doesn't feel real is the MacArthur Fellowship.
On the machine learning side, I am fascinated by how we can learn from increasingly weak forms of supervision and by the mathematical foundations of such techniques. On the systems side, I am broadly interested in how machine learning is changing how we build software and hardware. I'm particularly excited when we can blend ML and systems, e.g., Snorkel. My MLSys 2020 keynote talk (pdf|pptx) and my talk for WWW (BIG) give an overview of our recent work. As for future directions, the lab wrote up their take on our past and future directions, hosted on the new group website.
- Teaching CS229 (Machine Learning) this spring
- Students and Postdocs described their view on Software 2.0 and what's next
- Snorkel is in a new location Snorkel.org. Crazily enough, you've probably used a system that has a Snorkel-powered or Snorkel-inspired component in the last few hours (thanks to collaborations with Google ads, the folks at Gmail, Apple, and many more). Excited for all the great collaborations!
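The core idea behind Snorkel is programmatic weak supervision: instead of hand-labeling training data, users write noisy heuristic "labeling functions" whose votes are combined into training labels. Here is a minimal, self-contained sketch of that idea (the labeling functions and the majority-vote combiner below are hypothetical illustrations; Snorkel itself learns the accuracies of the labeling functions with a generative label model rather than taking a simple majority vote):

```python
from collections import Counter

ABSTAIN, HAM, SPAM = -1, 0, 1

# Hypothetical labeling functions: cheap, noisy heuristics that either
# vote for a class or abstain. None of them needs to be very accurate.
def lf_has_link(text):
    return SPAM if "http" in text else ABSTAIN

def lf_polite(text):
    return HAM if "thanks" in text.lower() else ABSTAIN

def lf_shouting(text):
    return SPAM if text.isupper() else ABSTAIN

LFS = [lf_has_link, lf_polite, lf_shouting]

def weak_label(text):
    """Combine labeling-function votes by majority, ignoring abstentions.

    Snorkel's label model instead estimates each LF's accuracy and
    correlations from the vote matrix alone, without ground truth.
    """
    votes = [v for lf in LFS if (v := lf(text)) != ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]
```

The resulting probabilistic labels are then used to train an ordinary discriminative model, which can generalize beyond the heuristics that produced them.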
- In ICML 2020, we describe our continuing work on weak supervision and data augmentation in two papers.
- In ACL 2020, we describe some of our continuing work on embeddings, compression, and geometry:
- Ines et al. explore when you can use hyperbolic geometry for low-dimensional knowledge graph embeddings.
- Simran and Avner describe some tradeoffs in a short paper, Contextual Embeddings: When are they worth it?
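The intuition behind the hyperbolic embedding work above is that distances in hyperbolic space grow very quickly near the boundary, so tree-like (hierarchical) structure fits in far fewer dimensions than in Euclidean space. A minimal sketch of the standard distance in the Poincaré ball model (this is the textbook formula, not the specific method of the paper):

```python
import math

def poincare_dist(u, v):
    """Distance between two points in the Poincare ball model.

    Points are tuples with Euclidean norm strictly less than 1.
    d(u, v) = arcosh(1 + 2||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2)))
    """
    sq = lambda x: sum(xi * xi for xi in x)
    diff = sq(tuple(ui - vi for ui, vi in zip(u, v)))
    return math.acosh(1 + 2 * diff / ((1 - sq(u)) * (1 - sq(v))))

# Points near the boundary are exponentially far from the origin,
# which is what gives hyperbolic space its "room" for hierarchies.
origin = (0.0, 0.0)
near = (0.1, 0.0)
deep = (0.9, 0.0)
```

Because volume grows exponentially with radius, children of a node can be placed near the boundary with ample mutual separation even in 2 dimensions.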
- In ICLR 2020:
- Hongyang and Sen describe theory that helps tell us when multitask learning works--and when it doesn't!
- Tri et al. describe Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps, and they show they can learn hand-tuned features in speech pipelines--from scratch! (Spotlight)
- Charles leads the way on understanding the link between weak supervision and instrumental variables for causal inference in AISTATS 2020.
- Some work on sparse recovery for Jacobi polynomials in ICALP 2020.
- In CIDR 2020, a paper about our Overton work at Apple, including zero-code deep learning, weak supervision, and data slicing.
- A bunch of great collaborations in Nature-family journals, clinical journals, and others:
- In Science Translational Medicine, Johannes, Gill, et al. describe AMELIE, which speeds up diagnosis for rare diseases.
- In BMC Bioinformatics, Emily, Russ, et al. describe how to extract chemical reactions from text using Snorkel.
- In Cell Patterns, Jared and Alex examine how to weakly supervise text and images.
- In NPJ Digital Medicine, Khaled leads an effort applying weak supervision to EEG data for efficient seizure detection.
- In NPJ Digital Medicine, Alison Callahan and Jason A. Fries led an amazing effort to apply weak supervision to device surveillance in health records (or here).
- In Nature Comms, weak supervision on cardiac MRI videos for rare aortic valve disorders.
- In Nature Comms, the world's largest machine-read GWAS knowledge base, GWASKB--both with help from Snorkel's ideas.
- In Radiology, Jared's paper on using deep learning for image triage: at what training set sizes do modern methods provide utility in radiology? This is a collaboration with great folks in the medical school!
A messy, incomplete log of old updates is here.