I'm an associate professor affiliated with DAWN, the Statistical Machine Learning Group, and SAIL. Our lab works on the foundations of the next generation of machine-learning systems. While we're very proud of our research ideas and their impact, the lab's real goal is to help amazing students become professors, entrepreneurs, and researchers. With my students and collaborators, I've been fortunate enough to found companies including Lattice, now part of Apple, and SambaNova. The honor that still doesn't feel real is the MacArthur Fellowship.
- Alison Callahan and Jason A. Fries led an amazing effort to apply weak supervision to device surveillance in health records.
- In Nature Communications, two papers: weak supervision on cardiac MRI videos to find rare aortic valve disorders, and the world's largest machine-read GWAS knowledge base (GWASKB)--both built with help from Snorkel's ideas.
- The Snorkel Blog has more information on that project: state-of-the-art results on benchmarks like GLUE and SuperGLUE--and industrial use!
- In ICML19, we talk about learning structure with only weak supervision, a theory for data augmentation, and how to learn structured matrices that are provably fast using butterfly factorizations.
- In SIGMOD19, with folks at Google, we describe lessons learned from applying Snorkel at Google in DryBell.
- In AAAI19, Snorkel folks talk about Training Complex Models with Multi-Task Weak Supervision. We see this as a new and exciting way to build machine learning software.
- In Radiology (Jan 19), Jared's paper on using deep learning for image triage asks: at what training-set sizes do modern methods provide utility in radiology? This is a collaboration with great folks in the medical school!
- In AIStats19, Tri, Avner, and Jian try to explain how low-precision random Fourier features generalize better than Nystrom features using the same amount of memory--this result is surprising since, if you measure by feature count, the reverse is true! (See the sketch after this list.)
- Check out great workshops run in part by our students: Relational Representation Learning at NeurIPS, and the 2nd Learning with Limited Data and Graph Representation Learning workshops at ICLR19.
- Beliz, Albert, and Fred learn embeddings in mixed product spaces (hyperbolic, spherical, and Euclidean) in ICLR19.
- Alex and Braden's thoughts on the role of massive multitask learning and weak supervision in Software 2.0 in CIDR19.
- Paroma's thoughts on automating weak supervision with Reef in VLDB19.
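For readers curious about the feature-budget comparison in the AIStats19 item above, here is a minimal, illustrative sketch (not the paper's code): it builds Nystrom features and quantized random Fourier features for an RBF kernel with roughly matched memory, and checks how well each approximates the exact kernel. The dataset, sizes, bit width, and helper names (`rff_features`, `nystrom_features`) are made up for illustration, and the low-precision storage is only simulated (features are quantized but still held as floats).

```python
# Sketch: Nystrom features vs. low-precision random Fourier features (RFF)
# for an RBF kernel under a roughly matched memory budget. Toy data only.
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 2000, 20, 1.0
X = rng.normal(size=(n, d))  # made-up dataset

def rff_features(X, D, sigma, rng, bits=None):
    """Random Fourier features z(x) = sqrt(2/D) * cos(Wx + b) for the RBF kernel.
    If `bits` is given, quantize each feature to that many bits on a uniform grid;
    this simulates the 'low precision' that lets more features fit in the same memory."""
    W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], D))
    b = rng.uniform(0.0, 2 * np.pi, size=D)
    Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)
    if bits is not None:
        lo, hi = -np.sqrt(2.0 / D), np.sqrt(2.0 / D)
        levels = 2 ** bits - 1
        Z = np.round((Z - lo) / (hi - lo) * levels) / levels * (hi - lo) + lo
    return Z

def nystrom_features(X, m, sigma, rng):
    """Nystrom features: K_nm @ K_mm^{-1/2} using m random landmark points."""
    idx = rng.choice(X.shape[0], size=m, replace=False)
    L = X[idx]
    def rbf(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * sigma ** 2))
    K_mm = rbf(L, L) + 1e-8 * np.eye(m)
    K_nm = rbf(X, L)
    w, V = np.linalg.eigh(K_mm)                      # K_mm^{-1/2} via eigendecomposition
    K_inv_sqrt = V @ np.diag(1.0 / np.sqrt(np.maximum(w, 1e-10))) @ V.T
    return K_nm @ K_inv_sqrt

# Matched budget (conceptually): 32-bit Nystrom with m features
# vs. 4-bit RFF with 8*m features.
m = 128
Z_nys = nystrom_features(X, m, sigma, rng)
Z_rff = rff_features(X, 8 * m, sigma, rng, bits=4)

# Sanity check: kernel approximation error on a subsample.
sub = rng.choice(n, size=200, replace=False)
sq = ((X[sub][:, None, :] - X[sub][None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-sq / (2 * sigma ** 2))
for name, Z in [("nystrom (32-bit, m feats)", Z_nys),
                ("rff (4-bit, 8m feats)", Z_rff)]:
    err = np.linalg.norm(Z[sub] @ Z[sub].T - K_exact) / np.linalg.norm(K_exact)
    print(f"{name}: relative kernel approximation error = {err:.3f}")
```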
To validate our ideas, we continue to build systems that we hope change the way people do science and improve society. This work is with great partners in areas including paleobiology, drug repurposing, genomics, material science, and the fight against human trafficking (60 Minutes, Scientific American, and Wired). Our work is supporting investigations. In the past, we've worked with a neutrino telescope (IceCube Science cover and our modest contribution) and on economic indicators.
Some of the industrial engagements that we're most proud of are: Software 2.0 products with Apple via Lattice, with Google Ads (blog), and with Intel via Snorkel. We're proud of all the folks who have adopted Snorkel! Our technical ideas have also been picked up, including Hogwild! in Microsoft's deep learning system (Wired), momentum correction for delay at Nvidia, and high-accuracy low-precision (HALP) in Tencent's ImageNet-in-minutes result. Our work also led to some classical analytics layers for companies like Oracle, Cloudera, and Pivotal. In benchmarking, we've posted results on GLUE and achieved better-than-volunteer accuracy in machine reading for paleobiology; that machine-reading project was commercialized as Lattice. As of 2017, Lattice is part of Apple. Our work on architectural changes for converged analytics and machine learning is commercialized as SambaNova Systems.
A messy, incomplete log of old updates is here.