I'm an associate professor affiliated with DAWN, the Statistical Machine Learning Group, and SAIL. I work on the foundations of the next generation of data analytics systems. These systems extend ideas from databases, machine learning, and theory, and our group is active in all areas. An application of our work is to make it dramatically easier to build machine learning systems to process dark data including text, images, and video. Our latest project is Snorkel; our code is available online, and there are blog posts about our work. By pushing the limits of weak supervision and data augmentation, we hope to make it radically easier to build machine learning systems and deepen our understanding of machine learning's underpinnings.
To validate our ideas, we continue to build systems that we hope change the way people do science and improve society. This work is with great partners in areas including paleobiology, drug repurposing, genomics, material science, and the fight against human trafficking (60 Minutes, Scientific American, and Wired). Our work is supporting investigations. In the past, we've worked with a neutrino telescope (IceCube Science cover and our modest contribution) and on economic indicators.
- Alex and Braden's thoughts on the role of massive multitask learning and weak supervision in Software 2.0 in CIDR19.
- Paroma's thoughts about automating weak supervision in Reef in VLDB19.
- A draft writeup about Bit Centering, a technique to use accelerators for low-precision training. ISMP18.
- A small writeup about Software 2.0 and Snorkel for KDD18.
- In ICML18, hyperbolic embeddings embed structured knowledge in continuous space. There is a blog post about our work.
- Train your classifiers with natural language! Braden, Percy, and Stephanie show you how in ACL18.
- In SODA18, we characterize the largest class of matrices for which matrix-vector multiply is in (nearly) linear time. Anna is exploring whether these matrices can be useful in deep learning (ICLR18). Thank you to the NSF for supporting this work!
- Snorkel system paper in VLDB18 with its own blog. Software 2.0 madness is coming...
- In SIGMOD18, Fonduer constructs knowledge bases from richly formatted data using visual and textual reasoning.
- For PCA and other mildly nonconvex matrix problems, in AISTATS18, we show that a simple stochastic algorithm gets the optimal accelerated rate--and that the standard Polyak momentum scheme can't give acceleration in the stochastic case.
- Theo, Ihab, and crew have released Holoclean.
) project was commercialized as Lattice. As of 2017, Lattice is part of Apple. Our work on architectural changes for converged analytics and machine learning is commercialized as SambaNova Systems.
A messy, incomplete log of old updates is here