I'm an associate professor affiliated with DAWN, Statistical Machine Learning Group, and SAIL. I work on the foundations of the next generation of data analytics systems. These systems extend ideas from databases, machine learning, and theory, and our group is active in all of these areas. One application of our work is to make it dramatically easier to build machine learning systems that process dark data, including text, images, and video. Our latest project is Snorkel, our code is available, and there are blog posts about our work. By pushing the limits of weak supervision and data augmentation, we hope to make it radically easier to build machine learning systems and deepen our understanding of machine learning's underpinnings.
To validate our ideas, we continue to build systems that we hope change the way people do science and improve society. This work is with great partners in areas including paleobiology, drug repurposing, genomics, materials science, and the fight against human trafficking (60 Minutes, Scientific American, and Wired). Our work is supporting investigations. In the past, we've worked with a neutrino telescope (IceCube Science cover and our modest contribution) and on economic indicators.
- Blog post about HALP (high-accuracy, low-precision) training. Sometimes, you can train in low precision and still get high accuracy using HALP. The main idea is to dynamically adjust your numeric representation using SVRG as guidance.
- Hyperbolic embeddings embed structured knowledge in continuous space; a blog post explains some of our initial work.
- In SODA18, we characterize the largest class of matrices for which matrix-vector multiplication runs in (nearly) linear time. Anna is exploring whether these matrices can be useful in deep learning (ICLR18). Thank you to the NSF for supporting this work!
- Snorkel system paper in VLDB18 with its own blog. Software 2.0 madness is coming...
- In SIGMOD18, Fonduer constructs knowledge bases from richly formatted data using visual and textual reasoning.
- For PCA and other mildly nonconvex matrix problems, in AISTATS18, we show that a simple stochastic algorithm achieves the optimal accelerated rate, and that the standard Polyak momentum scheme can't give acceleration in the stochastic case.
- Theo, Ihab, and crew are releasing HoloClean soon!
) project was commercialized as Lattice. As of 2017, Lattice is part of Apple. Our work on architectural changes for converged analytics and machine learning is commercialized as SambaNova Systems.
A messy, incomplete log of old updates is here.