
I'm an assistant professor in the InfoLab and affiliated with the PPL and SAIL labs, and I work on the fundamentals of the next generation of data management systems (bio here). This means we work on databases, theory, and machine learning, and we worry about hardware trends. A major application of our work is to make it dramatically easier to build high-quality systems that process more of the world's dark data (SQL databases, text, and images). Recently, we showed that our systems can even exceed human volunteer quality in reading scientific journal articles (featured in Nature).
 New Tradeoffs for Systems. The next generation of data systems needs to make fundamentally new tradeoffs. For example, we proved that many statistical algorithms can be run in parallel without locks (Hogwild! or SCD) or at lower precision. This leads to a fascinating systems tradeoff between statistical efficiency and hardware efficiency. These ideas have been picked up by web and enterprise companies for everything from recommendation to deep learning.
 New Programming Models. The DeepDive system demonstrates that one can build high-quality applications that use machine learning without specifying an inference algorithm, which makes it usable by a much wider range of people. Our goal for the last few years has been to dramatically reduce the time analysts spend specifying models, maintaining them, and collaboratively building them.
 New Database Engines. We're thinking about how these new workloads change how one would build a database. We're building a new database, EmptyHeaded, that extends our theoretical work on optimal join processing. Multiway join algorithms are asymptotically and empirically faster than traditional database engines, often by orders of magnitude. We're using it to unify database querying, graph patterns, linear algebra and inference, RDF processing, and soon more.
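To give a flavor of the attribute-at-a-time, intersection-based strategy behind worst-case optimal joins, here is a toy version specialized to the triangle query; EmptyHeaded's real engine compiles such plans to SIMD set intersections, so everything below is a deliberate simplification:

```python
from collections import defaultdict

def triangles(edges):
    """Count triangles with an attribute-at-a-time plan: bind vertex a,
    bind b from a's neighbors, then bind c by intersecting the neighbor
    sets of a and b (these set intersections are the operation that
    worst-case optimal join algorithms make cheap)."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    count = 0
    for a in nbrs:
        for b in nbrs[a]:
            if b <= a:            # enforce a < b < c to count each triangle once
                continue
            for c in nbrs[a] & nbrs[b]:
                if c > b:
                    count += 1
    return count
```

A pairwise join plan can materialize a huge intermediate result (all wedges) before filtering; binding one attribute at a time avoids that blowup, which is where the asymptotic gap comes from.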
DeepDive is tons of fun (one pager) and commercialized as Lattice. Our code is on GitHub. Data is here. We're on Twitter @HazyResearch sometimes.

News
 Some new manuscripts are posted.
 In a short note, Ioannis, Ce, and Stefan show that asynchrony in SGD can be viewed as adding a momentum term. The analysis does not depend on whether the function is convex, which means it applies to deep learning. This makes me feel a little better about people running Hogwild! deep learning systems, but it does mean you need to tune your momentum. It is a key ingredient in Omnivore, our deep learning optimizer, which is described here.
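In update form, the correspondence is with the classic heavy-ball momentum step; a one-line sketch (the function name and hyperparameters are illustrative, not from the note):

```python
def momentum_sgd_step(w, w_prev, grad, lr=0.01, mu=0.9):
    """One heavy-ball (momentum) SGD step:
        w_next = w - lr * grad + mu * (w - w_prev)
    The note's observation is that asynchronous SGD behaves like this
    update, with mu set implicitly by the degree of asynchrony -- which
    is why the explicit momentum parameter then needs re-tuning."""
    return w - lr * grad + mu * (w - w_prev)
```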
 Data Programming: Creating Large Training Sets, Quickly by Alex Ratner, Chris De Sa, Sen Wu, and Daniel Selsam is now in use inside DDlite. User studies are reported here. Very exciting!
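The core idea is that users write noisy labeling functions rather than labeling examples by hand. A stripped-down stand-in for combining them (majority vote; the actual data programming approach instead learns each function's accuracy and correlations with a generative model):

```python
def majority_label(lfs, x):
    """Combine noisy labeling functions by majority vote. Each function
    returns +1, -1, or 0 (abstain). This is only the simplest possible
    stand-in for data programming's learned generative model."""
    votes = [v for v in (lf(x) for lf in lfs) if v != 0]
    if not votes:
        return 0  # every function abstained
    return 1 if sum(votes) > 0 else -1
```

For example, with keyword-based labeling functions for a spouse-extraction task, a handful of cheap heuristics can label far more candidates than any annotator could.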
 Chris De Sa and Bryan He continue to discover fundamental properties of Gibbs sampling, including the effects of scan order and asynchrony (Best Paper at ICML16); these results are used in our generative models in DDlite and in factor graph inference.
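To make "scan order" concrete, here is a tiny Gibbs sampler over two coupled spins in which the scan strategy is an explicit knob; this only illustrates the setting the papers analyze, not their results:

```python
import math
import random

def gibbs_ising_pair(n_steps, theta=1.0, scan="systematic", seed=0):
    """Gibbs sampling for two coupled +/-1 spins, P(x1, x2) proportional
    to exp(theta * x1 * x2). scan="systematic" sweeps the variables in a
    fixed order each step; scan="random" resamples one uniformly chosen
    variable per step. Returns the empirical fraction of samples with
    x1 == x2 (its true value is sigmoid(2 * theta))."""
    rng = random.Random(seed)
    x = [1, 1]
    agree = 0
    for _ in range(n_steps):
        idxs = [0, 1] if scan == "systematic" else [rng.randrange(2)]
        for i in idxs:
            # Conditional: P(x_i = +1 | x_other) = sigmoid(2 * theta * x_other)
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * theta * x[1 - i]))
            x[i] = 1 if rng.random() < p_plus else -1
        agree += (x[0] == x[1])
    return agree / n_steps
```

Both scans converge to the same distribution here; the interesting question the papers study is how the choice changes mixing time, and what happens once updates are also asynchronous.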
 SIGMOD and PODS
 SIGMOD16 EmptyHeaded: A Relational Engine for Graph Processing by Chris Aberger discusses how to use SIMD hardware to support our worst-case optimal join algorithms to find graph patterns. It's fast! Code is on GitHub.
 SIGMOD16 Industrial. A paper describing DeepDive and its use in Lattice written largely by Mike Cafarella (CEO).
 PODS16. Rohan and Manas extend our new join algorithms to message passing and thus to fast matrix multiplication. The point is: standard worst-case algorithms are enough to get the best asymptotic runtimes for these problems. A step toward the vision of unifying relational and linear algebra systems using GHDs.
 HILDA. We describe our group's work on data programming and DDlite. We're really excited about the ability to quickly create high-quality extractors!
 Elated that our group's work was honored by a MacArthur Foundation Fellowship. So excited for what's next!
 Our course material from CS145, our intro databases course, is here, and we'll continue to update it throughout the year. We're aware of a handful of courses that are using these materials. Drop us a note if you do!
 A messy, incomplete log of old updates is here.