
I'm an assistant professor in the InfoLab and affiliated with the PPL and SAIL labs. Bio here. My group's goal is to make it dramatically easier to build high-quality systems that process more of the world's dark data (SQL databases, text, and images). Data analytics is amazing: a faster system often produces higher-quality output. As a result, the area is no-holds-barred nerd fun: we work on databases, theory, and machine learning, and we worry about hardware trends. Basically, we work on anything that can improve performance. Recently, our performance work has translated into quality: we've shown that such systems can even exceed human volunteer quality in reading scientific journal articles, a result featured in Nature.
 We argue that analytics systems need to make fundamentally new tradeoffs. For example, we showed that popular algorithms (e.g., SGD via Hogwild!, and SCD) can be run in parallel without locks, or with lower precision. There is a fascinating tradeoff between statistical and hardware efficiency. These ideas have been picked up by deep learning frameworks at web companies and by enterprise tools.
 We worry about making analytics easier to program... a lot. Our goal for the last few years has been to dramatically reduce the time analysts spend specifying models and to lower barriers to entry. The DeepDive system is intended to demonstrate that one can build high-quality machine learning applications without ever specifying an inference algorithm, which we believe makes it usable by a wider range of people.
 We're thinking about how these new workloads change classical database systems. We're building a new database engine, EmptyHeaded, that extends our theoretical work on worst-case optimal join processing. These multiway join algorithms are theoretically (asymptotically) and empirically faster than traditional database engines. We're using it to unify database querying, graph patterns, linear algebra and inference, RDF processing, and more.
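The flavor of these multiway joins can be seen on the triangle query: instead of computing pairwise joins of the edge relation, a worst-case optimal algorithm binds one attribute at a time and intersects adjacency sets. The sketch below is a toy Python version with illustrative names; EmptyHeaded itself uses SIMD-optimized set intersections, not Python dicts.

```python
from collections import defaultdict

def triangles(edges):
    """Enumerate triangles (a, b, c) with a < b < c in an undirected graph."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    out = []
    for a in sorted(adj):                                # bind attribute a
        for b in sorted(x for x in adj[a] if x > a):     # bind b among a's neighbors
            # bind c by intersecting the neighbor sets of a and b; this
            # set intersection, rather than a pairwise join that can blow
            # up to an intermediate of size |E|^2, is what makes the
            # attribute-at-a-time strategy worst-case optimal
            for c in sorted(adj[a] & adj[b]):
                if c > b:
                    out.append((a, b, c))
    return out

# usage: a 4-clique contains exactly four triangles
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(triangles(edges))  # → [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
```

The same attribute-at-a-time pattern generalizes beyond triangles, which is what lets one engine serve graph-pattern queries and relational joins alike.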
DeepDive is tons of fun (one pager). Our code is on GitHub, and our data is here. We tweet as @HazyResearch sometimes.

News
 Elated that our group's work was honored by a MacArthur Foundation Fellowship. So excited for what's next!
 Our course material from CS145 (intro to databases) is here, and we'll continue to update it throughout the year. We're aware of a handful of courses using these materials; drop us a note if you do!
 ICDT 2016. It’s all a matter of degree: Using degree information to optimize multiway joins by Manas Joglekar discusses a technique that uses degree information to perform joins faster (asymptotically!).
 SODA 2016. Weighted SGD for ℓp Regression with Randomized Preconditioning by Jiyan Yang, Yin Lam Chow, Michael Mahoney, and me looks at preconditioning methods that speed up SGD in theory and practice.
 NIPS 2015. Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width by Chris De Sa explains a notion of width that allows one to bound mixing times for factor graphs.
 NIPS 2015. Taming the Wild: A Unified Analysis of Hogwild!-Style Algorithms by Chris De Sa et al. derives results for low-precision and nonconvex Hogwild!-style (asynchronous) algorithms.
 VLDB 2015. Incremental Knowledge Base Construction Using DeepDive is our latest description of DeepDive.
 VLDB 2015. Honored to receive the VLDB Early Career Award for scalable analytics. Talk video is available.
 New. EmptyHeaded: A Relational Algebra for Graph Processing by Chris Aberger discusses how to use SIMD hardware to support our worst-case optimal join algorithms for finding graph patterns. It's fast! Code is on GitHub.
 New. Increasing the parallelism in multi-round MapReduce join plans. Semih Salihoglu, Manas Joglekar, and crew show that you can recover classical results about parallelizing acyclic queries using only Yannakakis's algorithm and our recent algorithms for generalized fractional hypertree decompositions of joins.
 New. Rohan and Manas extend our new join algorithms to message passing and hence to fast matrix multiplication. This is a step toward the vision of unifying relational and linear algebra systems using GHDs.

Upcoming Meetings and Talks
 USC ML. Jan 26.
 Berkeley. Feb 3.
 Michigan. Mar 11.
 ICDT. Mar 15–21.
 Strata. Mar 28–31.
 SIMPLEX. April 4–7.
 Dagstuhl. Foundations of Databases. April 10–15.
 Wisconsin 50th. April 21.
 System X. May 10–12.
 ICDE. Dark data! May 16–20.
 Inside the Black Box. June 8.
 MMDS. June 21–24.
 SIGMOD. June 26–Jul 1.
 Randomized Linear Algebra. Japan. Jul 25–29.
 A messy, incomplete log of old updates is here.