
DeepDive has been released, and more updates are coming soon! DeepDive is a generic probabilistic inference engine that uses a declarative language (SQL) to define factor graphs. DeepDive's most popular use case is knowledge base construction (KBC). Recently, some DeepDive-based KBC applications have exceeded the quality of humans in both precision and recall. DeepDive was recently mentioned in Forbes and as a top tool for data science.

New Results, Funding News, and Press
 PaleoDB. Our assessment of PaleoDeepDive is here! On some extraction tasks, PaleoDeepDive exceeds human volunteer quality in both precision and recall. Thank you to PaleoDB and Shanan Peters for all their painstaking work!
 VLDB14. We've just released a description of our sampling and inference engine, DimmWitted: A Study of Main-Memory Statistical Analytics. See you in China!
 Thank you to Microsoft for giving a nod to Hogwild! by saying that the Adam system is based on our Hogwild! system. We love that they got the exclamation point into print, although it's dubious that anyone is more Hogwild! than we are!
 SIGMOD/PODS. Our papers are about joins and feature selection.
 SIGMOD14. Materialization Optimizations for Feature Selection shows that, using a DSL and some novel optimizations, we can get order-of-magnitude performance wins for feature engineering in R. Thank you to SIGMOD for selecting this as the best paper!
 PODS. The Minesweeper paper is our attempt to go beyond worst-case analysis for join algorithms. We (with Dung Nguyen) develop a new algorithm that we call Minesweeper based on these ideas. The main theoretical idea is to formalize the amount of work any algorithm spends certifying (using a set of propositional statements) that the output set is complete (and not, say, a proper subset). We call this set of propositions the certificate. We manage to establish a dichotomy theorem for this stronger notion of complexity: if a query is what Ron Fagin calls beta-acyclic, then Minesweeper runs in time linear in the certificate; if a query is beta-cyclic, then on some instance any algorithm takes time that is superlinear in the certificate. The results get sharper and more fun. Also, Dung is a superhero and has implemented a variant of this algorithm.
 ICML14. Led by Ji Liu and Stephen J. Wright, Victor Bittorf, Srikrishna Sridhar and I have some new theory about An Asynchronous Parallel Stochastic Coordinate Descent Algorithm in ICML14.
 Joins! New papers about one of my favorite topics, Joins. This is joint work with Hung Q Ngo and Atri Rudra.
 We have written a short survey for SIGMOD Record about recent advances in worst-case optimal join algorithms. Our goal is to give a high-level view of the worst-case optimality results for practitioners and applied researchers. We also managed to simplify the arguments.
 The Minesweeper paper is the first beyond-worst-case analysis for any join algorithm (PODS14).
 Our Tetris paper describes (what we think is) a beautiful framework for beyond-worst-case and worst-case optimal algorithms for joins via a new connection between geometry and DPLL resolution.
 A full version of our join algorithm with worst-case optimal running time is here (original PODS 2012 paper).
 Thank you to Master's in Data Science for naming me a thought leader. It's humbling to be listed with such great people.
 Our paper with Michigan Econ (Shapiro and Levinson) and Michigan CS people (Antenucci and Cafarella) about using Twitter to predict economic indicators is out. It has been picked up by The Economist's blog, The Wall Street Journal, The Boston Globe, The Washington Post, Patria (Czech), and Slate.
 Google. We want to thank Google for funding our research project, Trust, but (Probabilistically) Verify: Toward Tail Extraction. Should be a lot of fun!
 Toshiba. We want to thank Toshiba for funding our work on knowledge base construction. We are very excited that Toshiba engineers are using DeepDive!
 ONR. Thank you for funding my proposal Foundations for Data-driven Systems; Join Algorithms and Random Network Theory. The ONR continues to be one of the best supporters of pure theoretical work.
 DARPA. Thank you to DARPA's XData for supporting my collaborative work on scalable analytics and join processing.

Upcoming Meetings and Talks
 MSR. Faculty Summit. July 14-20.
 Moore. Talk. July 28-29.
 Stanford IoT Workshop. Aug 11.
 Modern Data Management Summit. Aug 28-30.
 VLDB. We'll present DimmWitted! Sep 15.
 DC (CCC and PI). Oct 14-16.
 Dagstuhl. PlanBig. Dec. 14-19.

Code
 DeepDive is available. Components have their own pages. Elementary: Gibbs sampling on factor graphs over TBs of data in files, Accumulo, or HBase! Now with BUGS support! Tuffy, which uses an RDBMS to process Markov Logic, is updated.
 Hogwild! SVMs, logistic regression, matrix factorization, and other convex goodness without locking. There are also specialized versions for trace-norm regularization, called Jellyfish, and nonnegative matrix factorization, called HottTopix.
 Code for more projects is here, in MADlib, in a product from Oracle, and in Cloudera's Impala.
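
For readers who have not seen Gibbs sampling on a factor graph, here is a minimal Python sketch of the idea. It is not the code from Elementary or DeepDive: the two-variable graph, the weights, and every function name below are made up for illustration, and it runs in memory on a toy example rather than at scale.

```python
import math
import random
from itertools import product

# Hypothetical toy factor graph over two binary variables x0 and x1.
# Each factor is (variables it touches, function returning a log-potential).
FACTORS = [
    ((0,), lambda a: 1.0 * a),               # unary factor: favors x0 = 1
    ((0, 1), lambda a, b: 2.0 * (a == b)),   # pairwise factor: favors x0 == x1
]
NUM_VARS = 2


def log_score(assignment):
    """Unnormalized log-probability of a full assignment."""
    return sum(f(*(assignment[v] for v in vs)) for vs, f in FACTORS)


def gibbs_marginals(num_sweeps=20000, burn_in=1000, seed=0):
    """Estimate P(x_v = 1) for each variable by Gibbs sampling."""
    rng = random.Random(seed)
    state = [rng.randint(0, 1) for _ in range(NUM_VARS)]
    counts = [0] * NUM_VARS
    for sweep in range(num_sweeps + burn_in):
        for v in range(NUM_VARS):
            # Resample variable v from its conditional given all the others.
            # Only factors that touch v affect this conditional.
            logits = []
            for value in (0, 1):
                state[v] = value
                logits.append(sum(f(*(state[u] for u in vs))
                                  for vs, f in FACTORS if v in vs))
            p_one = 1.0 / (1.0 + math.exp(logits[0] - logits[1]))
            state[v] = 1 if rng.random() < p_one else 0
        if sweep >= burn_in:
            for v in range(NUM_VARS):
                counts[v] += state[v]
    return [c / num_sweeps for c in counts]


def exact_marginals():
    """Brute-force marginals by enumeration; feasible only for tiny graphs."""
    weights = {a: math.exp(log_score(a))
               for a in product((0, 1), repeat=NUM_VARS)}
    z = sum(weights.values())
    return [sum(w for a, w in weights.items() if a[v] == 1) / z
            for v in range(NUM_VARS)]


if __name__ == "__main__":
    print("gibbs:", gibbs_marginals())
    print("exact:", exact_marginals())
```

On a graph this small, the sampled marginals can be checked against brute-force enumeration, which is the only reason `exact_marginals` is included; on real factor graphs enumeration is infeasible and sampling is the point.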

Application Overview Videos (See our YouTube channel, HazyResearch)
 GeoDeepDive. With Shanan Peters (UW Geoscience) and Miron Livny (Condor), we are combining Macrostrat with DeepDive to (hopefully!) deliver value for geoscientists. One key challenge is extracting all the measurement information that is reported in the literature, which is buried in the dark data of text, graphs, and figures. There is a demo video and a new video about quality that is higher than that of the volunteers who have been at this for the last decade. This is all powered by DeepDive. Thank you to the National Science Foundation and Google for supporting this work.
 IceCube. Mark Wellons, Ben Recht, and I have done some work with the IceCube Neutrino Detector. Mark's code now runs in the detector on the South Pole and is used on over 250 million events per day. More details are in this video, this video, this paper at the International Cosmic Ray Conference 2013, this paper, and a more recent writeup accepted to NIM A and described here. Thank you to the IceCube Collaboration and UW Graduate School for their support of our work! IceCube (and Mark) got the cover of Science! Awesome!
 There are also videos about some of the technical portions of these projects: Matrix Factorization, Seismic Data Interpolation, and a nowcasting framework (now called Ringtail).
 A messy, incomplete log of old updates is here.
Slides for EDBT/ICDT keynote on Joins and Convex Geometry