
DeepDive has been released, and more updates are coming soon! DeepDive is a generic probabilistic inference engine that uses a declarative language (SQL) to define factor graphs. DeepDive's most popular use case is knowledge base construction (KBC). Recently, some DeepDive-based KBC applications have exceeded the quality of humans in both precision and recall. DeepDive was recently mentioned in Forbes as a top tool for data science.
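To give a flavor of the pipeline described above, here is a toy, hypothetical sketch (not DeepDive's actual API or schema) of the grounding-then-inference pattern: tuples that a SQL query might emit attach weighted factors to Boolean variables, and Gibbs sampling estimates marginals.

```python
import math
import random
from collections import defaultdict

random.seed(0)

# Hypothetical groundings a SQL query might emit: (variable_id, factor_weight).
# Each tuple attaches one unary factor to a Boolean random variable.
groundings = [("has_spouse_1", 1.5), ("has_spouse_1", 0.8), ("has_spouse_2", -0.5)]

factors = defaultdict(list)
for var, weight in groundings:
    factors[var].append(weight)

def gibbs(factors, iters=2000):
    """Estimate marginals by Gibbs sampling over the grounded factor graph."""
    state = {v: False for v in factors}
    hits = {v: 0 for v in factors}
    for _ in range(iters):
        for v, weights in factors.items():
            # With only unary factors in a log-linear model, the conditional
            # probability of v=True is the sigmoid of the summed weights.
            p_true = 1.0 / (1.0 + math.exp(-sum(weights)))
            state[v] = random.random() < p_true
            hits[v] += state[v]
    return {v: hits[v] / iters for v in factors}

marginals = gibbs(factors)
```

Real factor graphs also have multi-variable factors, so the conditional for each variable depends on its neighbors' current states; the unary case keeps the sketch short.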

New Results, Funding News, and Press
 Joins! New papers about one of my favorite topics, joins. This is joint work with Hung Q. Ngo and Atri Rudra.
 We have written a short survey for SIGMOD Record about recent advances in worst-case optimal join algorithms. Our goal is to give a high-level view of the worst-case optimality results for practitioners and applied researchers. We also managed to simplify the arguments.
 The Minesweeper paper is the first beyond-worst-case analysis for any join algorithm (PODS 2014).
 Our Tetris paper describes (what we think is) a beautiful framework for beyond-worst-case and worst-case optimal join algorithms via a new connection between geometry and DPLL-style resolution.
 A full version of our join algorithm with worst-case optimal running time is here (original PODS 2012 paper).
 Thank you to Master's in Data Science for naming me a thought leader. It's humbling to be listed with such great people.
 Our paper with Michigan Econ (Shapiro and Levinson) and Michigan CS people (Antenucci and Cafarella) about using Twitter to predict economic indicators is out. It has been picked up by The Economist's blog, the Wall Street Journal, the Boston Globe, The Washington Post, Patria (Czech), and Slate.
 We've just released a description of our sampling and inference engine on arXiv: DimmWitted: A Study of Main-Memory Statistical Analytics. Led by Ji Liu and Stephen J. Wright, with Victor Bittorf and Srikrishna Sridhar, we also have some new theory in An Asynchronous Parallel Stochastic Coordinate Descent Algorithm at ICML.
 SIGMOD/PODS. Our papers are about joins and feature selection.
 PODS The Minesweeper paper is our attempt to go beyond worst-case analysis for join algorithms. We (with Dung Nguyen) develop a new algorithm, called Minesweeper, based on these ideas. The main theoretical idea is to formalize the amount of work any algorithm must spend certifying (using a set of propositional statements) that the output set is complete (and not, say, a proper subset). We call this set of propositions the certificate. We manage to establish a dichotomy theorem for this stronger notion of complexity: if a query is what Ron Fagin calls beta-acyclic, then Minesweeper runs in time linear in the size of the certificate; if a query is beta-cyclic, then on some instance any algorithm takes time superlinear in the size of the certificate. The results get sharper and more fun. Also, Dung is a superhero and has implemented a variant of this algorithm.
 SIGMOD 2014 Materialization Optimizations for Feature Selection shows that, using a DSL and some novel optimizations, we can get order-of-magnitude performance wins for feature engineering in R.
 Google We want to thank Google for funding our research project, Trust, but (Probabilistically) Verify: Toward Tail Extraction. Should be a lot of fun!
 Toshiba We want to thank Toshiba for funding our work on knowledge base construction. We are very excited that Toshiba engineers are using DeepDive!
 ONR Thank you for funding my proposal Foundations for Data-driven Systems; Join Algorithms and Random Network Theory. The ONR continues to be one of the best supporters of pure theoretical work.
 DARPA. Thank you to DARPA's XData for supporting my collaborative work on scalable analytics and join processing.
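To give a flavor of the worst-case optimal join idea behind the papers above, here is a toy sketch of my own (not the NPRR or Minesweeper implementation) of the triangle query: rather than joining relations pairwise, it proceeds one attribute at a time, intersecting candidate sets, which is the core trick that avoids blowing up on skewed intermediate results.

```python
from collections import defaultdict

def triangles(R, S, T):
    """Enumerate (a, b, c) with (a, b) in R, (b, c) in S, (a, c) in T."""
    R_idx, S_idx, T_idx = defaultdict(set), defaultdict(set), defaultdict(set)
    for a, b in R:
        R_idx[a].add(b)
    for b, c in S:
        S_idx[b].add(c)
    for a, c in T:
        T_idx[a].add(c)
    out = []
    # Attribute-at-a-time: fix a, then b, then c, intersecting the sets of
    # values consistent with all relations mentioning that attribute.
    for a in set(R_idx) & set(T_idx):
        for b in R_idx[a] & set(S_idx):
            for c in S_idx[b] & T_idx[a]:
                out.append((a, b, c))
    return out
```

For example, `triangles([(1, 2), (1, 3), (2, 3)], [(2, 3), (3, 1)], [(1, 3), (2, 1)])` finds the two triangles in that instance. The hash-set intersections here are only a rough stand-in for the sorted-index intersections real implementations use.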

Upcoming Meetings and Talks
 Norcal Database Day. Apr. 23.
 EPFL. Research Day. A broad-audience talk about DeepDive. June 12.
 SIGMOD/PODS. Ce is talking about feature selection, and I'll talk about beyond-worst-case analysis for joins. June 22-27.
 MSR. Faculty Summit. July 14-16.

Code
 DeepDive is available, and its components have their own pages. Elementary does Gibbs sampling on factor graphs over TBs of data in files, Accumulo, or HBase (now with BUGS support!). Tuffy, which uses an RDBMS to process Markov Logic, has been updated.
 Hogwild! trains SVMs, logistic regression, matrix factorization, and other convex goodness without locking. There are specialized versions for trace-norm regularization (Jellyfish) and nonnegative matrix factorization (HottTopix).
 Code for more projects is here, as well as in MADlib, a product from Oracle, and in Cloudera's Impala.
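To illustrate the Hogwild! idea in the list above, here is a minimal hypothetical sketch (my own toy, not the released code): several threads run SGD on a shared weight vector with no locks at all, tolerating occasionally racy read-modify-write updates.

```python
import random
import threading

random.seed(0)

# Shared weight vector, deliberately updated with NO locks. The Hogwild!
# observation: for sparse convex problems, rare conflicting updates barely
# hurt convergence. (CPython's GIL means this sketch only demonstrates the
# unsynchronized-update pattern, not true parallel speedup.)
w = [0.0, 0.0]

# Hypothetical noiseless least-squares data: y = 2*x0 + 1*x1.
data = []
for _ in range(400):
    x = (random.random(), random.random())
    data.append((x, 2.0 * x[0] + 1.0 * x[1]))

def sgd_worker(samples, lr=0.05, epochs=30):
    for _ in range(epochs):
        for (x0, x1), y in samples:
            grad = (w[0] * x0 + w[1] * x1) - y
            # Unsynchronized read-modify-write on the shared model.
            w[0] -= lr * grad * x0
            w[1] -= lr * grad * x1

# Four workers, each owning an interleaved slice of the data.
threads = [threading.Thread(target=sgd_worker, args=(data[i::4],)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Since the data are noiseless and consistent, every worker pulls `w` toward the same fixed point near (2, 1), so the racy updates cancel out rather than compound.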

Application Overview Videos (See our YouTube channel, HazyResearch)
 GeoDeepDive With Shanan Peters (UW Geoscience) and Miron Livny (Condor), we are combining Macrostrat with DeepDive to (hopefully!) deliver value for geoscientists. One key challenge is extracting all the measurement information that is reported in the literature but buried in the dark data of text, graphs, and figures. There is a demo video and a new video about quality higher than that of the volunteers who have been at this for the last decade. This is all powered by DeepDive. Thank you to the National Science Foundation and Google for supporting this work.
 IceCube Mark Wellons, Ben Recht, and I have done some work with the IceCube Neutrino Detector. Mark's code now runs in the detector at the South Pole and is used on over 250 million events per day. More details are in this video, this video, this paper at the International Cosmic Ray Conference 2013, or this paper; the most recent write-up was accepted to NIM A and is described here. IceCube (and Mark) got the cover of Science! Awesome! Thank you to the IceCube Collaboration and the UW Graduate School for their support of our work!
 There are also videos about some of the technical portions of these projects: Matrix Factorization, Seismic Data Interpolation, and a nowcasting framework (now called Ringtail).
 A messy, incomplete log of old updates is here.
 Slides for my EDBT/ICDT keynote on Joins and Convex Geometry.