
DeepDive has been released, and more updates are coming soon! DeepDive is a generic probabilistic inference engine that uses a declarative language (SQL) to define factor graphs. Its most popular use case is knowledge base construction (KBC). Recently, some DeepDive-based KBC applications have exceeded human quality in both precision and recall.
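To make "factor graph" concrete, here is a minimal Python sketch of Gibbs sampling over boolean variables, which is the kind of inference run once declarative rules have been grounded into a factor graph. It assumes the usual log-linear semantics, P(x) ∝ exp(Σ_f w_f · f(x)). The function name `gibbs_marginals` and the `(weight, variable_ids, fn)` factor encoding are hypothetical illustrations, not DeepDive's API or its actual sampler.

```python
import math
import random


def gibbs_marginals(n_vars, factors, n_sweeps=5000, burn_in=500, seed=0):
    """Estimate P(x_v = True) for boolean variables by Gibbs sampling.

    factors: list of (weight, var_ids, fn); fn maps the tuple of the factor's
    variable values to a number. Assumed model (hypothetical encoding):
    P(x) is proportional to exp(sum_f weight_f * fn_f(x[var_ids])).
    """
    rng = random.Random(seed)
    x = [rng.random() < 0.5 for _ in range(n_vars)]

    # Index variable -> factors mentioning it; only these affect its update.
    touching = [[] for _ in range(n_vars)]
    for factor in factors:
        for v in set(factor[1]):
            touching[v].append(factor)

    def local_score(v, value):
        # Weighted sum of the factors touching v, with x_v set to `value`.
        x[v] = value
        return sum(w * fn(tuple(x[u] for u in vs)) for w, vs, fn in touching[v])

    counts = [0] * n_vars
    for sweep in range(n_sweeps):
        for v in range(n_vars):
            s1 = local_score(v, True)
            s0 = local_score(v, False)
            # Conditional P(x_v = True | rest) = e^s1 / (e^s1 + e^s0).
            x[v] = rng.random() < 1.0 / (1.0 + math.exp(s0 - s1))
        if sweep >= burn_in:
            for v in range(n_vars):
                counts[v] += x[v]
    return [c / (n_sweeps - burn_in) for c in counts]


if __name__ == "__main__":
    factors = [
        (2.0, (0,), lambda t: t[0]),           # prior pushing x0 toward True
        (1.5, (0, 1), lambda t: t[0] == t[1]),  # x0 and x1 tend to agree
    ]
    print(gibbs_marginals(2, factors))
```

As a sanity check, a single variable with one unary factor of weight w should have an estimated marginal close to the logistic value 1 / (1 + e^(-w)).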

New Results and Press
 Manuscripts
 Resolution framework for beyond-worst-case joins here.
 (Nearly) Global Convergence of SGD for Non-Convex Matrix Problems (preliminary version).
 Increasing the parallelism in multi-round MapReduce join plans.
 Exploiting Correlations for Expensive Predicate Evaluation here.
 I'm honored to be selected as a Moore Data-Driven Discovery Investigator. What an exciting list of people! Thank you to Context Relevant, the world's best analytics company for Wall St. and beyond, for putting out such a nice press release (CNBC, Yahoo! Finance). My dad loved it.
 Our paper with Michigan Econ (Shapiro and Levenstein) and Michigan CS people (Antenucci and Cafarella) about using Twitter to predict economic indicators is out. It has been picked up by The Economist's blog, the Wall Street Journal, the Boston Globe, The Washington Post, Patria (Czech), and Slate. A summary of this work has been selected to appear in the August issue of the NBER Digest. There are interesting follow-ups in Science; we're excited to see where it goes!
 PaleoDB. Our assessment of PaleoDeepDive is here! On some extraction tasks, PaleoDeepDive exceeds human volunteer quality in both precision and recall. Thank you to PaleoDB and Shanan Peters for all their painstaking work! A description of our approach to building KBC systems is here. It's all about feature engineering! DeepDive was recently mentioned in Forbes and listed as a top tool for data science.
 Recent papers are about analytics, joins and feature selection. It all ends up in DeepDive...
 High-Performance Analytics. Thank you to Microsoft for giving a nod to Hogwild!; they mentioned that their Adam system (for machine learning) is based on our Hogwild! approach. We love that they got the exclamation point into print, although it's dubious that anyone is more Hogwild! than we are! The successors to both the theory (ICML14) and systems work (VLDB14) are in print. The recent DimmWitted engine is a deeper systems exploration of the trade-off space; check it out!
 Feature Engineering. Materialization Optimizations for Feature Selection shows that, using a DSL and some novel optimizations, we can get order-of-magnitude performance wins for feature engineering in R. Thank you to SIGMOD for selecting this as the best paper at SIGMOD14! In NIPS14, Yingbo Zhou and (many) others have some very nice work on how to do feature selection in parallel using ideas from group testing. Most recently, a draft about the DeepDive approach to feature engineering for KBC systems is here.
 Joins! New papers about one of my favorite topics: joins. This is joint work with Hung Q. Ngo and Atri Rudra.
 We have written a short overview for SIGMOD Record about recent advances in worst-case optimal join algorithms. Our goal is to give a high-level view of the worst-case optimality results for practitioners and applied researchers. We also managed to simplify the arguments.
 The Minesweeper paper is our attempt to go beyond worst-case analysis for join algorithms. We (Hung Ngo, Dung Nguyen, Atri Rudra, and I) develop a new algorithm that we call Minesweeper. The main idea is to formalize the amount of work any algorithm spends certifying (using a set of propositional statements) that the output set is complete (and not, say, a proper subset). We call this set of propositions the certificate. We establish a dichotomy theorem for this stronger notion of complexity: if a query's hypergraph is what Ron Fagin calls beta-acyclic, then Minesweeper runs in time linear in the size of the certificate; if a query is beta-cyclic, then on some instance any algorithm takes time that is superlinear in the certificate. The results get sharper and more fun. Also, Dung is a superhero and has implemented a variant of this algorithm.
 Our Tetris paper describes (what we think is) a beautiful framework for beyond-worst-case and worst-case optimal join algorithms via a new connection between geometry and DPLL resolution.
 A full version of our join algorithm with worst-case optimal running time is here (original PODS 2012 paper).
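The standard example of why "worst-case optimal" matters is the triangle query Q(a, b, c) = R(a, b) ⋈ S(b, c) ⋈ T(a, c). With |R| = |S| = |T| = N, any plan that first joins two of the relations can build an intermediate result of size Θ(N²), while the output itself is at most O(N^(3/2)). The minimal Python sketch below follows the attribute-at-a-time style: bind a, then b, then c, intersecting candidate sets and always iterating over the smaller side. The names `intersect` and `triangles` are hypothetical; this is an illustration of the idea, not the code from these papers, and it uses hash sets rather than the sorted indexes those analyses assume.

```python
from collections import defaultdict


def intersect(xs, ys):
    """Iterate over the smaller collection and probe the larger one."""
    if len(xs) > len(ys):
        xs, ys = ys, xs
    return [x for x in xs if x in ys]


def triangles(R, S, T):
    """Q(a, b, c) = R(a, b) join S(b, c) join T(a, c), one attribute at a time."""
    r = defaultdict(set)  # a -> {b : R(a, b)}
    for a, b in R:
        r[a].add(b)
    s = defaultdict(set)  # b -> {c : S(b, c)}
    for b, c in S:
        s[b].add(c)
    t = defaultdict(set)  # a -> {c : T(a, c)}
    for a, c in T:
        t[a].add(c)

    out = []
    for a in intersect(r, t):                # a must appear in both R and T
        for b in intersect(r[a], s):         # b extends a in R and starts an S tuple
            for c in intersect(s[b], t[a]):  # c closes the triangle in S and T
                out.append((a, b, c))
    return sorted(out)


if __name__ == "__main__":
    print(triangles({(1, 2), (2, 3), (1, 3)}, {(2, 3), (3, 1)}, {(1, 3)}))
```

No pairwise intermediate result is ever materialized; each level only enumerates values consistent with every relation that mentions the attribute bound so far.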

Upcoming Meetings and Talks
 Seattle and Allen AI2 Distinguished Lecture. Nov. 13-14.
 Panel on Data Science Education. Dec. 4.
 NIPS and NIPS Workshop on Automatic Knowledge Base Construction. Dec. 12-13.
 DeepTime Data with AGU. Dec. 14. (Matteo will be there!)
 Dagstuhl. PlanBig. Dec. 14-19.
 CIDR. Jan. 4-7.
 Strata. Feb. 11-13.

Code
 DeepDive is available, and its components have their own pages. Elementary: Gibbs sampling on factor graphs over terabytes of data in files, Accumulo, or HBase! Now with BUGS support! Tuffy, which uses an RDBMS to process Markov Logic, has been updated.
 Hogwild!: SVMs, logistic regression, matrix factorization, and other convex goodness without locking. Also specialized versions for trace-norm regularization (Jellyfish) and nonnegative matrix factorization (Hottopixx).
 Code for more projects is here, in MADlib, in a product from Oracle, and in Cloudera's Impala.
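The core idea behind Hogwild! is that, when each example touches only a few coordinates, parallel SGD workers can read and write a shared model with no locks at all, tolerating the occasional overwritten update. Below is a minimal Python sketch for sparse least squares, assuming examples are given as (indices, values, label) triples; `hogwild_sgd` and its parameters are hypothetical names, not the released code. Because of CPython's global interpreter lock the threads interleave rather than run truly in parallel, so this shows the lock-free access pattern, not the speedup.

```python
import random
import threading


def hogwild_sgd(examples, dim, n_threads=4, epochs=20, step=0.1, seed=0):
    """Lock-free parallel SGD for sparse least squares (illustrative sketch).

    examples: list of (indices, values, label); the squared loss of one example
    is 0.5 * (sum_k w[indices[k]] * values[k] - label) ** 2.
    """
    w = [0.0] * dim  # shared model, deliberately NOT protected by a lock

    def worker(tid):
        rng = random.Random(seed + tid)  # independent sampling per thread
        for _ in range(epochs * len(examples) // n_threads):
            idx, vals, y = examples[rng.randrange(len(examples))]
            # Read the (possibly stale) shared weights in this example's support.
            err = sum(w[i] * v for i, v in zip(idx, vals)) - y
            # Write back only the touched coordinates, without synchronization.
            for i, v in zip(idx, vals):
                w[i] -= step * err * v

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return w
```

Sparsity is what makes skipping the lock reasonable: two workers conflict only when their examples share a coordinate, so most updates never collide.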

Application Overview Videos (See our YouTube channel, HazyResearch)
 GeoDeepDive. With Shanan Peters (UW Geoscience) and Miron Livny (Condor), we are combining Macrostrat with DeepDive to (hopefully!) deliver value for geoscientists. One key challenge is extracting all the measurement information reported in the literature, which is buried in the dark data of text, graphs, and figures. There is a demo video, and a new video about how our extraction quality now exceeds that of the volunteers who have been at this for the last decade. This is all powered by DeepDive. Thank you to the National Science Foundation and Google for supporting this work.
 IceCube. Mark Wellons, Ben Recht, and I have done some work with the IceCube Neutrino Detector. Mark's code now runs in the detector at the South Pole and is used on over 250 million events per day. More details are in this video, this video, this paper at The International Cosmic Ray Conference 2013, or this paper, and a more recent write-up accepted to NIM A is described here. Thank you to the IceCube Collaboration and the UW Graduate School for their support of our work! IceCube (and Mark) got the cover of Science! Awesome!
 There are also videos about some of the technical portions of these projects: Matrix Factorization, Seismic Data Interpolation, and a nowcasting framework (now called Ringtail).
 A messy, incomplete log of old updates is here.
Slides for EDBT/ICDT keynote on Joins and Convex Geometry