Assistant Professor of Computer Science and Statistics (courtesy)
Artificial Intelligence Lab
Natural Language Processing Group
My two aims are (i) to build systems that allow humans to communicate with computers and (ii) to develop algorithms that can infer latent structures from raw data. I broadly identify with the machine learning (ICML, NIPS) and natural language processing (ACL, NAACL, EMNLP) communities.
Regarding (i), think of a sentence (e.g., "What fraction of CO2 emissions is from the top 5 countries?") as encoding a computer program, which when executed results in some action (e.g., querying a database to compute the answer). Check out this friendly introduction to natural language interfaces (XRDS magazine), a more general tutorial on natural language understanding (ICML 2015), or this more linguistics-oriented article. I also wrote a CACM survey article on executable semantic parsing. This provides the technical foundation of SEMPRE, a general toolkit we built for semantic parsing, which we have used for a number of projects (EMNLP 2013, ACL 2014, ACL 2015, ACL 2015).
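To make the "sentence as program" view concrete, here is a minimal sketch in Python. It is not SEMPRE and does not use its API: the pattern-based `parse` function, the tuple-based logical form, and the `EMISSIONS` table (with made-up numbers) are all hypothetical and chosen only for illustration. A real semantic parser would learn the mapping from utterances to programs rather than hard-code a single pattern.

```python
import re

# Hypothetical toy database: CO2 emissions per country (made-up numbers).
EMISSIONS = {"A": 40.0, "B": 25.0, "C": 15.0, "D": 10.0, "E": 6.0, "F": 4.0}

# One hand-written pattern stands in for a learned parser (an assumption).
PATTERN = re.compile(
    r"what fraction of co2 emissions is from the top (\d+) countries\?"
)


def parse(utterance):
    """Map an utterance to a logical form: a nested tuple acting as a program."""
    match = PATTERN.fullmatch(utterance.strip().lower())
    if match is None:
        raise ValueError("utterance not covered by this toy grammar")
    k = int(match.group(1))
    return ("divide", ("sum", ("top", k)), ("sum", ("all",)))


def execute(form, table):
    """Recursively evaluate a logical form against the table."""
    op = form[0]
    if op == "all":
        return list(table.values())
    if op == "top":
        return sorted(table.values(), reverse=True)[: form[1]]
    if op == "sum":
        return sum(execute(form[1], table))
    if op == "divide":
        return execute(form[1], table) / execute(form[2], table)
    raise ValueError("unknown operator: " + str(op))


program = parse("What fraction of CO2 emissions is from the top 5 countries?")
answer = execute(program, EMISSIONS)
```

Here `program` is the intermediate representation the sentence encodes, and `answer` is the result of executing it against the database. Keeping parsing and execution separate means the same executor can serve any utterance the parser covers.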
Regarding (ii), we wish to learn a system that can perform complex tasks such as answering questions or translating foreign sentences into English. Can we automatically induce latent structures (e.g., programs, alignments, vectors) that aid the prediction task? This setting is challenging because it typically results in non-convex optimization problems over parameters and combinatorial search problems over latent structures. For the former, we have developed method-of-moments algorithms (e.g., ICML 2014, NIPS 2012) that yield strong global theoretical guarantees, avoiding local optima entirely. For the latter, we have developed learning algorithms that model the search process, thus taking prediction cost into account (e.g., ICML 2015, AISTATS 2015).
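As a rough illustration of why moment matching can sidestep local optima, consider a deliberately simple latent-variable model: a mixture of two unit-variance Gaussians whose means are known, with only the mixing weight unknown. This textbook-style sketch is not one of the algorithms from the cited papers, and the function names, the known-means assumption, and the parameter values are all hypothetical. Because the first moment satisfies E[x] = w * mean_a + (1 - w) * mean_b, the weight can be read off the sample mean in closed form, with no iterative optimization.

```python
import random


def sample_mixture(n, weight, mean_a, mean_b, rng):
    """Draw n points; each comes from component A with probability `weight`."""
    xs = []
    for _ in range(n):
        # The component choice is the latent variable; it is never recorded.
        mean = mean_a if rng.random() < weight else mean_b
        xs.append(rng.gauss(mean, 1.0))
    return xs


def estimate_weight(xs, mean_a, mean_b):
    """Solve E[x] = w * mean_a + (1 - w) * mean_b for w using the sample mean."""
    sample_mean = sum(xs) / len(xs)
    w = (sample_mean - mean_b) / (mean_a - mean_b)
    # Clip to [0, 1], since sampling noise can push the estimate slightly outside.
    return min(1.0, max(0.0, w))


rng = random.Random(0)  # fixed seed so the run is repeatable
data = sample_mixture(20000, 0.3, -2.0, 3.0, rng)
estimate = estimate_weight(data, -2.0, 3.0)
```

With enough samples, `estimate` should land close to the true weight of 0.3. An EM-style approach would instead iterate from an initial guess; the moment-based estimate involves no initialization at all.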
Finally, I am a strong proponent of efficient and reproducible research. In collaboration with Microsoft Research, I am developing CodaLab Worksheets, a new platform that allows researchers to maintain the full provenance of an experiment from raw data to final results. Most of our recent papers have been published on CodaLab as executable papers. We are actively looking for contributors, so please contact me if you're interested!