Associate Professor of Computer Science and Statistics (courtesy)
Artificial Intelligence Lab
Natural Language Processing Group
Statistical Machine Learning Group
Gates 250 / firstname.lastname@example.org
My goal is to develop trustworthy systems that can communicate effectively with people and improve over time through interaction. I broadly identify with the machine learning (ICML, NeurIPS) and natural language processing (ACL, NAACL, EMNLP) communities.
Computers can do a lot, but tapping into their full power requires the rather non-trivial ability to program. I'm interested in building systems that learn to translate natural language descriptions (e.g., in English or Chinese) into programs (e.g., in Python or C++). Such systems would unlock the full power of computing for a much wider audience. A while back, I wrote a friendly introduction to natural language interfaces (XRDS magazine 2014) and a slightly more technical survey article on executable semantic parsing (CACM 2016). One idea we've explored is to "naturalize" a programming language gradually into a natural language (ACL 2017). One can also use natural language to describe classifiers directly rather than requiring labeled data (ACL 2018). The tension between the fuzziness of machine learning and the crispness of logic also fascinates me. On this note, we showed that neural networks can solve SAT problems with surprising accuracy despite not being told explicitly what a SAT problem is (ICLR 2019).
Despite the successes of machine learning, otherwise high-performing models are still difficult to debug and fail catastrophically in the presence of changing data distributions and adversaries. For example, on the SQuAD reading comprehension dataset we created (EMNLP 2016), we showed that state-of-the-art systems, despite reaching human-level benchmark performance, are easily fooled by distracting sentences in a way that no human would be (EMNLP 2017). Given society's increasing reliance on machine learning, it is critical to build tools to make machine learning more reliable in the wild. We've worked on using influence functions to understand black-box models (ICML 2017), semidefinite programming to provide certificates that a neural network is safe from a class of adversaries (NeurIPS 2018), and distributionally robust optimization to ensure the fairness of machine learning models over time (ICML 2018).
Finally, I am a strong proponent of efficient and reproducible research. We have been developing CodaLab Worksheets, a platform that allows researchers to run and manage their experiments by maintaining the full provenance of an experiment from raw data to final results. Most of our recent papers have been published on CodaLab as executable papers. We are actively looking for contributors, so please contact me if you're interested!
Here is some code for older projects.