I am a PhD student at Stanford. I'm advised by Prof. Dan Jurafsky. I've also worked with Alon Halevy, Prof. Chris Re and Prof. Keith Winstein.
I am interested in all things AI, especially natural language processing and its applications in data systems.
My vision is to make all knowledge available to people ... not at their fingertips, but at the tips of their tongues.
Currently, I'm looking at how we can automatically detect symptoms of schizophrenia with methods from computational linguistics.
I'm also interested in adding structure to unstructured data. In particular, I'm working on a system called FrameIt that makes it easy and fast to explore large text corpora by defining frames and quickly training SRLs to extract information.
In a past life, I was a systems hacker. I still like playing around with AWS Lambda functions and other serverless computing paradigms. I've worked on a new parallel compiler called GG that parallelizes massive builds and runs them on Lambda functions.
Palo Alto, CA 94306 US
my full name AT stanford.edu
Schizophrenia is a mental disorder which afflicts an estimated 0.7% of adults worldwide (Saha et al., 2005). It affects many areas of mental function, often evident from incoherent speech. Diagnosing schizophrenia relies on subjective judgments resulting in disagreements even among trained clinicians. Recent studies have proposed the use of natural language processing for diagnosis by drawing on automatically-extracted linguistic features, and particularly the use of discourse coherence. Here, we present the first benchmark comparison of previously proposed coherence models for detecting symptoms of schizophrenia and evaluate their performance on a new dataset of recorded interviews between subjects and clinicians. We also present two improved coherence metrics based on modern sentence embedding techniques that outperform the previous methods on our dataset. Finally, we propose a novel computational model for reference incoherence based on ambiguous pronoun usage and show that it is a highly predictive feature on our data. While the number of subjects is limited in this pilot study, our results suggest new directions for diagnosing common symptoms of schizophrenia.
CLPsych Workshop paper - NAACL 2018
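As a rough illustration of what an embedding-based coherence score can look like, the sketch below averages the cosine similarity between consecutive sentence vectors. It is not the model from the paper: the function names are hypothetical, and the toy two-dimensional vectors stand in for real sentence embeddings, which are assumed to be computed elsewhere.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def first_order_coherence(sentence_vectors):
    """Mean cosine similarity between each sentence vector and the next one."""
    pairs = list(zip(sentence_vectors, sentence_vectors[1:]))
    if not pairs:
        raise ValueError("need at least two sentence vectors")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Toy vectors (assumption): a transcript that stays on topic vs. one that jumps around.
on_topic = [[1.0, 0.0], [1.0, 0.1], [0.9, 0.2]]
jumping = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

print(first_order_coherence(on_topic))
print(first_order_coherence(jumping))
```

A lower score means adjacent sentences point in more different directions in embedding space, which is one simple proxy for incoherent speech.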
Modern machine learning techniques, such as deep learning, often use discriminative models that require large amounts of labeled data. An alternative approach is to use a generative model, which leverages heuristics from domain experts to train on unlabeled data. Domain experts often prefer to use generative models because they "tell a story" about their data. Unfortunately, generative models are typically less accurate than discriminative models. Several recent approaches combine both types of model to exploit their strengths. In this setting, a misspecified generative model can hurt the performance of subsequent discriminative training. To address this issue, we propose a framework called Socratic learning that automatically uses information from the discriminative model to correct generative model misspecification. Furthermore, this process provides users with interpretable feedback about how to improve their generative model. We evaluate Socratic learning on real-world relation extraction tasks and observe an immediate improvement in classification accuracy that could otherwise require several weeks of effort by domain experts.
Workshop paper
A blog post about how to implement data programming in TensorFlow
Blog post
We study the factors affecting training time in multi-device deep learning systems. Given a specification of a convolutional neural network, our goal is to minimize the time to train this model on a cluster of commodity CPUs and GPUs. We first focus on the single-node setting and show that by using standard batching and data-parallel techniques, throughput can be improved by at least 5.5x over state-of-the-art systems on CPUs. This ensures an end-to-end training speed directly proportional to the throughput of a device regardless of its underlying hardware, allowing each node in the cluster to be treated as a black box. Our second contribution is a theoretical and empirical study of the tradeoffs affecting end-to-end training time in a multiple-device setting. We identify the degree of asynchronous parallelization as a key factor affecting both hardware and statistical efficiency. We see that asynchrony can be viewed as introducing a momentum term. Our results imply that tuning momentum is critical in asynchronous parallel configurations, and suggest that published results that have not been fully tuned might report suboptimal performance for some configurations. For our third contribution, we use our novel understanding of the interaction between system and optimization dynamics to provide an efficient hyperparameter optimizer. Our optimizer involves a predictive model for the total time to convergence and selects an allocation of resources to minimize that time. We demonstrate that the most popular distributed deep learning systems fall within our tradeoff space, but do not optimize within the space. By doing this optimization, our prototype runs 1.9x to 12x faster than the fastest state-of-the-art systems.
arXiv paper
A blog post describing Socratic learning.
Blog post
In collaboration with Prof. Bailis at Stanford University, we are trying to design the future of mobile sensor systems. Here's a preview:
We have seen a massive proliferation of autonomous, mobile sensing agents in recent years, and this growth promises to continue into the next decade. With the rapid commoditization of these devices, autonomous sensors will likely become the next major platform for big data analytics and application development. In this paper, we introduce EAGLE, the first exploration of building control into a modern data system. EAGLE provides a platform for users to query the physical world through virtualized sensing, abstracting away an underlying network of data-collecting agents. EAGLE also explores the challenges of building a platform for application development on fleets of mobile sensors. As a prototype of our proposed system, we implemented a mobile sensing agent composed of a programmable Roomba vacuum cleaner and a video-enabled smartphone. The robot is able to move autonomously, collect data that it aggregates to a central server, and respond to commands from a high-level user interface.
Tracking an unknown number of targets given noisy measurements from multiple sensors is critical to autonomous driving. Rao-Blackwellized particle filtering is well suited to this problem. Monte Carlo sampling is used to determine whether measurements are valid, and if so, which targets they originate from. This breaks the problem into single target tracking sub-problems that are solved in closed form (e.g. with Kalman filtering). We compare the performance of a traditional Kalman filter with that of a recurrent neural network for single target tracking. We show that LSTMs outperform Kalman filtering for single target prediction by 2x. We also present a unique model for training two dependent LSTMs to output a Gaussian distribution for a single target prediction to be used as input to multi-target tracking. We evaluate the end-to-end performance of an LSTM and a Kalman filter for simultaneous multiple target tracking. In the end-to-end pipeline, LSTMs do not provide a significant improvement.
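For readers unfamiliar with the closed-form single-target step mentioned above, here is a minimal sketch of a scalar Kalman filter. It assumes a one-dimensional state with a static motion model, which is far simpler than a real tracker; the function name and parameters are hypothetical and not taken from the project.

```python
def kalman_1d(measurements, process_var, meas_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a static motion model (illustrative assumption).

    Returns a list of (state_estimate, estimate_variance) after each measurement.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed unchanged, but uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p / (p + meas_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append((x, p))
    return estimates


# Toy input (assumption): fifty identical readings of a target sitting at 5.0.
estimates = kalman_1d([5.0] * 50, process_var=0.01, meas_var=1.0)
final_x, final_p = estimates[-1]
print(final_x, final_p)
```

The estimate is a Gaussian (a mean and a variance), which is the same kind of output the abstract describes the paired LSTMs producing for the multi-target stage.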
Using various machine learning techniques to automatically detect shapes in physical chromosome structures.
Academic projects
User-contributed Web data contains rich and diverse information about a variety of events in the physical world, such as shows, festivals, conferences and more. This information ranges from known event features (e.g., title, time, location) posted on event aggregation platforms (e.g., Last.fm events, EventBrite, Facebook events) to discussions and reactions related to events shared on different social media sites (e.g., Twitter, YouTube, Flickr). In this paper, we focus on the challenge of automatically identifying user-contributed content for events that are planned and, therefore, known in advance, across different social media sites. We mine event aggregation platforms to extract event features, which are often noisy or missing. We use these features to develop query formulation strategies for retrieving content associated with an event on different social media sites. Further, we explore ways in which event content identified on one social media site can be used to retrieve additional relevant event content on other social media sites. We apply our strategies to a large set of user-contributed events, and analyze their effectiveness in retrieving relevant event content from Twitter, YouTube, and Flickr.
Undergraduate research
As machine learning methods gain popularity across different fields, acquiring labeled training datasets has become the primary bottleneck in the machine learning pipeline. Recently, generative models have been used to create and label large amounts of training data, albeit noisily. The output of these generative models is then used to train a discriminative model of choice, such as logistic regression or a complex neural network. However, any errors in the generative model can propagate to the subsequent model being trained. Unfortunately, these generative models are not easily interpretable and are therefore difficult to debug for users. To address this, we present our vision for Flipper, a framework that presents users with high-level information about why their training set is inaccurate and informs their decisions as they improve their generative model manually. We present potential tools within the Flipper framework, inspired by observing biomedical experts working with generative models, which allow users to analyze the errors in their training data in a systematic fashion. Finally, we discuss a prototype of Flipper and report results of a user study where users create a training set for a classification task and improve the discriminative model’s accuracy by 2.4 points in less than an hour with feedback from Flipper.
With the proliferation of natural language interfaces on mobile devices and in home personal assistants such as Siri and Alexa, many services and data are becoming available through transcription from a speech recognition system. One major risk factor in this trend is that a malicious adversary may attack this system without the primary user noticing. One way to accomplish this is to use adversarial examples that are perceived one way by a human, but transcribed differently by the Automatic Speech Recognition (ASR) system. For example, a recording may sound like “hello” to the human ear but be transcribed as “goodbye” by the ASR system. Recent work has shown that adversarial examples can be created for convolutional neural networks to fool vision recognition systems. We show that similar methods can be applied to neural ASR systems. We show successful results for two methods of generating adversarial examples where we fool a high quality ASR system but the difference in the audio is imperceptible to the human ear. We also present a method for converting the adversarial MFCC features back into audio.
Deep neural networks are in vogue for text classification. The lack of interpretability and computational cost associated with deep architectures has led to renewed interest in effective baseline models. In this paper, we review several popular baseline models which strike a balance between traditional and neural approaches, and propose improvements by combining their key contributions. In particular, we study gradient-tuned word embeddings, modeling n-grams, and generative sentence representation methods. We evaluate our methods by comparing end performance and training time on sentiment analysis and topic classification tasks. By combining techniques of popular baseline models into a single shallow architecture, we outperformed the individual models on all tasks, were competitive with traditional and deep approaches, and maintained fast training times.
PhD in Computer Science • Present
I am focusing on artificial intelligence and natural language processing. I'm advised by Prof. Dan Jurafsky. I've worked on deep neural network interpretability, distant supervision, localization and mapping, IoT, autonomous driving and object tracking.
B.S. in Computer Science • May 2011
I graduated summa cum laude. I did research on information retrieval in social data with Prof. Luis Gravano and Hila Becker. My focus was on systems. I was also a member of Bacchanal.
Research Intern • June 2017 - August 2017
FrameIt: A system to quickly build framings and SRLs for exploring large text corpora. Currently building an ontology of happy moments in HappyDB.
Software Engineer (R&D) • June 2015 - December 2015
I helped design a new highly parallel processor architecture that offers GPU performance on an x86 instruction set. I profiled our design on a number of common machine learning workloads.
Software Engineer • June 2012 - May 2015
As the 9th member of this startup, I was involved from our first prototype through our second major product. I worked across the stack: I implemented kernel drivers for high-performance distributed caching for datacenters, wrote MVC business logic for system management, and even built our lightweight cross-platform installer.
My main skills as a computer scientist and software engineer are systems and machine learning. I've built large-scale, high-performance, high-availability systems in industry. I've implemented deep learning systems from scratch and optimized existing systems. I've also built novel pipelines modeling training data creation with generative models.
Our opponents maintain that we are confronted with insurmountable ... obstacles, but that may be said of the smallest obstacle if one has no desire to surmount it.
Theodor Herzl