I am a 4th-year PhD student in Computer Science at Stanford University. I am fortunate to be advised by Prof. Christopher Manning and am a member of the NLP Group. My research focuses on reasoning, compositionality, and representation learning at the intersection of vision and language.
I explore structural principles and inductive biases that make neural networks more interpretable, robust, and data-efficient, and that allow them to generalize effectively and systematically from only a few samples. I believe in the importance of multi-disciplinary work, both within the AI field and across domains, and draw high-level inspiration from the feats of the human mind, including its structural properties as well as its cognitive capabilities.
I believe that compositionality is a key ingredient that, if incorporated successfully into neural models, may help bridge the gap between machine intelligence and natural intelligence. I explore ways to achieve compositionality in terms of both computation and representation.
Towards the former, I introduced, together with my advisor, models such as MAC and the Neural State Machine, which perform transparent step-by-step reasoning, as well as the GQA dataset for real-world visual question answering.
Towards the latter, I more recently began exploring ways to learn compositional scene representations, and, together with my research collaborator from Facebook AI Research, presented the Generative Adversarial Transformers for fast, data-efficient, and high-resolution image synthesis. I am actively researching this subject further and hope to share new findings in this exciting direction in the near future!