Thanks for stopping by!

I'm a second-year PhD student in Computer Science at Stanford University, working with Percy Liang and Jure Leskovec. My research interests are in natural language processing and machine learning, particularly representation learning for structured data, reasoning, and computation.

Previously, I worked with Dragomir Radev (LILY lab) and John Lafferty at Yale University.

My recent projects include text summarization (YZM+'17; YKZ+'19), semantic parsing (YZY+'18; YZE+'19), understanding programs and mathematics (YL'19; YL'20), and robustness of neural nets (YKR'18). I have also co-organized the Scientific Document Summarization workshop since SIGIR 2018; this year it takes place as the Scholarly Document Processing workshop at EMNLP 2020.



  • WILDS: A benchmark of in-the-wild distribution shifts
    Pang Wei Koh*, Shiori Sagawa*, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Sara Beery, Jure Leskovec, Anshul Kundaje, Emma Pierson, Sergey Levine, Chelsea Finn, and Percy Liang
    arXiv 2020.   [paper] [project page] [code]
  • DrRepair: Graph-based, Self-Supervised Program Repair from Diagnostic Feedback
    Michihiro Yasunaga and Percy Liang
    ICML 2020.   [paper] [bib] [slides] [code & data] [codalab] [Stanford AI blog]


  • A Neural Topic-Attention Model for Medical Term Abbreviation Disambiguation
    Irene Li, Michihiro Yasunaga, Muhammed Yavuz Nuzumlalı, Cesar Caraballo, Shiwani Mahajan, Harlan Krumholz and Dragomir Radev
    NeurIPS 2019, Machine Learning for Health Workshop.   [paper] [bib] [code]
  • CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases
    with Tao Yu, Rui Zhang, Caiming Xiong, Richard Socher, Walter Lasecki, Dragomir Radev and many other authors.
    EMNLP 2019.   [paper] [bib] [slides] [dataset & leaderboard]
  • SParC: Cross-Domain Semantic Parsing in Context
    with Tao Yu, Rui Zhang, Caiming Xiong, Richard Socher, Dragomir Radev and many other authors.
    ACL 2019.   [paper] [bib] [dataset & leaderboard]
  • TopicEq: A Joint Topic and Mathematical Equation Model for Scientific Texts
    Michihiro Yasunaga and John Lafferty
    AAAI 2019.   [paper] [bib] [dataset (170MB)]
  • ScisummNet: A Large Annotated Corpus and Content-Impact Models for Scientific Paper Summarization with Citation Networks
    Michihiro Yasunaga, Jungo Kasai, Rui Zhang, Alexander Fabbri, Irene Li, Dan Friedman and Dragomir Radev
    AAAI 2019.   [paper] [bib] [dataset]
  • Overview and Results of CL-SciSumm Shared Task 2019
    Muthu Kumar Chandrasekaran, Michihiro Yasunaga, Dragomir Radev, Dayne Freitag and Min-Yen Kan
    SIGIR 2019, BIRNDL Workshop.   [paper] [bib] [project page]


  • SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task
    Tao Yu, Michihiro Yasunaga, Kai Yang, Rui Zhang, Dongxu Wang, Zifan Li and Dragomir Radev
    EMNLP 2018.   [paper] [bib] [code]
  • Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task
    Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang and Dragomir Radev
    EMNLP 2018.   [paper] [bib] [blog] [dataset & leaderboard]
  • Neural Coreference Resolution with Deep Biaffine Attention by Joint Mention Detection and Mention Clustering
    Rui Zhang, Cicero Nogueira dos Santos, Michihiro Yasunaga, Bing Xiang and Dragomir Radev
    ACL 2018.   [paper] [bib]
  • Robust Multilingual Part-of-Speech Tagging via Adversarial Training
    Michihiro Yasunaga, Jungo Kasai and Dragomir Radev
    NAACL 2018.   [paper] [bib] [slides] [code]


  • Graph-based Neural Multi-Document Summarization
    Michihiro Yasunaga, Rui Zhang, Kshitijh Meelu, Ayush Pareek, Krishnan Srinivasan and Dragomir Radev
    CoNLL 2017.   [paper] [bib]

Other Projects

  • Named Entity Recognition for Academic Advising
    Developed systems to recognize academic named entities and link them to a university database. Part of the Sapphire Project with the University of Michigan and IBM Research.
  • Medical NLP
    Developed NLP technologies to analyze electronic health records (EHRs). Collaboration with the Yale School of Medicine.