Thanks for stopping by!

I'm a first-year PhD student in Computer Science at Stanford University. I work on natural language processing and machine learning. In particular, I am interested in language as a window into knowledge, reasoning, and computation.

Previously, I worked with Dragomir Radev (LILY lab) and John Lafferty at Yale University.

My recent projects include text summarization (YZM+'17; YKZ+'19), semantic parsing (YZY+'18; YZE+'19), mathematical writing (YL'19), and robustness of neural nets (YKR'18). I also co-organized the Scientific Document Summarization workshop at SIGIR 2018-19, which has been expanded into the Scholarly Document Processing workshop at EMNLP 2020.



  • A Neural Topic-Attention Model for Medical Term Abbreviation Disambiguation
    Irene Li, Michihiro Yasunaga, Muhammed Yavuz Nuzumlalı, Cesar Caraballo, Shiwani Mahajan, Harlan Krumholz and Dragomir Radev
    NeurIPS 2019, Machine Learning for Health Workshop.   [paper] [bib] [code]
  • CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases
    with Tao Yu, Rui Zhang, Caiming Xiong, Richard Socher, Walter Lasecki, Dragomir Radev and many other authors.
    EMNLP 2019.   [paper] [bib] [slides] [dataset & leaderboard]
  • SParC: Cross-Domain Semantic Parsing in Context
    with Tao Yu, Rui Zhang, Caiming Xiong, Richard Socher, Dragomir Radev and many other authors.
    ACL 2019.   [paper] [bib] [dataset & leaderboard]
  • TopicEq: A Joint Topic and Mathematical Equation Model for Scientific Texts
    Michihiro Yasunaga and John Lafferty
    AAAI 2019.   [paper] [bib] [dataset (170MB)]
  • ScisummNet: A Large Annotated Corpus and Content-Impact Models for Scientific Paper Summarization with Citation Networks
    Michihiro Yasunaga, Jungo Kasai, Rui Zhang, Alexander Fabbri, Irene Li, Dan Friedman and Dragomir Radev
    AAAI 2019.   [paper] [bib] [dataset]
  • Overview and Results of CL-SciSumm Shared Task 2019
    Muthu Kumar Chandrasekaran, Michihiro Yasunaga, Dragomir Radev, Dayne Freitag and Min-Yen Kan
    SIGIR 2019, BIRNDL Workshop.   [paper] [bib] [project page]


  • SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task
    Tao Yu, Michihiro Yasunaga, Kai Yang, Rui Zhang, Dongxu Wang, Zifan Li and Dragomir Radev
    EMNLP 2018.   [paper] [bib] [code]
  • Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task
    Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang and Dragomir Radev
    EMNLP 2018.   [paper] [bib] [blog] [dataset & leaderboard]
  • Neural Coreference Resolution with Deep Biaffine Attention by Joint Mention Detection and Mention Clustering
    Rui Zhang, Cicero Nogueira dos Santos, Michihiro Yasunaga, Bing Xiang and Dragomir Radev
    ACL 2018.   [paper] [bib]
  • Robust Multilingual Part-of-Speech Tagging via Adversarial Training
    Michihiro Yasunaga, Jungo Kasai and Dragomir Radev
    NAACL 2018.   [paper] [bib] [slides] [code]


  • Graph-based Neural Multi-Document Summarization
    Michihiro Yasunaga, Rui Zhang, Kshitijh Meelu, Ayush Pareek, Krishnan Srinivasan and Dragomir Radev
    CoNLL 2017.   [paper] [bib]

Other Projects

  • Named Entity Recognition for Academic Advising
    Developed systems to recognize academic named entities and link them to a university database. Part of the Sapphire Project with the University of Michigan and IBM Research.
  • Medical NLP
    Developed NLP technologies to analyze electronic health records (EHR). In collaboration with the Yale School of Medicine.