Yuhui Zhang

Department of Computer Science
Stanford University
Email: yuhuiz@cs.stanford.edu

Hi! I am a graduate student at Stanford University. My research interests span a wide range of topics in natural language processing, with a focus on representation learning, natural language generation and its real-world applications. Currently, I am a research assistant in the Stanford NLP Group, advised by Prof. Chris Manning.

Before that, I obtained a bachelor's degree with honours from the Department of Computer Science and Technology at Tsinghua University, where I was a research assistant in the THUNLP Group. In 2018, I was very fortunate to collaborate closely with Prof. James Zou on improving automated diagnosis coding from electronic health records (EHRs).



Stanford University

Department of Computer Science

09/2019-06/2021, Master of Science, GPA: 4.30/4.00

06/2018-09/2018, Visiting Research Intern

Tsinghua University

Department of Computer Science and Technology

08/2015-07/2019, Bachelor of Engineering, GPA: 3.86/4.00, Ranking 4/154

National Tsing Hua University

Department of Computer Science

07/2017-08/2017, Exchange Student, Grades: 100/100


Stanza: A Python Natural Language Processing Toolkit for Many Human Languages. [PDF][DOC]

Peng Qi*, Yuhao Zhang*, Yuhui Zhang, Jason Bolton, Christopher D. Manning.

ACL: System Demonstrations (2020).

The Stanford NLP Group's official Python NLP library. It contains support for running various accurate natural language processing tools on 60+ languages and for accessing the Java Stanford CoreNLP software from Python.

VetTag: improving automated veterinary diagnosis coding via large-scale language modeling. [PDF][BLOG]

Yuhui Zhang*, Allen Nie*, Ashley Zehnder, Rodney Page, James Zou.

npj Digital Medicine (2019).

We extend DeepTag in four directions: scaling from 42 coarse-grained diagnosis codes to 4,577 fine-grained codes, language modeling to exploit large-scale unlabeled EHRs, hierarchical training to address the diagnosis hierarchy, and word visualization for interpretation.

Jiuge: A Human-Machine Collaborative Chinese Classical Poetry Generation System. [PDF][DEMO]

Zhipeng Guo*, Xiaoyuan Yi*, Maosong Sun, Wenhao Li, Cheng Yang, Jiannan Liang, Huimin Chen, Yuhui Zhang, Ruoyu Li.

ACL: System Demonstrations (2019).

Machines should not replace humans in poem generation. We propose Jiuge, a human-machine collaborative Chinese poetry generation system that allows constant and active user participation in poem creation.

Large-scale Generative Modeling to Improve Automated Veterinary Disease Coding. [PDF][POSTER]

Yuhui Zhang*, Allen Nie*, James Zou.

NeurIPS: Machine Learning for Health Workshop (2018).

Massive numbers of veterinary EHRs remain unlabeled. We significantly improve diagnosis coding and cross-hospital generalization by utilizing these large-scale unlabeled EHRs.

DeepTag: inferring diagnoses from veterinary clinical notes. [PDF][PRESS]

Allen Nie*, Ashley Zehnder*, Rodney Page, Yuhui Zhang, A. Pineda, M. Rivas, C. Bustamante, James Zou.

npj Digital Medicine (2018).

Manual coding is time-consuming and expensive. We develop a large-scale algorithm to automatically predict standard diagnosis codes from EHRs and evaluate it in challenging cross-hospital settings.

THUOCL: Tsinghua Open Chinese Lexicon. [LINK]

Shiyi Han*, Yuhui Zhang*, Yunshan Ma, Cunchao Tu, Zhipeng Guo, Zhiyuan Liu, Maosong Sun.

Technical Report (2016).

THUOCL is a set of high-quality Chinese lexicons that can be used to improve many Chinese NLP tasks.


I enjoy reading a wide range of books. My favorite books are To Live (Hua Yu), Walden (Henry David Thoreau), and Principles of Economics (N. Gregory Mankiw). I enjoy running and swimming in the evening. I love classical music, and I learned to play the guitar, piano, and pipa at Tsinghua University.

