Yuhui Zhang

Department of Computer Science
Stanford University
Email: yuhuiz@cs.stanford.edu

Hi! I am a graduate student at Stanford University. My research interests span a wide range of topics in natural language processing, with a focus on representation learning, natural language generation, and their real-world applications. I am currently a research assistant in the Stanford NLP Group, advised by Prof. Chris Manning.

Before that, I received a bachelor's degree with honours from the Department of Computer Science and Technology at Tsinghua University, where I was a research assistant in the THUNLP Group. In 2018, I was fortunate to collaborate closely with Prof. James Zou on improving automated diagnosis coding from electronic health records (EHRs).

Education

Stanford University

Department of Computer Science

Master of Science, Sep. 2019 - Jun. 2021 (Expected)

Visiting Research Intern, Jun. 2018 - Sep. 2018

GPA: 4.30/4.00, Relevant Courses: CS229 Machine Learning (A+), CS230 Deep Learning (A+)

Tsinghua University

Department of Computer Science and Technology

Bachelor of Engineering, Aug. 2015 - Jul. 2019

Minor in Economics, Aug. 2016 - Jul. 2019

GPA: 3.86/4.00, Rank: 4/154, Relevant Courses: Thesis Project (A+), Student Research Training (A+), Introduction to Machine Learning (A), Artificial Neural Networks (A), Artificial Intelligence: Technology and Practice (A), Numerical Analysis (A), Linear Algebra (A-), Probability and Statistics (A-)

National Tsing Hua University

Department of Computer Science

Exchange Student, Jul. 2017 - Aug. 2017

Grades: 100/100

Publications

VetTag: improving automated veterinary diagnosis coding via large-scale language modeling. [PDF][BLOG]

Yuhui Zhang, Allen Nie, Ashley Zehnder, Rodney Page, James Zou.

npj Digital Medicine (2019).

We extend DeepTag in four directions: scaling from 42 coarse-grained diagnosis codes to 4,577 fine-grained codes, using language modeling to leverage large-scale unlabeled EHRs, applying hierarchical training to exploit the diagnosis hierarchy, and visualizing words for interpretability.

Jiuge: A Human-Machine Collaborative Chinese Classical Poetry Generation System. [PDF][DEMO]

Zhipeng Guo, Xiaoyuan Yi, Maosong Sun, Wenhao Li, Cheng Yang, Jiannan Liang, Huimin Chen, Yuhui Zhang, Ruoyu Li.

Association for Computational Linguistics: System Demonstrations (2019).

Machines should not replace humans in poetry generation. We propose Jiuge, a human-machine collaborative Chinese poetry generation system that allows continuous and active user participation in poem creation.

Large-scale Generative Modeling to Improve Automated Veterinary Disease Coding. [PDF][POSTER]

Yuhui Zhang, Allen Nie, James Zou.

NeurIPS Machine Learning for Health Workshop (2018).

A massive volume of veterinary EHRs remains unlabeled. By leveraging these large-scale unlabeled EHRs, we significantly improve diagnosis coding and cross-hospital generalization.

DeepTag: inferring diagnoses from veterinary clinical notes. [PDF][PRESS]

Allen Nie, Ashley Zehnder, Rodney Page, Yuhui Zhang, A. Pineda, M. Rivas, C. Bustamante, James Zou.

npj Digital Medicine (2018).

Manual coding is time-consuming and expensive. We develop a large-scale algorithm that automatically predicts standard diagnosis codes from EHRs and evaluate it in challenging cross-hospital settings.

THUOCL: Tsinghua Open Chinese Lexicon. [LINK]

Shiyi Han, Yuhui Zhang, Yunshan Ma, Cunchao Tu, Zhipeng Guo, Zhiyuan Liu, Maosong Sun.

Technical Report (2016).

THUOCL is a set of high-quality Chinese lexicons that can be used to improve many Chinese NLP tasks.


Enhancing Transformer with Sememe Knowledge. [PDF]

Yuhui Zhang, Chenghao Yang, Zhengping Zhou, Zhiyuan Liu.


The Transformer is typically considered a purely data-driven model. We introduce knowledge of sememes (the minimum semantic units in linguistics) into the Transformer and demonstrate that external linguistic knowledge can enhance its effectiveness and robustness.

Inducing Grammar from Long Short-Term Memory Networks by Shapley Decomposition. [PDF]

Allen Nie, Yuhui Zhang.


As neural networks have demonstrated surprising performance in natural language processing, there is growing curiosity about whether these models capture linguistic knowledge. We attempt to induce grammar by tracing the computational process of a long short-term memory (LSTM) network.

Course Projects

Evaluating the Factual Correctness for Abstractive Summarization. [PDF][POSTER]

Yuhui Zhang, Yuhao Zhang, Christopher D. Manning.

CS230 Deep Learning (2019 Fall).

30% of summaries generated by abstractive models contain factual inconsistencies. We propose the factual score, a new metric for evaluating the factual correctness of neural abstractive summarization.

Improving Neural Abstractive Summarization via Reinforcement Learning with BERTScore. [PDF][POSTER]

Yuhui Zhang, Ruocheng Wang, Zhengping Zhou.

CS229 Machine Learning (2019 Fall).

Since BERTScore has been shown to be a better evaluation metric for natural language generation, we use it as the reward function to improve neural abstractive summarization via reinforcement learning.

Enhancing Knowledge-based Question Answering with Supporting Sentences. [PDF]

Zhengping Zhou, Yuhui Zhang.

Artificial Neural Networks (2017 Fall).

Structured knowledge bases and unstructured free text are the two primary resources for answering questions. We explore using natural-language supporting sentences generated from Wikipedia to improve the performance of a knowledge-based question answering (KBQA) system.

Hall of Fame: Inferring Political Attitudes Using Weibo Data

Yuhui Zhang.

The micro-blogging service Weibo has become one of the most important communication platforms in China. We explore recommendation algorithms and matrix factorization for social network analysis.



I enjoy reading widely; my favorite books include To Live (Hua Yu), Walden (Henry David Thoreau), and Principles of Economics (N. Gregory Mankiw). In the evenings, I like to run and swim. I love classical music and learned to play the guitar, piano, and pipa at Tsinghua University.

Last updated: Jan. 3, 2020