Hi! I am a graduate student at Stanford University. My research interests span a wide range of topics in natural language processing, with a focus on representation learning, natural language generation, and their real-world applications. Currently, I am a research assistant in the StanfordNLP group, advised by Prof. Chris Manning.
Before that, I obtained a bachelor's degree with honours from the Department of Computer Science and Technology at Tsinghua University, and was a research assistant in the THUNLP Group. In 2018, I was very fortunate to closely collaborate with Prof. James Zou on improving automated diagnosis coding from EHRs.
- 05/2019: Selected as the best oral presentation at the 36th Tsinghua CS Forum for Graduate Students!
- 04/2019: How to infer thousands of diagnoses from EHRs? Check out our paper in npj (Nature) Digital Medicine!
- 12/2018: Awarded the SenseTime Scholarship (USD 3,000). Thanks, SenseTime Inc.!
- 10/2018: Awarded the highly selective National Scholarship!
- 06/2018: Received the Tsinghua Research Fellowship, with funding of USD 7,500!
Stanford University
Department of Computer Science
Master of Science, Sep. 2019 - Jun. 2021 (Expected)
Visiting Research Intern, Jun. 2018 - Sep. 2018
GPA: 4.30/4.00, Related Courses: CS229 Machine Learning (A+), CS230 Deep Learning (A+)
Tsinghua University
Department of Computer Science and Technology
Bachelor of Engineering, Aug. 2015 - Jul. 2019
Minor in Economics, Aug. 2016 - Jul. 2019
GPA: 3.86/4.00, Ranking 4/154, Related Courses: Thesis Project (A+), Student Research Training (A+), Introduction to Machine Learning (A), Artificial Neural Networks (A), Artificial Intelligence: Technology and Practice (A), Numerical Analysis (A), Linear Algebra (A-), Probability and Statistics (A-)
National Tsing Hua University
Department of Computer Science
Exchange Student, Jul. 2017 - Aug. 2017
VetTag: improving automated veterinary diagnosis coding via large-scale language modeling. [PDF][BLOG]
Yuhui Zhang, Allen Nie, Ashley Zehnder, Rodney Page, James Zou.
Nature Digital Medicine (2019).
We extend DeepTag in four directions: scaling from 42 coarse-grained diagnosis codes to 4,577 fine-grained codes, using language modeling to exploit large-scale unlabeled EHRs, applying hierarchical training to account for the diagnosis hierarchy, and adding word visualization for interpretation.
Zhipeng Guo, Xiaoyuan Yi, Maosong Sun, Wenhao Li, Cheng Yang, Jiannan Liang, Huimin Chen, Yuhui Zhang, Ruoyu Li.
Association for Computational Linguistics: System Demonstrations (2019).
Machines should not replace humans in poem generation. We propose Jiuge, a human-machine collaborative Chinese poetry generation system that allows constant and active user participation in poem creation.
Yuhui Zhang, Allen Nie, James Zou.
NeurIPS Machine Learning for Health Workshop (2018).
Massive numbers of veterinary EHRs remain unlabeled. We significantly improve diagnosis coding and cross-hospital generalization by utilizing these large-scale unlabeled EHRs.
Allen Nie, Ashley Zehnder, Rodney Page, Yuhui Zhang, A. Pineda, M. Rivas, C. Bustamante, James Zou.
Nature Digital Medicine (2018).
Manual coding is time-consuming and expensive. We develop a large-scale algorithm that automatically predicts standard diagnosis codes from EHRs, and we evaluate it in challenging cross-hospital settings.
THUOCL: Tsinghua Open Chinese Lexicon. [LINK]
Shiyi Han, Yuhui Zhang, Yunshan Ma, Cunchao Tu, Zhipeng Guo, Zhiyuan Liu, Maosong Sun.
Technical Report (2016).
THUOCL is a set of high-quality Chinese lexicons that can be used to improve many Chinese NLP tasks.
Enhancing Transformer with Sememe Knowledge. [PDF]
Yuhui Zhang, Chenghao Yang, Zhengping Zhou, Zhiyuan Liu.
The Transformer is generally considered a data-driven model. We introduce sememe knowledge (sememes are the minimum semantic units by linguistic definition) into the Transformer and demonstrate that external linguistic knowledge can enhance its effectiveness and robustness.
Inducing Grammar from Long Short-Term Memory Networks by Shapley Decomposition. [PDF]
Allen Nie, Yuhui Zhang.
As neural networks have demonstrated surprising performance in natural language processing, curiosity has grown about whether these models capture linguistic knowledge. We try to induce grammar by tracing the computational process of a long short-term memory network.
Yuhui Zhang, Yuhao Zhang, Christopher D. Manning.
CS230 Deep Learning (2019 Fall).
30% of summaries generated by abstractive models contain factual inconsistencies. We propose the factual score, a new metric for evaluating the factual correctness of neural abstractive summarization.
Yuhui Zhang, Ruocheng Wang, Zhengping Zhou.
CS229 Machine Learning (2019 Fall).
Since BERTScore has been shown to be a better evaluation metric for natural language generation, we use it as the reward function to improve neural abstractive summarization via reinforcement learning.
Enhancing Knowledge-based Question Answering with Supporting Sentences. [PDF]
Zhengping Zhou, Yuhui Zhang.
Artificial Neural Networks (2017 Fall).
Structured knowledge bases and unstructured free text are two primary resources for answering questions. We explore using natural language supporting sentences generated from Wikipedia to improve the performance of KBQA systems.
Hall of Fame: Inferring Political Attitudes Using Weibo Data
The micro-blogging service Weibo has become one of the most important communication platforms in China. We explore recommendation algorithms and matrix factorization in social network analysis.
- 2019 Best Oral Presentation Award (Presented VetTag at the 36th Tsinghua CS Graduate Forum) [slides]
- 2019 Research Career Award (Awarded by Tsinghua CS)
- 2018 National Scholarship (Top 0.2% in China, Highest Honor for Undergraduate)
- 2018 SenseTime Scholarship for AI Research (Top 30 in China)
- 2018 Qualcomm Scholarship for Research (Top 33/3300 at Tsinghua)
- 2018 Tsinghua Research Fellowship (Top 50/3300 at Tsinghua)
- 2018 Comprehensive Performance Scholarship (Top 8/153 in Dept. of CS)
- 2018 Social Practice Scholarship (Top 1/153 in Dept. of CS)
- 2017 China Scholarship Council Undergraduate Fellowship (Top 4500 in China)
- 2017 Comprehensive Performance Scholarship (Top 8/153 in Dept. of CS)
- 2016 Academic Performance Scholarship (Top 15/153 in Dept. of CS)
- 2016 Social Practice Scholarship (Top 2/153 in Dept. of CS)
- 2015 Freshman Scholarship (Top 150/3300 at Tsinghua)
- 2014 National Chemistry Olympiad Finals 1st Prize (Top 0.1% in China)
I enjoy reading a wide range of books. My favorites are To Live (Hua Yu), Walden (Henry David Thoreau), and Principles of Economics (N. Gregory Mankiw). I enjoy running and swimming in the evening. I love classical music, and I learned to play the guitar, piano, and pipa at Tsinghua University.