Hi! I am a postdoctoral scholar at Stanford University, working with Professors Serena Yeung-Levy, Ludwig Schmidt, and Emma Lundberg. Previously, I received my Ph.D. in Computer Science from Stanford University and my B.E. in Computer Science from Tsinghua University.
Research
My research focuses on the foundations and applications of multimodal intelligence. We live in a multimodal world, perceiving and reasoning through vision, language, sound, and other forms of information. This is especially important in science, including biology and medicine, where knowledge spans heterogeneous data modalities.
To build general physical and scientific intelligence, we need models that can reliably represent, understand, reason over, and generate across diverse modalities. My work pursues this goal along two complementary directions: (1) developing a foundational understanding of multimodal models, including modality gaps in CLIP [NeurIPS'22, ICLR'23, ICLR'24, Preprint'25], visual perception limits in multimodal language models [NeurIPS'24, CVPR'25, EMNLP'25, Preprint'26], and robust training of flow matching [CVPR'26, CVPR'26]; and (2) translating these insights into applications such as virtual cell modeling [ICML'25, Preprint'26, Preprint'26, Preprint'26] and scientific reasoning & discovery [NEJM AI'24, CVPR'24, CVPR'25, EACL'26, Preprint'26].
News
- 04/2026: How can we improve the physical correctness of virtual cells? Check out CellFluxRL.
- 03/2026: V-GRPO, Stochastic Injection for Flow Matching, and VLM Illusions are accepted to CVPR and Findings.
- 02/2026: TVP and MedEvidence are accepted to ICLR 2026.
- 01/2026: CellFluxV2 is released. PaperSearchQA is accepted to EACL 2026.
- 12/2025: We will be hosting the MMFM-BIOMED, DataCV, and XAI4CV workshops at CVPR 2026.
- 10/2025: Received my Ph.D. degree from Stanford University and started as a postdoctoral scholar. Huge thanks to my advisors and committee members: Professors Serena Yeung-Levy, Ludwig Schmidt, Tatsunori Hashimoto, Emma Lundberg, and John Cioffi!
- 09/2025: Honored to be selected as one of the Rising Stars in Data Science.
- 08/2025: Our studies on CLIP vs. DINO and Code Equivalence Checking are accepted to EMNLP 2025, and Multimodal Symbolic Reasoning is accepted to NeurIPS 2025.
- 07/2025: Presented CellFlux at ICML 2025. Thanks to Citadel for providing the travel grant!
- 06/2025: Honored to be selected as one of the participants in CVPR 2025 Doctoral Consortium. Also excited to host DCVLR: Data Curation for Vision Language Reasoning challenge at NeurIPS 2025.
- 05/2025: Our latest advance in virtual cell modeling (a.k.a. a world model for cells), CellFlux (formerly CellFlow), has been accepted to ICML 2025. NegVQA has been accepted to ACL 2025 Findings.
- 04/2025: Three papers presented at ICLR 2025: VLM Interpretability, VidDiff, Inverse Scaling.
- 03/2025: Three papers accepted to CVPR 2025: AutoConverter, MicroVQA, BIOMEDICA. We are organizing DataWorld Workshop at ICML 2025.
- 02/2025: Introducing CellFlow, a flow-matching-based method for cellular morphology prediction. We are organizing the XAI4CV Workshop at CVPR 2025, the MMFM-BIOMED Workshop at CVPR 2025, and the XLLM Workshop at ACL 2025.
- 01/2025: Introducing AutoConverter, an agentic framework that converts open-ended VQA questions into multiple-choice format. VidDiff is accepted to ICLR 2025, and VLM Interpretability is accepted to the ICLR 2025 Blog Track.
- 12/2024: Two papers presented at NeurIPS 2024 and two at ML4H 2024.
- 11/2024: Our work is selected for an oral presentation (198/6105) at EMNLP 2024! Also selected as one of the NeurIPS 2024 Top Reviewers and EMNLP 2024 Outstanding Reviewers.
- 10/2024: VLMClassifier is accepted to NeurIPS 2024; Micro-Bench is accepted to NeurIPS 2024 Datasets and Benchmarks Track.
- 09/2024: Our work analyzing pre-trained language models for image generation is accepted to EMNLP 2024 main conference.
- 07/2024: VideoAgent is accepted to ECCV 2024; AI scientific feedback is published in NEJM AI.
- 06/2024: Our new work investigates why visually-grounded language models perform poorly at basic image classification.
- 05/2024: Selected as a Citadel GQS Fellowship finalist and gave a talk in Chicago.
- 04/2024: VisDiff is selected as an oral presentation (90/11532) at CVPR 2024!
- 03/2024: Introducing VideoAgent, which leverages a large language model as an agent for long-form video understanding.
- 02/2024: VisDiff accepted to CVPR 2024.
- 01/2024: ICLR 2024: C3 explains the geometry of the multi-modal contrastive representation space and introduces a three-step method to bridge the modality gap.
- 12/2023: Introducing VisDiff, an algorithm that automatically describes differences between two image sets; joint work with Berkeley AI Research!
- 11/2023: Honored to be selected as one of the NeurIPS 2023 Top Reviewers.
- 10/2023: Three new works: large language models generate scientific feedback, answer moral and causal questions, and show inverse scaling on 11 tasks.
- 05/2023: Larger language models are not necessarily better on all tasks. Check out our work in ACL 2023 Findings!
- 01/2023: Can you diagnose and rectify a vision model using language inputs? Check out our work in ICLR 2023!
- 11/2022: We won the 3rd prize in the first round of the Inverse Scaling Prize! Also check out HELM, which holistically evaluates language models.
- 10/2022: Honored to receive a NeurIPS 2022 Scholar Award. Thank you NeurIPS organizers!
- 10/2022: Two more works will be presented at ML4H and NeurIPS 2022!
- 09/2022: Our work studying the modality gap is accepted to NeurIPS 2022!
- 07/2020: Stanza now supports biomedical and clinical text processing!
- 03/2020: Announcing Stanza: A Python NLP Library for Many Human Languages!
- 05/2019: Selected as the best oral presentation at the 36th Tsinghua CS Forum for Graduate Students!
- 04/2019: How can we infer thousands of diagnoses from EHRs? Check out our paper in npj Digital Medicine!
- 12/2018: Awarded the SenseTime Scholarship (USD 3,000). Thanks to SenseTime Inc.!
- 10/2018: Awarded the highly selective National Scholarship!
- 06/2018: Received the Tsinghua Research Fellowship with USD 7,500 in funding!
Selected Publications
Awards
Services
Area Chair: ICML 2026, NeurIPS 2025-2026, ACL 2025-2026, EMNLP 2025-2026
Reviewer: NeurIPS 2022-2024, ICML 2023-2025, ICLR 2024-2025, CVPR 2025-2026, ICCV 2025, ECCV 2026, ACL 2020-2024, EMNLP 2020-2024, NAACL 2021-2024, COLM 2024, TPAMI 2023, Scientific Reports 2023
The 2nd Multimodal Foundation Models for Biomedicine: Challenges and Opportunities @ CVPR 2026
The 5th DataCV Workshop and Challenge @ CVPR 2026
The 5th Explainable AI for Computer Vision (XAI4CV) Workshop @ CVPR 2026
DCVLR: Data Curation for Vision Language Reasoning @ NeurIPS 2025
DataWorld: Unifying Data Curation Frameworks Across Domains @ ICML 2025
The 4th Explainable AI for Computer Vision (XAI4CV) Workshop @ CVPR 2025
Multimodal Foundation Models for Biomedicine: Challenges and Opportunities @ CVPR 2025
The 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM) @ ACL 2025
CS 224N: Natural Language Processing with Deep Learning, CS 271: Artificial Intelligence in Healthcare
Student Research Workshop Mentor @ ACL 2025, Student-Applicant Support Program @ Stanford 2021-2024, Highschool Outreach Program @ CVPR 2022
I'm very fortunate to have worked with the following talented undergrads for ≥6 months:
Elaine Sui (Class of 2024): Stanford CS MS → Stanford CS PhD w/ SoE Fellowship
Yuchang Su (Class of 2025): Tsinghua CS Undergrad → Harvard AI in Medicine PhD w/ Fellowship
Rui Li (Class of 2025): USTC CS Undergrad w/ Highest Honor → Stanford CS PhD
Sahithi Ankireddy (Class of 2025): Caltech CS Undergrad → Stanford CS MS w/ NSF GRFP Fellowship
Binxu Li (Class of 2026): Stanford EE MS → Princeton ECE PhD w/ Fellowship
Bingda Tang (Class of 2026): Tsinghua CS Undergrad → Berkeley CS PhD
Xueqiao Sun (Class of 2026): Tsinghua CS Undergrad → CMU CS PhD
He Li (Class of 2026): Tsinghua CS Undergrad → UW CS PhD
Yiming Liu (Class of 2027): Tsinghua CS Undergrad
Miscellaneous
I enjoy reading books. Some of my favorites: To Live (Yu Hua), Walden (Henry David Thoreau), and Principles of Economics (N. Gregory Mankiw). I also enjoy hiking, jogging, and swimming. I am a fan of classical music, and I was fortunate to learn the basics of playing the guitar, piano, and pipa at Tsinghua University.