Hi! I am a postdoctoral scholar at Stanford University, working with Professors Serena Yeung-Levy, Ludwig Schmidt, and Emma Lundberg. Previously, I received my Ph.D. in Computer Science from Stanford University and my B.E. in Computer Science from Tsinghua University.
Research
My research focuses on the foundations and applications of multimodal intelligence. We live in a multimodal world, perceiving and reasoning through vision, language, sound, and other forms of information. This is especially important in science, including biology and medicine, where knowledge spans heterogeneous data modalities.
To build general physical and scientific intelligence, we need models that can reliably represent, understand, reason over, and generate across diverse modalities. My work pursues this goal along two complementary directions: (1) developing a foundational understanding of multimodal models, including modality gaps in CLIP [NeurIPS'22, ICLR'23, ICLR'24, Preprint'25], visual perception limits in multimodal language models [NeurIPS'24, CVPR'25, EMNLP'25, Preprint'26], and robust training of flow matching [CVPR'26, CVPR'26]; and (2) translating these insights into applications such as virtual cell modeling [ICML'25, Preprint'26, Preprint'26, Preprint'26] and scientific reasoning & discovery [NEJM AI'24, CVPR'24, CVPR'25, EACL'26, Preprint'26].
News
- 04/2026: How can we improve the physical correctness of virtual cells? Check out CellFluxRL.
- 03/2026: V-GRPO, Stochastic Injection for Flow Matching, and VLM Illusions are accepted to CVPR and Findings.
- 02/2026: TVP and MedEvidence are accepted to ICLR 2026.
- 01/2026: CellFluxV2 is released. PaperSearchQA is accepted to EACL 2026.
- 12/2025: We will be hosting the MMFM-BIOMED, DataCV, and XAI4CV workshops at CVPR 2026.
- 10/2025: Received my Ph.D. degree from Stanford University and started as a postdoctoral scholar. Huge thanks to my advisors and committee members: Professors Serena Yeung-Levy, Ludwig Schmidt, Tatsunori Hashimoto, Emma Lundberg, and John Cioffi!
- 09/2025: Honored to be selected as one of the Rising Stars in Data Science.
- 08/2025: Our studies on CLIP vs. DINO and Code Equivalence Checking are accepted to EMNLP 2025, and Multimodal Symbolic Reasoning is accepted to NeurIPS 2025.
- 07/2025: Presented CellFlux at ICML 2025. Thanks to Citadel for providing the travel grant!
- 06/2025: Honored to be selected as one of the participants in CVPR 2025 Doctoral Consortium. Also excited to host DCVLR: Data Curation for Vision Language Reasoning challenge at NeurIPS 2025.
- 05/2025: Our latest advance in virtual cell modeling (a.k.a. a world model for cells), CellFlux (formerly CellFlow), has been accepted to ICML 2025. NegVQA has been accepted to ACL 2025 Findings.
- 04/2025: Three papers presented at ICLR 2025: VLM Interpretability, VidDiff, Inverse Scaling.
- 03/2025: Three papers accepted to CVPR 2025: AutoConverter, MicroVQA, BIOMEDICA. We are organizing DataWorld Workshop at ICML 2025.
- 02/2025: Introducing CellFlow, a flow-matching-based method for cellular morphology prediction. We are organizing the XAI4CV Workshop at CVPR 2025, the MMFM-BIOMED Workshop at CVPR 2025, and the XLLM Workshop at ACL 2025.
- 01/2025: Introducing AutoConverter, an agentic framework that converts open-ended VQA questions into multiple-choice format. VidDiff is accepted to ICLR 2025, and VLM Interpretability is accepted to the ICLR 2025 Blog Track.
- 12/2024: Two papers presented at NeurIPS 2024 and two at ML4H 2024.
- 11/2024: Our work is selected for an oral presentation (198/6105) at EMNLP 2024! Also selected as one of the NeurIPS 2024 Top Reviewers and EMNLP 2024 Outstanding Reviewers.
- 10/2024: VLMClassifier is accepted to NeurIPS 2024; Micro-Bench is accepted to NeurIPS 2024 Datasets and Benchmarks Track.
- 09/2024: Our work analyzing pre-trained language models for image generation is accepted to EMNLP 2024 main conference.
- 07/2024: VideoAgent is accepted to ECCV 2024; AI scientific feedback is published in NEJM AI.
- 06/2024: Our new work investigates why visually-grounded language models perform poorly at basic image classification.
- 05/2024: Selected as a Citadel GQS Fellowship finalist and gave a talk in Chicago.
- 04/2024: VisDiff is selected as an oral presentation (90/11532) at CVPR 2024!
- 03/2024: Introducing VideoAgent, which leverages a large language model as an agent for long-form video understanding.
- 02/2024: VisDiff accepted to CVPR 2024.
- 01/2024: ICLR 2024: C3 explains the geometry of the multi-modal contrastive representation space and introduces a three-step method to bridge the modality gap.
- 12/2023: Introducing VisDiff, an algorithm that automatically describes differences between two image sets; joint work with Berkeley AI Research!
- 11/2023: Honored to be selected as one of the NeurIPS 2023 Top Reviewers.
- 10/2023: Three new works: large language models generate scientific feedback, answer moral and causal questions, and show inverse scaling on 11 tasks.
- 05/2023: Larger language models are not necessarily better on all tasks. Check out our work in ACL 2023 Findings!
- 01/2023: Can you diagnose and rectify a vision model using language inputs? Check out our work in ICLR 2023!
- 11/2022: We won the 3rd prize in the first round of the Inverse Scaling Prize! Also check out HELM, which holistically evaluates language models.
- 10/2022: Honored to receive a NeurIPS 2022 Scholar Award. Thank you NeurIPS organizers!
- 10/2022: Two more works will be presented at ML4H and NeurIPS 2022!
- 09/2022: Our work studying the modality gap is accepted to NeurIPS 2022!
- 07/2020: Stanza now supports biomedical and clinical text processing!
- 03/2020: Announcing Stanza: A Python NLP Library for Many Human Languages!
- 05/2019: Selected as the best oral presentation at the 36th Tsinghua CS Forum for Graduate Students!
- 04/2019: How can we infer thousands of diagnoses from EHRs? Check out our paper in npj Digital Medicine!
- 12/2018: Awarded the SenseTime Scholarship (USD 3,000). Thanks to SenseTime Inc.!
- 10/2018: Awarded the highly selective National Scholarship!
- 06/2018: Received the Tsinghua Research Fellowship with USD 7,500 in funding!
Selected Publications
Awards
Services
Area Chair: ICML 2026, NeurIPS 2025-2026, ACL 2025-2026, EMNLP 2025-2026
Reviewer: NeurIPS 2022-2024, ICML 2023-2025, ICLR 2024-2025, CVPR 2025-2026, ICCV 2025, ECCV 2026, ACL 2020-2024, EMNLP 2020-2024, NAACL 2021-2024, COLM 2024, TPAMI 2023, Scientific Reports 2023
The 2nd Multimodal Foundation Models for Biomedicine: Challenges and Opportunities @ CVPR 2026
The 5th DataCV Workshop and Challenge @ CVPR 2026
The 5th Explainable AI for Computer Vision (XAI4CV) Workshop @ CVPR 2026
DCVLR: Data Curation for Vision Language Reasoning @ NeurIPS 2025
DataWorld: Unifying Data Curation Frameworks Across Domains @ ICML 2025
The 4th Explainable AI for Computer Vision (XAI4CV) Workshop @ CVPR 2025
Multimodal Foundation Models for Biomedicine: Challenges and Opportunities @ CVPR 2025
The 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM) @ ACL 2025
CS 224N: Natural Language Processing with Deep Learning, CS 271: Artificial Intelligence in Healthcare
Student Research Workshop Mentor @ ACL 2025, Student-Applicant Support Program @ Stanford 2021-2024, Highschool Outreach Program @ CVPR 2022
I'm very fortunate to have worked with the following talented undergrads for ≥6 months:
Elaine Sui (Class of 2024): Stanford CS MS → Stanford CS PhD w/ SoE Fellowship
Yuchang Su (Class of 2025): Tsinghua CS Undergrad → Harvard AI in Medicine PhD w/ Fellowship
Rui Li (Class of 2025): USTC CS Undergrad w/ Highest Honor → Stanford CS PhD
Sahithi Ankireddy (Class of 2025): Caltech CS Undergrad → Stanford CS MS w/ NSF GRFP Fellowship
Binxu Li (Class of 2026): Stanford EE MS → Princeton ECE PhD w/ Fellowship
Bingda Tang (Class of 2026): Tsinghua CS Undergrad → Berkeley CS PhD
Xueqiao Sun (Class of 2026): Tsinghua CS Undergrad → CMU CS PhD
He Li (Class of 2026): Tsinghua CS Undergrad → UW CS PhD
Yiming Liu (Class of 2027): Tsinghua CS Undergrad
Miscellaneous
I enjoy reading books. Some of my favorites: To Live (Yu Hua), Walden (Henry David Thoreau), and Principles of Economics (N. Gregory Mankiw). I also enjoy hiking, jogging, and swimming. I am a fan of classical music, and I was fortunate to learn the basics of playing the guitar, piano, and pipa at Tsinghua University.