Yijia Shao


I am a 2nd-year PhD student at Stanford NLP advised by Diyi Yang. I had the pleasure of working with Monica S. Lam and Michael Bernstein during the rotation program. Previously, I was an undergraduate student at Yuanpei College, Peking University, where I got into ML and NLP research by working with Bing Liu. In summer 2022, I did a research internship at UCLA hosted by Nanyun Peng. Before that, I worked as a research intern at Microsoft Research Asia (blog spotlight in Chinese) and as an engineering intern on the TensorFlow Lite team at Google, Beijing.

My research interests lie in ML and NLP. These days, I'm interested in positioning NLP models (e.g., LLMs) within larger systems. Here are some core problems I'm thinking about:

  • How can AI models bridge humans and systems, or systems and other systems?
  • How can AI-empowered systems collaborate with users effectively?
  • How can we continually improve these systems through interaction with humans and external systems?

Many kind people have helped me along my journey. If you want to talk about research or seek advice I might be able to provide, feel free to book a chat here.

News

  • (Jan, 2025) Collaborative Gym, a framework for enabling and evaluating human-agent collaboration, is out on arXiv. Check out our preprint and Twitter thread! We're working on making collaborative agents accessible and releasing the codebase 💪, stay tuned!
  • (Jan, 2025) Will give a talk on LM Agent Privacy Risk (paper) at JP Morgan AI Research seminar.
  • (Sep, 2024) PrivacyLens is accepted to NeurIPS 2024 Datasets and Benchmarks Track. Super excited to attend NeurIPS for the first time!
  • (Sep, 2024) Gave a guest lecture on "Knowledge Curation" at CS 224V with Yucheng and Prof. Lam. You can find the slides here.
  • (Sep, 2024) Collaborative STORM, a major update to the STORM project that brings humans into the loop, has been accepted to EMNLP 2024. Our knowledge-storm package is now v1.0.0! [release note]

Selected Projects

LM-Empowered System for Knowledge Curation

We study the development of knowledge agents for writing long, organized, and well-grounded articles, and how humans can collaborate with such agents.


Continual Learning in NLP

We study (1) continual pre-training/post-training of language models (LMs) and (2) enabling LMs to continually learn new tasks after deployment.

Highlights:

  • Continual Pre-training of Language Models, In ICLR 2023
    We propose a post-training algorithm with an adaptive soft-masking mechanism that selectively updates LM parameters based on the post-training corpus to minimize catastrophic forgetting and enhance knowledge transfer.

  • Class-Incremental Learning based on Label Generation, In ACL 2023
    We investigate continual learning with classification and generation objectives by examining representation collapse in pretrained models throughout the learning process.

  • ContinualLM [GitHub repo]

Other Related Works:

Domain Adaptive Pre-training (EMNLP’22), Few-shot Continual Learning (EMNLP’22), Investigating Continual Learning in Computer Vision (ICLR’24)

Recent Preprints & Publications

(*: Equal Contribution)

Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration
Yijia Shao, Vinay Samuel*, Yucheng Jiang*, John Yang, Diyi Yang
Preprint (arXiv:2412.15701).
PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action
Yijia Shao, Tianshi Li, Weiyan Shi, Yanchen Liu, Diyi Yang
In NeurIPS 2024 Datasets and Benchmarks Track.
Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations
Yucheng Jiang*, Yijia Shao*, Dekun Ma, Sina J. Semnani, Monica S. Lam
In EMNLP 2024.
Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, Noah D. Goodman
In COLM 2024.
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback
Omar Shaikh*, Michelle Lam*, Joey Hejna*, Yijia Shao, Michael Bernstein, Diyi Yang
Preprint (arXiv:2406.00888).
Yijia Shao, Yucheng Jiang, Theodore A. Kanell, Peter Xu, Omar Khattab, Monica S. Lam
In NAACL 2024.
Haowei Lin, Yijia Shao, Weinan Qian, Ningxin Pan, Yiduo Guo, Bing Liu
In ICLR 2024.

Selected Awards

  • School of Engineering Fellowship, Stanford, 2023
  • SenseTime Scholarship, 2022 (awarded to 30 students in China)
  • May 4th Scholarship, 2021 (the highest honor for PKU students)
  • National Scholarship, 2020, 2022
  • First prize in 12th Chinese Mathematics Competition Final, 2020
  • Merit Student Pacesetter, 2020, 2021, 2022
  • First Class Scholarship for Freshmen of Peking University, 2019

Misc.

In my free time, I like cooking, travelling, and competitive ballroom dancing!