Rose E. Wang
rewang at cs dot stanford dot edu

I am a PhD student in Stanford University's Computer Science Department, advised by Dorottya (Dora) Demszky and Diyi Yang. I am also the Head TA for Stanford's first class on NLP and Education, CS293/EDUC473, and I founded and organize Stanford's interdisciplinary Education Reading Group.

My research: Language is central to educational interactions. My work wrestles with the question: How can we improve student learning & build equitable systems at scale through language? To answer it, I develop NLP systems that measure effective learning interactions, and I conduct interventions.

My research is supported by the NSF GRFP, the Bill & Melinda Gates Foundation, and the National Student Support Accelerator.

Previously, I completed my undergraduate studies at MIT, where I worked with Prof. Josh Tenenbaum and Prof. Jonathan How, as well as with Google Brain and Google Brain Robotics. In a prior lifetime, I was a passionate multilinguist (German Abitur; Chinese, HSK Level 6; French, DELF B2; Spanish, DELE B2; European plurilingual excellence award).

[ GitHub  /  Twitter  /  Google Scholar  /  Blog ]



Research

Representative papers are highlighted.

Step-by-Step Remediation of Students' Mathematical Mistakes
Rose E. Wang, Qingyang Zhang, Carly Robinson, Susanna Loeb, Dorottya (Dora) Demszky
NAACL 2024.
Featured in Stanford HAI
[ Paper, Code ]

We explore the potential for large language models (LLMs) to assist math tutors in remediating student mistakes. We present ReMath, a benchmark co-developed with experienced math teachers that deconstructs their thought process for remediation. Our work sheds light on the potential and limitations of using current LLMs to provide high-quality learning experiences for both tutors and students at scale.

🛠️ Edu-ConvoKit: An Open-Source Library for Education Conversation Data
Rose E. Wang, Dorottya (Dora) Demszky
NAACL 2024.
[ Code, Documentation, Paper ]

We introduce Edu-ConvoKit, an open-source library designed to handle the preprocessing, annotation, and analysis of conversation data in education.
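The library's actual interface lives in the documentation linked above; purely as an illustrative sketch of the preprocess → annotate → analyze pipeline it is organized around (the code below uses plain pandas, not Edu-ConvoKit itself):

```python
# Illustrative preprocess -> annotate -> analyze pipeline on a toy
# transcript. This mimics the stages Edu-ConvoKit handles; it does
# not use the library's API.
import pandas as pd

transcript = pd.DataFrame({
    "speaker": ["tutor", "student", "tutor"],
    "text": ["Maria, what is 3 x 4?", "Twelve!", "Nice work, Maria."],
})

# Preprocessing: anonymize student names before any analysis.
transcript["text"] = transcript["text"].str.replace("Maria", "[STUDENT]", regex=False)

# Annotation: label each turn, e.g., whether it is a question.
transcript["is_question"] = transcript["text"].str.contains(r"\?", regex=True)

# Analysis: aggregate annotations by speaker.
print(transcript.groupby("speaker")["is_question"].mean())
```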

Backtracing: Retrieving the Cause of the Query
Rose E. Wang, Pawan Wirawarn, Omar Khattab, Noah Goodman, Dorottya (Dora) Demszky
EACL 2024, Long Paper Findings.
Featured in Stanford HAI
[ Paper, Code, Video, Poster ]

Many online content portals allow users to ask questions to supplement their understanding (e.g., of lectures or news articles). While information retrieval (IR) systems may provide answers to such queries, they do not directly help content creators identify the segments that caused users to ask those questions, which could, for example, help creators improve their content. We introduce the task of backtracing, in which systems retrieve the text segment that most likely provoked a user query.
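For intuition, here is a deliberately simple baseline for the task, of my own making rather than from the paper: score each candidate segment by lexical similarity to the query and return the top one.

```python
# Minimal backtracing baseline: rank content segments by TF-IDF
# cosine similarity to the user's query. Illustrative only; not the
# method evaluated in the paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

segments = [
    "We define the derivative as the limit of the difference quotient.",
    "The chain rule lets us differentiate composed functions.",
    "Integration by parts follows from the product rule.",
]
query = "Why does the chain rule work for nested functions?"

vectorizer = TfidfVectorizer()
seg_vecs = vectorizer.fit_transform(segments)
query_vec = vectorizer.transform([query])

scores = cosine_similarity(query_vec, seg_vecs)[0]
best = scores.argmax()
print(f"Most likely cause of the query: segment {best}: {segments[best]!r}")
```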

Does Feedback on Talk Time Increase Student Engagement? Evidence from a Randomized Controlled Trial on a Math Tutoring Platform
Dorottya (Dora) Demszky, Rose E. Wang, Sean Geraghty, Carol Yu
In the 14th Learning Analytics and Knowledge Conference (LAK '24).
[ Paper ]

Providing ample opportunities for students to express their thinking is pivotal to their learning of mathematical concepts. We introduce the Talk Meter, which provides in-the-moment automated feedback on student-teacher talk ratios. We conduct a randomized controlled trial on a virtual math tutoring platform (n=742 tutors) to evaluate the effectiveness of the Talk Meter at increasing student talk. In one treatment arm, we show the Talk Meter only to the tutor, while in the other arm we show it to both the student and the tutor. We find that the Talk Meter increases student talk ratios in both treatment conditions by 13-14%; this trend is driven by the tutor talking less in the tutor-facing condition, whereas in the student-facing condition it is driven by the student expressing significantly more mathematical thinking. These results demonstrate the promise of in-the-moment joint talk time feedback to both teachers and students as a low-cost, engaging, and scalable way to increase students' mathematical reasoning.
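For concreteness, the quantity being fed back reduces to a running ratio over the conversation so far; here is a minimal sketch, assuming word counts as the unit of talk time (the platform's actual metric may differ):

```python
# Minimal sketch of an in-the-moment talk meter: the student talk
# ratio is the student's share of words spoken so far. Using word
# counts as the unit of talk time is a simplifying assumption.
def student_talk_ratio(turns):
    """turns: list of (speaker, utterance) pairs, in order."""
    student = sum(len(u.split()) for s, u in turns if s == "student")
    total = sum(len(u.split()) for _, u in turns)
    return student / total if total else 0.0

turns = [
    ("tutor", "Can you explain how you set up the equation?"),
    ("student", "I moved the 3 over, so x equals 7 minus 3, which is 4."),
]
print(f"Student talk ratio so far: {student_talk_ratio(turns):.0%}")
```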

Is ChatGPT a Good Teacher Coach? Measuring Zero-Shot Performance For Scoring and Providing Actionable Insights on Classroom Instruction
Rose E. Wang, Dorottya (Dora) Demszky
In the Proceedings of Innovative Use of NLP for Building Educational Applications (2023).
Selected as BEA 2023's Ambassador Paper
Featured in Forbes and Stanford HAI
[ Project page, Video, Paper, Code ]

We explore whether generative AI could become a cost-effective complement to expert feedback by serving as an automated teacher coach. We propose three teacher coaching tasks for generative AI: (A) scoring transcript segments based on classroom observation instruments, (B) identifying highlights and missed opportunities for good instructional strategies, and (C) providing actionable suggestions for eliciting more student reasoning.

“Mistakes Help Us Grow”: Facilitating and Evaluating Growth Mindset Supportive Language in Classrooms
Kunal Handa, Margaret Clapper, Jessica Boyle, Rose E. Wang, Diyi Yang, David S Yeager, Dorottya (Dora) Demszky
In the Conference on Empirical Methods in Natural Language Processing (EMNLP 2023).
Featured in Stanford HAI
[ Paper ]

Teachers’ growth mindset supportive language (GMSL)—rhetoric emphasizing that one's skills can be improved over time—has been shown to significantly reduce disparities in academic achievement and enhance students' learning outcomes. Although teachers espouse growth mindset principles, most find it difficult to adopt GMSL in their practice due to the lack of effective coaching in this area. We explore whether large language models (LLMs) can provide automated, personalized coaching to support teachers' use of GMSL. We conduct a large-scale evaluation involving 174 teachers and 1,006 students, finding that both teachers and students perceive GMSL-trained teacher and model reframings as more effective in fostering a growth mindset and promoting challenge-seeking behavior, among other benefits. We also find that model-generated reframings outperform those from the GMSL-trained teachers. These results show promise for harnessing LLMs to provide automated GMSL feedback for teachers and, more broadly, LLMs’ potential for supporting students’ learning in the classroom.

SIGHT: A Large Annotated Dataset on Student Insights Gathered from Higher Education Transcripts
Rose E. Wang*, Pawan Wirawarn*, Noah Goodman, Dorottya (Dora) Demszky
In the Proceedings of Innovative Use of NLP for Building Educational Applications (2023).
[ Project page, Video, Paper, Code ]

We build SIGHT, a large dataset of 288 math lecture transcripts and 15,784 comments collected from the Massachusetts Institute of Technology OpenCourseWare (MIT OCW) YouTube channel. We additionally develop a rubric for categorizing student feedback types, along with methods for scaling annotation, to help teachers better understand the needs of their students.

Solving math word problems by combining language models with symbolic solvers
Joy He-Yueya, Gabriel Poesia, Rose E. Wang, Noah Goodman
ArXiv (2023).
[ Paper ]

We propose an approach that combines an LLM, which incrementally formalizes word problems as a set of variables and equations, with an external symbolic solver that solves the equations.
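As a sketch of this pipeline, assuming the LLM has already emitted variables and equations for a toy problem (the llm_output dict below is a hand-written stand-in for a model call; the solving step here uses SymPy):

```python
# Sketch of the LLM + symbolic-solver pipeline: the LLM translates
# the word problem into equations; SymPy then solves them exactly.
import sympy

# Word problem: "Ann has twice as many apples as Bob. Together they
# have 12 apples. How many does Bob have?"
llm_output = {
    "variables": ["ann", "bob"],
    "equations": ["Eq(ann, 2*bob)", "Eq(ann + bob, 12)"],
}

symbols = sympy.symbols(llm_output["variables"])
local_vars = dict(zip(llm_output["variables"], symbols))
equations = [
    sympy.sympify(eq, locals={"Eq": sympy.Eq, **local_vars})
    for eq in llm_output["equations"]
]

solution = sympy.solve(equations, symbols)
print(solution)  # {ann: 8, bob: 4}
```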

Evaluating Human-Language Model Interaction
Mina Lee, Megha Srivastava, Amelia Hardy, John Thickstun, Esin Durmus, Ashwin Paranjape, Ines Gerard-Ursin, Xiang Lisa Li, Faisal Ladhak, Frieda Rong, Rose E. Wang, Minae Kwon, Joon Sung Park, Hancheng Cao, Tony Lee, Rishi Bommasani, Michael Bernstein, Percy Liang
In submission (2023).
[ Paper ]

We develop Human-AI Language-based Interaction Evaluation (HALIE) that expands non-interactive evaluation along three dimensions, capturing (i) the interactive process, not only the final output; (ii) the first-person subjective experience, not just a third-party assessment; and (iii) notions of preference beyond quality.

In the ZONE: Measuring difficulty and progression in curriculum generation
Rose E. Wang, Jesse Mu, Dilip Arumugam, Natasha Jaques, Noah Goodman
NeurIPS 2022 Deep Reinforcement Learning Workshop.
[ Paper, Invited Talk at UC Berkeley's Multi-Agent Learning Seminar ]

A common strategy in curriculum generation for reinforcement learning is to train a teacher network to generate tasks that enable student learning. But what kinds of tasks enable this? One answer is tasks belonging to a student's zone of proximal development (ZPD), a concept from developmental psychology: tasks that are not too easy and not too hard for the student. Although intuitive, ZPD is not well understood computationally. We propose ZONE, a novel computational framework that operationalizes ZPD. It formalizes ZPD through the language of Bayesian probability theory, revealing that tasks should be selected by difficulty (the student's probability of task success) and learning progression (the degree of change in the student's model parameters).
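In symbols (my notation, offered as intuition rather than the paper's exact formulation): with student parameters \(\theta_t\), a ZPD-respecting curriculum picks the next task \(\tau\) whose success probability sits in an intermediate band, preferring tasks that induce the largest parameter change.

```latex
% Illustrative notation (mine, not necessarily the paper's): choose
% the next task by learning progression, constrained to tasks that
% are neither too easy nor too hard for the current student.
\[
  \tau^{*} \;=\; \arg\max_{\tau}\;
    \bigl\lVert \theta_{t+1}(\tau) - \theta_{t} \bigr\rVert
  \quad \text{s.t.} \quad
  p_{\min} \,\le\, P(\mathrm{success} \mid \tau, \theta_{t}) \,\le\, p_{\max}
\]
```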

ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward
Zixian Ma, Rose E. Wang, Li Fei-Fei, Michael Bernstein, Ranjay Krishna
36th Conference on Neural Information Processing Systems (NeurIPS 2022).
[ Paper, Code ]

Modern multi-agent reinforcement learning frameworks rely on centralized training and reward shaping to perform well. However, centralized training and dense rewards are not readily available in the real world. Current multi-agent algorithms struggle to learn in the alternative setup of decentralized training or sparse rewards. To address these issues, we propose ELIGN (expectation alignment), a self-supervised intrinsic reward inspired by the self-organization principle in zoology.

Speaking with Confidence: Investigating the effect of uncertainty in pragmatic language learning
Pawan Wirawarn, Rose E. Wang, Noah Goodman
CURIS 2022.
[ Poster ]

Our work explores whether pragmatic language learning is better with a well-calibrated domain-agnostic listener.

CLaP: Conditional Latent Planners for Offline Reinforcement Learning
Harry Donghyeop Shin, Rose E. Wang
NeurIPS 2022 Workshop on Foundation Models for Decision Making.
[ Paper, Code (coming soon) ]

Recent work has formulated offline reinforcement learning (RL) as a sequence modeling problem, benefiting from the simplicity and scalability of the Transformer architecture. However, sequence models struggle to model trajectories that are long-horizon or involve complicated environment dynamics. We propose CLaP (Conditional Latent Planners), which learns a simple goal-conditioned latent space from offline agent behavior and incrementally decodes good actions from a latent plan.

Know Thy Student: Interactive Learning with Gaussian Processes
Rose E. Wang, Mike Wu, Noah Goodman
ICLR 2022 Workshop on From Cells to Societies: Collective Learning across Scales.
[ Paper ]

Learning often involves interaction between multiple agents. Human teacher-student settings best illustrate how interactions result in efficient knowledge passing, where the teacher constructs a curriculum based on their students' abilities. Prior work in machine teaching studies how a teacher should construct optimal teaching datasets under the assumption that the teacher knows everything about the student. In the real world, however, the teacher has incomplete information and must probe before teaching. Our work proposes a simple probing algorithm that uses Gaussian processes to infer student-related information before constructing a teaching dataset.
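To make the probing idea concrete, here is a minimal sketch using scikit-learn's Gaussian process regressor; the setup (difficulty as input, correctness as output, probing at maximum posterior uncertainty) is my illustrative framing, not the paper's implementation.

```python
# Illustrative GP probing sketch: model a student's probability of
# answering correctly as a function of problem difficulty, then probe
# next where the posterior is most uncertain.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Observed (difficulty, correctness) pairs from a few probe questions.
difficulty = np.array([[0.1], [0.2], [0.8], [0.9]])
correct = np.array([1.0, 1.0, 0.0, 0.0])

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3))
gp.fit(difficulty, correct)

# Probe next where the posterior standard deviation is largest.
candidates = np.linspace(0, 1, 101).reshape(-1, 1)
mean, std = gp.predict(candidates, return_std=True)
print(f"Next probe difficulty: {candidates[std.argmax()][0]:.2f}")
```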

Language modeling via stochastic processes
Rose E. Wang, Esin Durmus, Noah Goodman, Tatsunori Hashimoto
International Conference for Learning Representations (ICLR) 2022.
Oral Presentation (1.6% oral acceptance rate)
[ Paper, Video, Code ]

Modern language models can generate high-quality short texts. However, they often meander or are incoherent when generating longer texts. These issues arise from the next-token-only language modeling objective. To address these issues, we introduce Time Control (TC), a language model that implicitly plans via a latent stochastic process. TC does this by learning a representation which maps the dynamics of how text changes in a document to the dynamics of a stochastic process of interest. Using this representation, the language model can generate text by first implicitly generating a document plan via a stochastic process, and then generating text that is consistent with this latent plan.
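For intuition, a natural instance of such a latent process is a Brownian bridge pinned at the document's start and end; in my notation (a sketch rather than the paper's exact parameterization), intermediate latents follow:

```latex
% Brownian bridge between start latent z_0 and end latent z_T; my
% notation, offered as intuition rather than the paper's exact form.
\[
  z_t \mid z_0, z_T \;\sim\;
  \mathcal{N}\!\left(
    \Bigl(1 - \tfrac{t}{T}\Bigr) z_0 + \tfrac{t}{T}\, z_T,\;
    \tfrac{t\,(T - t)}{T}\, I
  \right)
\]
```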

Calibrate your listeners! Robust communication-based training for pragmatic speakers
Rose E. Wang, Julia White, Jesse Mu, Noah Goodman
Findings of EMNLP 2021.
[ Paper, Video, Code ]

To be good conversational partners, natural language processing (NLP) systems should be trained to produce contextually useful utterances. Prior work has investigated training NLP systems with communication-based objectives, where a neural listener stands in as a communication partner. However, these systems commonly suffer from semantic drift where the learned language diverges radically from natural language. We propose a method that uses a population of neural listeners to regularize speaker training.

On the opportunities and risks of foundation models
Many authors (including Rose E. Wang).
August 2021.

This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations).

Too many cooks: Bayesian inference for coordinating multi-agent collaboration
Rose E. Wang*, Sarah Wu*, James A. Evans, Joshua B. Tenenbaum, David C. Parkes, Max Kleiman-Weiner
Journal of the Cognitive Science Society, April 2021.
NeurIPS 2020 Cooperative AI workshop.
Won best paper award at NeurIPS 2020 Cooperative AI Workshop!
[ Paper, Video, Code ]

We develop Bayesian Delegation, a decentralized multi-agent learning mechanism that enables agents to rapidly infer the sub-tasks of others by inverse planning. We demonstrate that our model is a capable ad-hoc collaborator, scales with team size, and makes inferences about intent that align with those of human observers.
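At its core, the inverse-planning step is Bayes' rule over a teammate's possible sub-tasks; in my notation (a sketch assuming approximately softmax-rational teammates, not the paper's exact formulation):

```latex
% Inverse planning as Bayesian inference: infer a teammate's sub-task
% g from their observed actions a_{1:t}, assuming actions are
% approximately rational (softmax-optimal) given g.
\[
  P(g \mid a_{1:t}) \;\propto\; P(g)\, \prod_{k=1}^{t} P(a_k \mid g),
  \qquad
  P(a_k \mid g) \;\propto\; \exp\!\bigl(\beta\, Q_g(s_k, a_k)\bigr)
\]
```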

Model-based Reinforcement Learning for Multiagent Goal Alignment
Rose E. Wang, J. Chase Kew, Dennis Lee, Tsang-Wei Edward Lee, Tingnan Zhang, Brian Ichter, Jie Tan, Aleksandra Faust
Conference on Robot Learning (CoRL) 2020.
Mentioned in Google AI Year in Review, 2020.
[ Paper, Video, Project Page, Blog post ]

In this work, we present hierarchical predictive planning (HPP) for decentralized multiagent navigation tasks. Our approach is trained in simulation and works in unseen settings, both in simulation and in the real world (zero-shot transfer)!

Too many cooks: Coordinating multi-agent collaboration through inverse planning
Rose E. Wang*, Sarah Wu*, James A. Evans, Joshua B. Tenenbaum, David C. Parkes, Max Kleiman-Weiner
Human-Like Machine Intelligence (book published with Oxford University Press)
Annual Meeting of the Cognitive Science Society (CogSci) 2020
International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS) 2020
Invited paper to OptLearnMAS Workshop at AAMAS 2020
Won best paper award for Computational Modeling for Higher Cognition at CogSci 2020!
[ Paper, Video, Code ]

We develop Bayesian Delegation, a decentralized multi-agent learning mechanism that enables agents to rapidly infer the sub-tasks of others by inverse planning.

R-MADDPG for Partially Observable Environments and Limited Communication
Rose E. Wang, Michael Everett, Jonathan P. How
International Conference on Machine Learning (ICML) 2019, Reinforcement Learning for Real Life Workshop.
[ Paper, Code, Project Page ]

This paper introduces a deep recurrent multiagent actor-critic framework (R-MADDPG) for handling multiagent coordination under partially observable settings and limited communication.

DRIV3N: Race to Autonomy
Rose E. Wang, Austin Floyd, Marwa Abdulhai, Luxas Novak, David Klee, Sean Patrick Kelley
Robotics: Science and Systems I, 2017.
[ Video, Project Page ]

A whirlwind of an experience, in which my team and I developed a fast, autonomous, ~maze-solving~ racecar equipped with no machine learning technology and a decorative safety controller.

Template from this website.