Rose E. Wang
rewang at cs dot stanford dot edu

I am a PhD student in Stanford University's Computer Science Department, advised by Dorottya (Dora) Demszky and Diyi Yang. I am also the Head TA for Stanford's first class on NLP and Education, CS293/EDUC473, and I founded and organize Stanford's interdisciplinary Education Reading Group.

My research: Language is central to educational interactions. My work wrestles with the question: How can we improve student learning & build equitable systems at scale through language? To answer it, I develop NLP systems that measure effective learning interactions, and I conduct interventions.

My research is supported by the NSF GRFP, the Bill & Melinda Gates Foundation, and the National Student Support Accelerator.

Previously, I completed my undergraduate studies at MIT, where I worked with Prof. Josh Tenenbaum and Prof. Jonathan How, as well as with Google Brain and Google Brain Robotics. In a prior lifetime, I was a passionate multilinguist (German Abitur; Chinese, HSK Level 6; French, DELF B2; Spanish, DELE B2; European plurilingual excellence award).

[ GitHub  /  Twitter  /  Google Scholar  /  Blog ]



Research

Representative papers are highlighted.

Step-by-Step Remediation of Students' Mathematical Mistakes
Rose E. Wang, Qingyang Zhang, Carly Robinson, Susanna Loeb, Dorottya (Dora) Demszky
NAACL 2024.
Featured in Stanford HAI
[ Paper, Code ]

We explore the potential for large language models (LLMs) to assist math tutors in remediating student mistakes. We present ReMath, a benchmark co-developed with experienced math teachers that deconstructs their thought process for remediation. Our work sheds light on the potential and limitations of using current LLMs to provide high-quality learning experiences for both tutors and students at scale.

🛠️ Edu-ConvoKit: An Open-Source Library for Education Conversation Data
Rose E. Wang, Dorottya (Dora) Demszky
NAACL 2024.
[ Code, Documentation, Paper ]

We introduce Edu-ConvoKit, an open-source library designed to handle the preprocessing, annotation, and analysis of conversation data in education.
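The library's actual interface lives in the documentation linked above; purely as an illustrative sketch of the preprocess → annotate → analyze pipeline it is organized around (the code below uses plain pandas, not Edu-ConvoKit itself):

```python
# Illustrative preprocess -> annotate -> analyze pipeline on a toy
# transcript. This mimics the stages Edu-ConvoKit handles; it does
# not use the library's API.
import pandas as pd

transcript = pd.DataFrame({
    "speaker": ["tutor", "student", "tutor"],
    "text": ["Maria, what is 3 x 4?", "Twelve!", "Nice work, Maria."],
})

# Preprocessing: anonymize student names before any analysis.
transcript["text"] = transcript["text"].str.replace("Maria", "[STUDENT]", regex=False)

# Annotation: label each turn, e.g., whether it is a question.
transcript["is_question"] = transcript["text"].str.contains(r"\?", regex=True)

# Analysis: aggregate annotations by speaker.
print(transcript.groupby("speaker")["is_question"].mean())
```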

Backtracing: Retrieving the Cause of the Query
Rose E. Wang, Pawan Wirawarn, Omar Khattab, Noah Goodman, Dorottya (Dora) Demszky
EACL 2024, Long Paper Findings.
Featured in Stanford HAI
[ Paper, Code, Video, Poster ]

Many online content portals allow users to ask questions to supplement their understanding (e.g., of lectures or news articles). While information retrieval (IR) systems may provide answers to such queries, they do not directly help content creators identify the segments that caused users to ask those questions, which could, for example, help creators improve their content. We introduce the task of backtracing, in which systems retrieve the text segment that most likely provoked a user query.
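For intuition, here is a deliberately simple baseline for the task, of my own making rather than from the paper: score each candidate segment by lexical similarity to the query and return the top one.

```python
# Minimal backtracing baseline: rank content segments by TF-IDF
# cosine similarity to the user's query. Illustrative only; not the
# method evaluated in the paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

segments = [
    "We define the derivative as the limit of the difference quotient.",
    "The chain rule lets us differentiate composed functions.",
    "Integration by parts follows from the product rule.",
]
query = "Why does the chain rule work for nested functions?"

vectorizer = TfidfVectorizer()
seg_vecs = vectorizer.fit_transform(segments)
query_vec = vectorizer.transform([query])

scores = cosine_similarity(query_vec, seg_vecs)[0]
best = scores.argmax()
print(f"Most likely cause of the query: segment {best}: {segments[best]!r}")
```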

Does Feedback on Talk Time Increase Student Engagement? Evidence from a Randomized Controlled Trial on a Math Tutoring Platform
Dorottya (Dora) Demszky, Rose E. Wang, Sean Geraghty, Carol Yu
In the 14th Learning Analytics and Knowledge Conference (LAK '24).
[ Paper ]

Providing ample opportunities for students to express their thinking is pivotal to their learning of mathematical concepts. We introduce the Talk Meter, which provides in-the-moment automated feedback on student-teacher talk ratios. We conduct a randomized controlled trial on a virtual math tutoring platform (n=742 tutors) to evaluate the effectiveness of the Talk Meter at increasing student talk. In one treatment arm, we show the Talk Meter only to the tutor, while in the other arm we show it to both the student and the tutor. We find that the Talk Meter increases student talk ratios in both treatment conditions by 13-14%; this trend is driven by the tutor talking less in the tutor-facing condition, whereas in the student-facing condition it is driven by the student expressing significantly more mathematical thinking. These results demonstrate the promise of in-the-moment joint talk time feedback to both teachers and students as a low-cost, engaging, and scalable way to increase students' mathematical reasoning.
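For concreteness, the quantity being fed back reduces to a running ratio over the conversation so far; here is a minimal sketch, assuming word counts as the unit of talk time (the platform's actual metric may differ):

```python
# Minimal sketch of an in-the-moment talk meter: the student talk
# ratio is the student's share of words spoken so far. Using word
# counts as the unit of talk time is a simplifying assumption.
def student_talk_ratio(turns):
    """turns: list of (speaker, utterance) pairs, in order."""
    student = sum(len(u.split()) for s, u in turns if s == "student")
    total = sum(len(u.split()) for _, u in turns)
    return student / total if total else 0.0

turns = [
    ("tutor", "Can you explain how you set up the equation?"),
    ("student", "I moved the 3 over, so x equals 7 minus 3, which is 4."),
]
print(f"Student talk ratio so far: {student_talk_ratio(turns):.0%}")
```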

Is ChatGPT a Good Teacher Coach? Measuring Zero-Shot Performance For Scoring and Providing Actionable Insights on Classroom Instruction
Rose E. Wang, Dorottya (Dora) Demszky
In the Proceedings of Innovative Use of NLP for Building Educational Applications (2023).
Selected as BEA 2023's Ambassador Paper
Featured in Forbes and Stanford HAI
[ Project page, Video, Paper, Code ]

We explore whether generative AI could become a cost-effective complement to expert feedback by serving as an automated teacher coach. We propose three teacher coaching tasks for generative AI: (A) scoring transcript segments based on classroom observation instruments, (B) identifying highlights and missed opportunities for good instructional strategies, and (C) providing actionable suggestions for eliciting more student reasoning.

“Mistakes Help Us Grow”: Facilitating and Evaluating Growth Mindset Supportive Language in Classrooms
Kunal Handa, Margaret Clapper, Jessica Boyle, Rose E. Wang, Diyi Yang, David S Yeager, Dorottya (Dora) Demszky
In the Conference on Empirical Methods in Natural Language Processing (EMNLP 2023).
Featured in Stanford HAI
[ Paper ]

Teachers’ growth mindset supportive language (GMSL)—rhetoric emphasizing that one's skills can be improved over time—has been shown to significantly reduce disparities in academic achievement and enhance students' learning outcomes. Although teachers espouse growth mindset principles, most find it difficult to adopt GMSL in their practice due to the lack of effective coaching in this area. We explore whether large language models (LLMs) can provide automated, personalized coaching to support teachers' use of GMSL. We conduct a large-scale evaluation involving 174 teachers and 1,006 students, finding that both teachers and students perceive GMSL-trained teacher and model reframings as more effective in fostering a growth mindset and promoting challenge-seeking behavior, among other benefits. We also find that model-generated reframings outperform those from the GMSL-trained teachers. These results show promise for harnessing LLMs to provide automated GMSL feedback for teachers and, more broadly, LLMs’ potential for supporting students’ learning in the classroom.

SIGHT: A Large Annotated Dataset on Student Insights Gathered from Higher Education Transcripts
Rose E. Wang*, Pawan Wirawarn*, Noah Goodman, Dorottya (Dora) Demszky
In the Proceedings of Innovative Use of NLP for Building Educational Applications (2023).
[ Project page, Video, Paper, Code ]

We build SIGHT, a large dataset of 288 math lecture transcripts and 15,784 comments collected from the Massachusetts Institute of Technology OpenCourseWare (MIT OCW) YouTube channel. We additionally develop a rubric for categorizing student feedback types, along with methods for scaling annotation, to help teachers better understand the needs of their students.

Solving math word problems by combining language models with symbolic solvers
Joy He-Yueya, Gabriel Poesia, Rose E. Wang, Noah Goodman
ArXiv (2023).
[ Paper ]

We propose an approach that combines an LLM, which incrementally formalizes word problems as a set of variables and equations, with an external symbolic solver that solves the equations.
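As a sketch of this pipeline, assuming the LLM has already emitted variables and equations for a toy problem (the llm_output dict below is a hand-written stand-in for a model call; the solving step here uses SymPy):

```python
# Sketch of the LLM + symbolic-solver pipeline: the LLM translates
# the word problem into equations; SymPy then solves them exactly.
import sympy

# Word problem: "Ann has twice as many apples as Bob. Together they
# have 12 apples. How many does Bob have?"
llm_output = {
    "variables": ["ann", "bob"],
    "equations": ["Eq(ann, 2*bob)", "Eq(ann + bob, 12)"],
}

symbols = sympy.symbols(llm_output["variables"])
local_vars = dict(zip(llm_output["variables"], symbols))
equations = [
    sympy.sympify(eq, locals={"Eq": sympy.Eq, **local_vars})
    for eq in llm_output["equations"]
]

solution = sympy.solve(equations, symbols)
print(solution)  # {ann: 8, bob: 4}
```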

Evaluating Human-Language Model Interaction
Mina Lee, Megha Srivastava, Amelia Hardy, John Thickstun, Esin Durmus, Ashwin Paranjape, Ines Gerard-Ursin, Xiang Lisa Li, Faisal Ladhak, Frieda Rong, Rose E. Wang, Minae Kwon, Joon Sung Park, Hancheng Cao, Tony Lee, Rishi Bommasani, Michael Bernstein, Percy Liang
In submission (2023).
[ Paper ]

We develop Human-AI Language-based Interaction Evaluation (HALIE) that expands non-interactive evaluation along three dimensions, capturing (i) the interactive process, not only the final output; (ii) the first-person subjective experience, not just a third-party assessment; and (iii) notions of preference beyond quality.

In the ZONE: Measuring difficulty and progression in curriculum generation
Rose E. Wang, Jesse Mu, Dilip Arumugam, Natasha Jaques, Noah Goodman
NeurIPS 2022 Deep Reinforcement Learning Workshop.
[ Paper, Invited Talk at UC Berkeley's Multi-Agent Learning Seminar ]

A common strategy in curriculum generation for reinforcement learning is to train a teacher network to generate tasks that enable student learning. But what kinds of tasks enable this? One answer is tasks belonging to a student's zone of proximal development (ZPD), a concept from developmental psychology: tasks that are not too easy and not too hard for the student. Although intuitive, ZPD is not well understood computationally. We propose ZONE, a novel computational framework that operationalizes ZPD. It formalizes ZPD through the language of Bayesian probability theory, revealing that tasks should be selected by difficulty (the student's probability of task success) and learning progression (the degree of change in the student's model parameters).
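In symbols (my notation, offered as intuition rather than the paper's exact formulation): with student parameters \(\theta_t\), a ZPD-respecting curriculum picks the next task \(\tau\) whose success probability sits in an intermediate band, preferring tasks that induce the largest parameter change.

```latex
% Illustrative notation (mine, not necessarily the paper's): choose
% the next task by learning progression, constrained to tasks that
% are neither too easy nor too hard for the current student.
\[
  \tau^{*} \;=\; \arg\max_{\tau}\;
    \bigl\lVert \theta_{t+1}(\tau) - \theta_{t} \bigr\rVert
  \quad \text{s.t.} \quad
  p_{\min} \,\le\, P(\mathrm{success} \mid \tau, \theta_{t}) \,\le\, p_{\max}
\]
```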

ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward
Zixian Ma, Rose E. Wang, Li Fei-Fei, Michael Bernstein, Ranjay Krishna
36th Conference on Neural Information Processing Systems (NeurIPS 2022).
[ Paper, Code ]

Modern multi-agent reinforcement learning frameworks rely on centralized training and reward shaping to perform well. However, centralized training and dense rewards are not readily available in the real world. Current multi-agent algorithms struggle to learn in the alternative setup of decentralized training or sparse rewards. To address these issues, we propose ELIGN (expectation alignment), a self-supervised intrinsic reward inspired by the self-organization principle in zoology.

Speaking with Confidence: Investigating the effect of uncertainty in pragmatic language learning
Pawan Wirawarn, Rose E. Wang, Noah Goodman
CURIS 2022.
[ Poster ]

Our work explores whether pragmatic language learning is better with a well-calibrated domain-agnostic listener.

CLaP: Conditional Latent Planners for Offline Reinforcement Learning
Harry Donghyeop Shin, Rose E. Wang
NeurIPS 2022 Workshop on Foundation Models for Decision Making.
[ Paper, Code (coming soon) ]

Recent work has formulated offline reinforcement learning (RL) as a sequence modeling problem, benefiting from the simplicity and scalability of the Transformer architecture. However, sequence models struggle to model trajectories that are long-horizon or involve complicated environment dynamics. We propose CLaP (Conditional Latent Planners), which learns a simple goal-conditioned latent space from offline agent behavior and incrementally decodes good actions from a latent plan.

Know Thy Student: Interactive Learning with Gaussian Processes
Rose E. Wang, Mike Wu, Noah Goodman
ICLR 2022 Workshop on From Cells to Societies: Collective Learning across Scales.
[ Paper ]

Learning often involves interaction between multiple agents. Human teacher-student settings best illustrate how interactions result in efficient knowledge passing, where the teacher constructs a curriculum based on their students' abilities. Prior work in machine teaching studies how a teacher should construct optimal teaching datasets under the assumption that the teacher knows everything about the student. In the real world, however, the teacher has incomplete information and must probe before teaching. Our work proposes a simple probing algorithm that uses Gaussian processes to infer student-related information before constructing a teaching dataset.
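To make the probing idea concrete, here is a minimal sketch using scikit-learn's Gaussian process regressor; the setup (difficulty as input, correctness as output, probing at maximum posterior uncertainty) is my illustrative framing, not the paper's implementation.

```python
# Illustrative GP probing sketch: model a student's probability of
# answering correctly as a function of problem difficulty, then probe
# next where the posterior is most uncertain.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Observed (difficulty, correctness) pairs from a few probe questions.
difficulty = np.array([[0.1], [0.2], [0.8], [0.9]])
correct = np.array([1.0, 1.0, 0.0, 0.0])

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3))
gp.fit(difficulty, correct)

# Probe next where the posterior standard deviation is largest.
candidates = np.linspace(0, 1, 101).reshape(-1, 1)
mean, std = gp.predict(candidates, return_std=True)
print(f"Next probe difficulty: {candidates[std.argmax()][0]:.2f}")
```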

Language modeling via stochastic processes
Rose E. Wang, Esin Durmus, Noah Goodman, Tatsunori Hashimoto
International Conference for Learning Representations (ICLR) 2022.
Oral Presentation (1.6% oral acceptance rate)
[ Paper, Video, Code ]

Modern language models can generate high-quality short texts. However, they often meander or are incoherent when generating longer texts. These issues arise from the next-token-only language modeling objective. To address these issues, we introduce Time Control (TC), a language model that implicitly plans via a latent stochastic process. TC does this by learning a representation which maps the dynamics of how text changes in a document to the dynamics of a stochastic process of interest. Using this representation, the language model can generate text by first implicitly generating a document plan via a stochastic process, and then generating text that is consistent with this latent plan.
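For intuition, a natural instance of such a latent process is a Brownian bridge pinned at the document's start and end; in my notation (a sketch rather than the paper's exact parameterization), intermediate latents follow:

```latex
% Brownian bridge between start latent z_0 and end latent z_T; my
% notation, offered as intuition rather than the paper's exact form.
\[
  z_t \mid z_0, z_T \;\sim\;
  \mathcal{N}\!\left(
    \Bigl(1 - \tfrac{t}{T}\Bigr) z_0 + \tfrac{t}{T}\, z_T,\;
    \tfrac{t\,(T - t)}{T}\, I
  \right)
\]
```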

Calibrate your listeners! Robust communication-based training for pragmatic speakers
Rose E. Wang, Julia White, Jesse Mu, Noah Goodman
Findings of EMNLP 2021.
[ Paper, Video, Code ]

To be good conversational partners, natural language processing (NLP) systems should be trained to produce contextually useful utterances. Prior work has investigated training NLP systems with communication-based objectives, where a neural listener stands in as a communication partner. However, these systems commonly suffer from semantic drift where the learned language diverges radically from natural language. We propose a method that uses a population of neural listeners to regularize speaker training.

On the opportunities and risks of foundation models
Many authors (including Rose E. Wang).
August 2021.

This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations).

Too many cooks: Bayesian inference for coordinating multi-agent collaboration
Rose E. Wang*, Sarah Wu*, James A. Evans, Joshua B. Tenenbaum, David C. Parkes, Max Kleiman-Weiner
Journal of the Cognitive Science Society, April 2021.
NeurIPS 2020 Cooperative AI workshop.
Won best paper award at NeurIPS 2020 Cooperative AI Workshop!
[ Paper, Video, Code ]

We develop Bayesian Delegation, a decentralized multi-agent learning mechanism that enables agents to rapidly infer the sub-tasks of others by inverse planning. We demonstrate that our model is a capable ad-hoc collaborator, scales with team size, and makes inferences about intent that align with those of human observers.
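At its core, the inverse-planning step is Bayes' rule over a teammate's possible sub-tasks; in my notation (a sketch assuming approximately softmax-rational teammates, not the paper's exact formulation):

```latex
% Inverse planning as Bayesian inference: infer a teammate's sub-task
% g from their observed actions a_{1:t}, assuming actions are
% approximately rational (softmax-optimal) given g.
\[
  P(g \mid a_{1:t}) \;\propto\; P(g)\, \prod_{k=1}^{t} P(a_k \mid g),
  \qquad
  P(a_k \mid g) \;\propto\; \exp\!\bigl(\beta\, Q_g(s_k, a_k)\bigr)
\]
```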

Model-based Reinforcement Learning for Multiagent Goal Alignment
Rose E. Wang, J. Chase Kew, Dennis Lee, Tsang-Wei Edward Lee, Tingnan Zhang, Brian Ichter, Jie Tan, Aleksandra Faust
Conference on Robot Learning (CoRL) 2020.
Mentioned in Google AI Year in Review, 2020.
[ Paper, Video, Project Page, Blog post ]

In this work, we present hierarchical predictive planning (HPP) for decentralized multiagent navigation tasks. Our approach is trained in simulation and works in unseen settings, both in simulation and in the real world (zero-shot transfer)!

Too many cooks: Coordinating multi-agent collaboration through inverse planning
Rose E. Wang*, Sarah Wu*, James A. Evans, Joshua B. Tenenbaum, David C. Parkes, Max Kleiman-Weiner
Human-Like Machine Intelligence (book published with Oxford University Press)
Annual Meeting of the Cognitive Science Society (CogSci) 2020
International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS) 2020
Invited paper to OptLearnMAS Workshop at AAMAS 2020
Won best paper award for Computational Modeling for Higher Cognition at CogSci 2020!
[ Paper, Video, Code ]

We develop Bayesian Delegation, a decentralized multi-agent learning mechanism that enables agents to rapidly infer the sub-tasks of others by inverse planning.

R-MADDPG for Partially Observable Environments and Limited Communication
Rose E. Wang, Michael Everett, Jonathan P. How
International Conference on Machine Learning (ICML) 2019, Reinforcement Learning for Real Life Workshop.
[ Paper, Code, Project Page ]

This paper introduces a deep recurrent multiagent actor-critic framework (R-MADDPG) for handling multiagent coordination under partially observable settings and limited communication.

DRIV3N: Race to Autonomy
Rose E. Wang, Austin Floyd, Marwa Abdulhai, Luxas Novak, David Klee, Sean Patrick Kelley
Robotics: Science and Systems I, 2017.
[ Video, Project Page ]

A whirlwind of an experience, in which my team and I developed a fast, autonomous, ~maze-solving~ racecar equipped with no machine learning technology and a decorative safety controller.

Template from this website.