@article{li2021prefix,
author = {Xiang Lisa Li and Percy Liang},
journal = {arXiv},
title = {Prefix-Tuning: Optimizing Continuous Prompts for Generation},
year = {2021},
}
@inproceedings{jones2021selective,
author = {Erik Jones and Shiori Sagawa and Pang Wei Koh and Ananya Kumar and Percy Liang},
booktitle = {International Conference on Learning Representations (ICLR)},
title = {Selective Classification Can Magnify Disparities Across Groups},
year = {2021},
}
@inproceedings{xie2021innout,
author = {Sang Michael Xie and Ananya Kumar and Robert Jones and Fereshte Khani and Tengyu Ma and Percy Liang},
booktitle = {International Conference on Learning Representations (ICLR)},
title = {In-{N}-Out: Pre-Training and Self-Training using Auxiliary Information for Out-of-Distribution Robustness},
year = {2021},
}
@inproceedings{khani2021removing,
author = {Fereshte Khani and Percy Liang},
booktitle = {ACM Conference on Fairness, Accountability, and Transparency (FAccT)},
title = {Removing Spurious Features can Hurt Accuracy and Affect Groups Disproportionately},
year = {2021},
}
@inproceedings{gu2021beyond,
author = {Yu Gu and Sue Kase and Michelle T. Vanni and Brian M. Sadler and Percy Liang and Xifeng Yan and Yu Su},
booktitle = {World Wide Web (WWW)},
title = {Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases},
year = {2021},
}
WILDS: a benchmark of in-the-wild distribution shifts.
Pang Wei Koh*, Shiori Sagawa*, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Sara Beery, Jure Leskovec, Anshul Kundaje, Emma Pierson, Sergey Levine, Chelsea Finn, Percy Liang.
arXiv, 2020.
@article{koh2020wilds,
author = {Pang Wei Koh and Shiori Sagawa and Henrik Marklund and Sang Michael Xie and Marvin Zhang and Akshay Balsubramani and Weihua Hu and Michihiro Yasunaga and Richard Lanas Phillips and Sara Beery and Jure Leskovec and Anshul Kundaje and Emma Pierson and Sergey Levine and Chelsea Finn and Percy Liang},
journal = {arXiv},
title = {{WILDS}: A Benchmark of in-the-Wild Distribution Shifts},
year = {2020},
}
@article{xie2020outputs,
author = {Sang Michael Xie and Tengyu Ma and Percy Liang},
journal = {arXiv},
title = {Simplifying Models with Unlabeled Output Data},
year = {2020},
}
@inproceedings{karamcheti2020decomposition,
author = {Sidd Karamcheti and Dorsa Sadigh and Percy Liang},
booktitle = {EMNLP Workshop for Interactive and Executable Semantic Parsing (IntEx-SemPar)},
title = {Learning Adaptive Language Interfaces through Decomposition},
year = {2020},
}
@article{liu2020explore,
author = {Evan Zheran Liu and Aditi Raghunathan and Percy Liang and Chelsea Finn},
journal = {arXiv preprint arXiv:2008.02790},
title = {Explore then Execute: Adapting without Rewards via Factorized Meta-Reinforcement Learning},
year = {2020},
}
@article{liu2020learning,
author = {Evan Zheran Liu and Ramtin Keramati and Sudarshan Seshadri and Kelvin Guu and Panupong Pasupat and Emma Brunskill and Percy Liang},
journal = {arXiv preprint arXiv:2007.05896},
title = {Learning Abstract Models for Strategic Exploration and Fast Reward Transfer},
year = {2020},
}
@inproceedings{mussmann2020pairwise,
author = {Stephen Mussmann and Robin Jia and Percy Liang},
booktitle = {Findings of Empirical Methods in Natural Language Processing (Findings of EMNLP)},
title = {On the Importance of Adaptive Data Collection for Extremely Imbalanced Pairwise Tasks},
year = {2020},
}
@inproceedings{hewitt2020rnn,
author = {John Hewitt and Michael Hahn and Surya Ganguli and Percy Liang and Christopher D. Manning},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
title = {{RNN}s can generate bounded hierarchical languages with optimal memory},
year = {2020},
}
The EOS decision and length extrapolation.
Benjamin Newman, John Hewitt, Percy Liang, Christopher D. Manning.
Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, 2020. Outstanding paper award.
@inproceedings{newman2020eos,
author = {Benjamin Newman and John Hewitt and Percy Liang and Christopher D. Manning},
booktitle = {Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP},
title = {The {EOS} Decision and Length Extrapolation},
year = {2020},
}
@inproceedings{dathathri2020sdp,
author = {Sumanth Dathathri and Krishnamurthy Dvijotham and Alexey Kurakin and Aditi Raghunathan and Jonathan Uesato and Rudy Bunel and Shreya Shankar and Jacob Steinhardt and Ian Goodfellow and Percy Liang and Pushmeet Kohli},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
title = {Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming},
year = {2020},
}
Task-Oriented dialogue as dataflow synthesis.
Semantic Machines, Jacob Andreas, John Bufe, David Burkett, Charles Chen, Josh Clausman, Jean Crawford, Kate Crim, Jordan DeLoach, Leah Dorner, Jason Eisner, Hao Fang, Alan Guo, David Hall, Kristin Hayes, Kellie Hill, Diana Ho, Wendy Iwaszuk, Smriti Jha, Dan Klein, Jayant Krishnamurthy, Theo Lanman, Percy Liang, Christopher H. Lin, Ilya Lintsbakh, Andy McGovern, Aleksandr Nisnevich, Adam Pauls, Dmitrij Petters, Brent Read, Dan Roth, Subhro Roy, Jesse Rusak, Beth Short, Div Slomin, Ben Snyder, Stephon Striplin, Yu Su, Zachary Tellman, Sam Thomson, Andrei Vorobev, Izabela Witoszko, Jason Wolfe, Abby Wray, Yuchen Zhang, Alexander Zotov.
Transactions of the Association for Computational Linguistics (TACL), 2020.
@article{semanticmachines2020dataflow,
author = {Semantic Machines and Jacob Andreas and John Bufe and David Burkett and Charles Chen and Josh Clausman and Jean Crawford and Kate Crim and Jordan DeLoach and Leah Dorner and Jason Eisner and Hao Fang and Alan Guo and David Hall and Kristin Hayes and Kellie Hill and Diana Ho and Wendy Iwaszuk and Smriti Jha and Dan Klein and Jayant Krishnamurthy and Theo Lanman and Percy Liang and Christopher H. Lin and Ilya Lintsbakh and Andy McGovern and Aleksandr Nisnevich and Adam Pauls and Dmitrij Petters and Brent Read and Dan Roth and Subhro Roy and Jesse Rusak and Beth Short and Div Slomin and Ben Snyder and Stephon Striplin and Yu Su and Zachary Tellman and Sam Thomson and Andrei Vorobev and Izabela Witoszko and Jason Wolfe and Abby Wray and Yuchen Zhang and Alexander Zotov},
journal = {Transactions of the Association for Computational Linguistics (TACL)},
title = {Task-Oriented Dialogue as Dataflow Synthesis},
volume = {8},
year = {2020},
}
@inproceedings{sagawa2020overparameterization,
author = {Shiori Sagawa and Aditi Raghunathan and Pang Wei Koh and Percy Liang},
booktitle = {International Conference on Machine Learning (ICML)},
title = {An investigation of why overparameterization exacerbates spurious correlations},
year = {2020},
}
Concept bottleneck models.
Pang Wei Koh*, Thao Nguyen*, Yew Siang Tang*, Stephen Mussmann, Emma Pierson, Been Kim, Percy Liang.
International Conference on Machine Learning (ICML), 2020.
@inproceedings{koh2020bottleneck,
author = {Pang Wei Koh and Thao Nguyen and Yew Siang Tang and Stephen Mussmann and Emma Pierson and Been Kim and Percy Liang},
booktitle = {International Conference on Machine Learning (ICML)},
title = {Concept Bottleneck Models},
year = {2020},
}
@inproceedings{khani2020noise,
author = {Fereshte Khani and Percy Liang},
booktitle = {International Conference on Machine Learning (ICML)},
title = {Feature Noise Induces Loss Discrepancy Across Groups},
year = {2020},
}
@inproceedings{yasunaga2020repair,
author = {Michi Yasunaga and Percy Liang},
booktitle = {International Conference on Machine Learning (ICML)},
title = {Graph-based, Self-Supervised Program Repair from Diagnostic Feedback},
year = {2020},
}
@inproceedings{raghunathan2020understanding,
author = {Aditi Raghunathan and Sang Michael Xie and Fanny Yang and John C. Duchi and Percy Liang},
booktitle = {International Conference on Machine Learning (ICML)},
title = {Understanding and Mitigating the Tradeoff Between Robustness and Accuracy},
year = {2020},
}
@inproceedings{kumar2020gradual,
author = {Ananya Kumar and Tengyu Ma and Percy Liang},
booktitle = {International Conference on Machine Learning (ICML)},
title = {Understanding Self-Training for Gradual Domain Adaptation},
year = {2020},
}
@inproceedings{srivasta2020human,
author = {Megha Srivastava and Tatsunori Hashimoto and Percy Liang},
booktitle = {International Conference on Machine Learning (ICML)},
title = {Robustness to Spurious Correlations via Human Annotations},
year = {2020},
}
@inproceedings{jones2020roben,
author = {Erik Jones and Robin Jia and Aditi Raghunathan and Percy Liang},
booktitle = {Association for Computational Linguistics (ACL)},
title = {Robust Encodings: A Framework for Combating Adversarial Typos},
year = {2020},
}
@inproceedings{kamath2020squads,
author = {Amita Kamath and Robin Jia and Percy Liang},
booktitle = {Association for Computational Linguistics (ACL)},
title = {Selective Question Answering under Domain Shift},
year = {2020},
}
@inproceedings{mu2020shaping,
author = {Jesse Mu and Percy Liang and Noah Goodman},
booktitle = {Association for Computational Linguistics (ACL)},
title = {Shaping Visual Representations with Language for Few-shot Classification},
year = {2020},
}
@inproceedings{murty2020expbert,
author = {Shikhar Murty and Pang Wei Koh and Percy Liang},
booktitle = {Association for Computational Linguistics (ACL)},
title = {{ExpBERT}: Representation Engineering with Natural Language Explanations},
year = {2020},
}
@inproceedings{donahue2020infilling,
author = {Chris Donahue and Mina Lee and Percy Liang},
booktitle = {Association for Computational Linguistics (ACL)},
title = {Enabling Language Models to Fill in the Blanks},
year = {2020},
}
@inproceedings{sagawa2020group,
author = {Shiori Sagawa and Pang Wei Koh and Tatsunori B. Hashimoto and Percy Liang},
booktitle = {International Conference on Learning Representations (ICLR)},
title = {Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization},
year = {2020},
}
@inproceedings{hu2020pretraining,
author = {Weihua Hu and Bowen Liu and Joseph Gomes and Marinka Zitnik and Percy Liang and Vijay Pande and Jure Leskovec},
booktitle = {International Conference on Learning Representations (ICLR)},
title = {Strategies for Pre-training Graph Neural Networks},
year = {2020},
}
@inproceedings{coleman2020selection,
author = {Cody Coleman and Christopher Yeh and Stephen Mussmann and Baharan Mirzasoleiman and Peter Bailis and Percy Liang and Jure Leskovec and Matei Zaharia},
booktitle = {International Conference on Learning Representations (ICLR)},
title = {Selection via Proxy: Efficient Data Selection for Deep Learning},
year = {2020},
}
@inproceedings{li2020greedy,
author = {Ray Li and Percy Liang and Stephen Mussmann},
booktitle = {Symposium on Discrete Algorithms (SODA)},
title = {A Tight Analysis of Greedy Yields Subexponential Time Approximation for Uniform Decision Tree},
year = {2020},
}
@inproceedings{jia2019certified,
author = {Robin Jia and Aditi Raghunathan and Kerem Göksel and Percy Liang},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
title = {Certified Robustness to Adversarial Word Substitutions},
year = {2019},
}
@inproceedings{oren2019drolm,
author = {Yonatan Oren and Shiori Sagawa and Tatsunori Hashimoto and Percy Liang},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
title = {Distributionally Robust Language Modeling},
year = {2019},
}
@inproceedings{hewitt2019control,
author = {John Hewitt and Percy Liang},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
title = {Designing and Interpreting Probes with Control Tasks},
year = {2019},
}
@inproceedings{kulal2019spoc,
author = {Sumith Kulal and Panupong Pasupat and Kartik Chandra and Mina Lee and Oded Padon and Alex Aiken and Percy Liang},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
title = {{SPoC}: Search-based Pseudocode to Code},
year = {2019},
}
@inproceedings{carmon2019unlabeled,
author = {Yair Carmon and Aditi Raghunathan and Ludwig Schmidt and Percy Liang and John C. Duchi},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
title = {Unlabeled Data Improves Adversarial Robustness},
year = {2019},
}
@inproceedings{kumar2019calibration,
author = {Ananya Kumar and Percy Liang and Tengyu Ma},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
title = {Verified Uncertainty Calibration},
year = {2019},
}
@inproceedings{koh2019influence,
author = {Pang Wei Koh and Kai-Siang Ang and Hubert H. K. Teo and Percy Liang},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
title = {On the Accuracy of Influence Functions for Measuring Group Effects},
year = {2019},
}
@inproceedings{lee2019autocomplete,
author = {Mina Lee and Tatsunori Hashimoto and Percy Liang},
booktitle = {Emergent Communication Workshop at Neural Information Processing Systems (NeurIPS)},
title = {Learning Autocomplete Systems as a Communication Game},
year = {2019},
}
@article{raghunathan2019hurt,
author = {Aditi Raghunathan and Sang Michael Xie and Fanny Yang and John C. Duchi and Percy Liang},
journal = {arXiv preprint arXiv:1906.06032},
title = {Adversarial Training Can Hurt Generalization},
year = {2019},
}
@article{monajemi2019painless,
author = {Hatef Monajemi and Riccardo Murri and Eric Jonas and Percy Liang and Victoria Stodden and David L. Donoho},
journal = {Harvard Data Science Review},
title = {Ambitious Data Science Can Be Painless},
volume = {1},
year = {2019},
}
@inproceedings{hashimoto2019huse,
author = {Tatsunori Hashimoto and Hugh Zhang and Percy Liang},
booktitle = {North American Association for Computational Linguistics (NAACL)},
title = {Unifying Human and Statistical Evaluation for Natural Language Generation},
year = {2019},
}
@inproceedings{peng2019pun,
author = {Nanyun Peng and He He and Percy Liang},
booktitle = {North American Association for Computational Linguistics (NAACL)},
title = {Pun Generation with Surprise},
year = {2019},
}
@article{koh2019stronger,
author = {Pang Wei Koh and Jacob Steinhardt and Percy Liang},
journal = {arXiv preprint arXiv:1811.00741},
title = {Stronger Data Poisoning Attacks Break Data Sanitization Defenses},
year = {2019},
}
@inproceedings{selsam2019sat,
author = {Daniel Selsam and Matthew Lamm and Benedikt Bünz and Percy Liang and Leonardo de Moura and David L. Dill},
booktitle = {International Conference on Learning Representations (ICLR)},
title = {Learning a {SAT} Solver from Single-Bit Supervision},
year = {2019},
}
@inproceedings{zhang2019discretization,
author = {Yuchen Zhang and Percy Liang},
booktitle = {Artificial Intelligence and Statistics (AISTATS)},
title = {Defending against Whitebox Adversarial Attacks via Randomized Discretization},
year = {2019},
}
@inproceedings{pierson2019aging,
author = {Emma Pierson and Pang Wei Koh and Tatsunori Hashimoto and Daphne Koller and Jure Leskovec and Nick Eriksson and Percy Liang},
booktitle = {Artificial Intelligence and Statistics (AISTATS)},
title = {Inferring Multidimensional Rates of Aging from Cross-Sectional Data},
year = {2019},
}
@inproceedings{shi2019frangel,
author = {Kensen Shi and Jacob Steinhardt and Percy Liang},
booktitle = {Principles of Programming Languages (POPL)},
title = {{F}r{A}ngel: Component-Based Synthesis with Control Structures},
year = {2019},
}
@inproceedings{raghunathan2018sdp,
author = {Aditi Raghunathan and Jacob Steinhardt and Percy Liang},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
title = {Semidefinite relaxations for certifying robustness to adversarial examples},
year = {2018},
}
@inproceedings{mussmann2018sgd,
author = {Stephen Mussmann and Percy Liang},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
title = {Uncertainty Sampling is Preconditioned Stochastic Gradient Descent on Zero-One Loss},
year = {2018},
}
@inproceedings{hashimoto2018edit,
author = {Tatsunori Hashimoto and Kelvin Guu and Yonatan Oren and Percy Liang},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
title = {A Retrieve-and-Edit Framework for Predicting Structured Outputs},
year = {2018},
}
QuAC: question answering in context.
Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, Luke Zettlemoyer.
Empirical Methods in Natural Language Processing (EMNLP), 2018.
@inproceedings{choi2018quac,
author = {Eunsol Choi and He He and Mohit Iyyer and Mark Yatskar and Wen-tau Yih and Yejin Choi and Percy Liang and Luke Zettlemoyer},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
title = {{QuAC}: Question Answering in Context},
year = {2018},
}
@inproceedings{he2018negotiation,
author = {He He and Derek Chen and Anusha Balakrishnan and Percy Liang},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
title = {Decoupling Strategy and Generation in Negotiation Dialogues},
year = {2018},
}
@inproceedings{pasupat2018elements,
author = {Panupong Pasupat and Tian-Shun Jiang and Evan Liu and Kelvin Guu and Percy Liang},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
title = {Mapping Natural Language Commands to Web Elements},
year = {2018},
}
@inproceedings{lamm2018tap,
author = {Matthew Lamm and Arun Chaganty and Christopher D. Manning and Dan Jurafsky and Percy Liang},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
title = {Textual Analogy Parsing: What's Shared and What's Compared among Analogous Facts},
year = {2018},
}
@inproceedings{mussmann2018accuracy,
author = {Stephen Mussmann and Percy Liang},
booktitle = {International Conference on Machine Learning (ICML)},
title = {On the Relationship between Data Efficiency and Error in Active Learning},
year = {2018},
}
@inproceedings{hashimoto2018repeated,
author = {Tatsunori B. Hashimoto and Megha Srivastava and Hongseok Namkoong and Percy Liang},
booktitle = {International Conference on Machine Learning (ICML)},
title = {Fairness Without Demographics in Repeated Loss Minimization},
year = {2018},
}
@inproceedings{hancock2018babble,
author = {Braden Hancock and Paroma Varma and Stephanie Wang and Martin Bringmann and Percy Liang and Christopher Ré},
booktitle = {Association for Computational Linguistics (ACL)},
title = {Training Classifiers with Natural Language Explanations},
year = {2018},
}
@inproceedings{chaganty2018evaluation,
author = {Arun Chaganty and Stephen Mussmann and Percy Liang},
booktitle = {Association for Computational Linguistics (ACL)},
title = {The price of debiasing automatic metrics in natural language evaluation},
year = {2018},
}
@inproceedings{rajpurkar2018squadrun,
author = {Pranav Rajpurkar and Robin Jia and Percy Liang},
booktitle = {Association for Computational Linguistics (ACL)},
title = {Know What You Don't Know: Unanswerable Questions for {SQuAD}},
year = {2018},
}
@article{khani2018pip,
author = {Fereshte Khani and Noah D. Goodman and Percy Liang},
journal = {Transactions of the Association for Computational Linguistics (TACL)},
title = {Planning, Inference and Pragmatics in Sequential Language Games},
volume = {6},
year = {2018},
}
@article{guu2018edit,
author = {Kelvin Guu and Tatsunori B. Hashimoto and Yonatan Oren and Percy Liang},
journal = {Transactions of the Association for Computational Linguistics (TACL)},
title = {Generating Sentences by Editing Prototypes},
volume = {6},
year = {2018},
}
@inproceedings{li2018style,
author = {Juncen Li and Robin Jia and He He and Percy Liang},
booktitle = {North American Association for Computational Linguistics (NAACL)},
title = {Delete, Retrieve, Generate: A Simple Approach to Sentiment and Style Transfer},
year = {2018},
}
@inproceedings{liu2018workflow,
author = {Evan Zheran Liu and Kelvin Guu and Panupong Pasupat and Tianlin Shi and Percy Liang},
booktitle = {International Conference on Learning Representations (ICLR)},
title = {Reinforcement Learning on Web Interfaces using Workflow-Guided Exploration},
year = {2018},
}
@inproceedings{raghunathan2018certified,
author = {Aditi Raghunathan and Jacob Steinhardt and Percy Liang},
booktitle = {International Conference on Learning Representations (ICLR)},
title = {Certified defenses against adversarial examples},
year = {2018},
}
@inproceedings{bastani2018active,
author = {Osbert Bastani and Rahul Sharma and Alex Aiken and Percy Liang},
booktitle = {Programming Language Design and Implementation (PLDI)},
title = {Active Learning of Points-To Specifications},
year = {2018},
}
@inproceedings{sharan2018prediction,
author = {Vatsal Sharan and Sham Kakade and Percy Liang and Gregory Valiant},
booktitle = {Symposium on Theory of Computing (STOC)},
title = {Prediction with a Short Memory},
year = {2018},
}
@inproceedings{steinhardt2017certified,
author = {Jacob Steinhardt and Pang Wei Koh and Percy Liang},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
title = {Certified Defenses for Data Poisoning Attacks},
year = {2017},
}
@inproceedings{hashimoto2017transformation,
author = {Tatsunori B. Hashimoto and John Duchi and Percy Liang},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
title = {Unsupervised Transformation Learning via Convex Relaxations},
year = {2017},
}
Learning overcomplete HMMs.
Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant.
Advances in Neural Information Processing Systems (NeurIPS), 2017.
@inproceedings{sharan2017overcomplete,
author = {Vatsal Sharan and Sham Kakade and Percy Liang and Gregory Valiant},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
title = {Learning Overcomplete {HMM}s},
year = {2017},
}
@inproceedings{jia2017adversarial,
author = {Robin Jia and Percy Liang},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
title = {Adversarial Examples for Evaluating Reading Comprehension Systems},
year = {2017},
}
@inproceedings{zhang2017macro,
author = {Yuchen Zhang and Panupong Pasupat and Percy Liang},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
title = {Macro Grammars and Holistic Triggering for Efficient Semantic Parsing},
year = {2017},
}
@inproceedings{chaganty2017unbiased,
author = {Arun Chaganty and Ashwin Paranjape and Percy Liang and Chris Manning},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
title = {Importance sampling for unbiased on-demand evaluation of knowledge base population},
year = {2017},
}
@inproceedings{zhang2017convexified,
author = {Yuchen Zhang and Percy Liang and Martin J. Wainwright},
booktitle = {International Conference on Machine Learning (ICML)},
title = {Convexified Convolutional Neural Networks},
year = {2017},
}
@inproceedings{selsam2017bugfree,
author = {Daniel Selsam and Percy Liang and David Dill},
booktitle = {International Conference on Machine Learning (ICML)},
title = {Developing Bug-Free Machine Learning Systems With Formal Mathematics},
year = {2017},
}
@inproceedings{shi2017wob,
author = {Tianlin Shi and Andrej Karpathy and Linxi Fan and Jonathan Hernandez and Percy Liang},
booktitle = {International Conference on Machine Learning (ICML)},
title = {World of Bits: An Open-Domain Platform for Web-Based Agents},
year = {2017},
}
@inproceedings{zhang2017hitting,
author = {Yuchen Zhang and Percy Liang and Moses Charikar},
booktitle = {Conference on Learning Theory (COLT)},
title = {A Hitting Time Analysis of Stochastic Gradient {L}angevin Dynamics},
year = {2017},
}
@inproceedings{bastani2017synthesizing,
author = {Osbert Bastani and Rahul Sharma and Alex Aiken and Percy Liang},
booktitle = {Programming Language Design and Implementation (PLDI)},
title = {Synthesizing Program Input Grammars},
year = {2017},
}
@inproceedings{wang2017naturalizing,
author = {Sida I. Wang and Sam Ginn and Percy Liang and Christopher D. Manning},
booktitle = {Association for Computational Linguistics (ACL)},
title = {Naturalizing a Programming Language via Interactive Learning},
year = {2017},
}
@inproceedings{he2017symmetric,
author = {He He and Anusha Balakrishnan and Mihail Eric and Percy Liang},
booktitle = {Association for Computational Linguistics (ACL)},
pages = {1766--1776},
title = {Learning Symmetric Collaborative Dialogue Agents with Dynamic Knowledge Graph Embeddings},
year = {2017},
}
@inproceedings{guu2017bridging,
author = {Kelvin Guu and Panupong Pasupat and Evan Zheran Liu and Percy Liang},
booktitle = {Association for Computational Linguistics (ACL)},
title = {From Language to Programs: Bridging Reinforcement Learning and Maximum Marginal Likelihood},
year = {2017},
}
@inproceedings{steinhardt2016risk,
author = {Jacob Steinhardt and Percy Liang},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
title = {Unsupervised Risk Estimation Using Only Conditional Independence Structure},
year = {2016},
}
@inproceedings{rajpurkar2016squad,
author = {Pranav Rajpurkar and Jian Zhang and Konstantin Lopyrev and Percy Liang},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
title = {{SQuAD}: 100,000+ Questions for Machine Comprehension of Text},
year = {2016},
}
@inproceedings{wang2016games,
author = {Sida I. Wang and Percy Liang and Chris Manning},
booktitle = {Association for Computational Linguistics (ACL)},
title = {Learning Language Games through Interaction},
year = {2016},
}
@inproceedings{long2016projections,
author = {Reginald Long and Panupong Pasupat and Percy Liang},
booktitle = {Association for Computational Linguistics (ACL)},
title = {Simpler Context-Dependent Logical Forms via Model Projections},
year = {2016},
}
@inproceedings{pasupat2016inferring,
author = {Panupong Pasupat and Percy Liang},
booktitle = {Association for Computational Linguistics (ACL)},
title = {Inferring Logical Forms From Denotations},
year = {2016},
}
@inproceedings{khani2016unanimity,
author = {Fereshte Khani and Martin Rinard and Percy Liang},
booktitle = {Association for Computational Linguistics (ACL)},
title = {Unanimous Prediction for 100% Precision with Application to Learning Semantic Mappings},
year = {2016},
}
@inproceedings{chaganty2016perspectives,
author = {Arun Tejasvi Chaganty and Percy Liang},
booktitle = {Association for Computational Linguistics (ACL)},
title = {How Much is 131 Million Dollars? {P}utting Numbers in Perspective with Compositional Descriptions},
year = {2016},
}
@inproceedings{raghunathan2016linear,
author = {Aditi Raghunathan and Roy Frostig and John Duchi and Percy Liang},
booktitle = {International Conference on Machine Learning (ICML)},
title = {Estimation from Indirect Supervision with Linear Moments},
year = {2016},
}
@incollection{wager2016levy,
author = {Stefan Wager and Will Fithian and Percy Liang},
booktitle = {Perturbations, Optimization and Statistics},
title = {Data Augmentation via {L}évy Processes},
year = {2016},
}
@article{liang2016executable,
author = {Percy Liang},
journal = {Communications of the ACM},
title = {Learning Executable Semantic Parsers for Natural Language Understanding},
volume = {59},
year = {2016},
}
@inproceedings{wang2015overnight,
author = {Yushi Wang and Jonathan Berant and Percy Liang},
booktitle = {Association for Computational Linguistics (ACL)},
title = {Building a Semantic Parser Overnight},
year = {2015},
}
@article{berant2015agenda,
author = {Jonathan Berant and Percy Liang},
journal = {Transactions of the Association for Computational Linguistics (TACL)},
pages = {545--558},
title = {Imitation Learning of Agenda-Based Semantic Parsers},
volume = {3},
year = {2015},
}
@inproceedings{steinhardt2015relaxed,
author = {Jacob Steinhardt and Percy Liang},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
title = {Learning with Relaxed Supervision},
year = {2015},
}
@inproceedings{wang2015polynomial,
author = {Sida I. Wang and Arun Chaganty and Percy Liang},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
title = {Estimating Mixture Models via Mixture of Polynomials},
year = {2015},
}
@inproceedings{werling2015onthejob,
author = {Keenon Werling and Arun Chaganty and Percy Liang and Chris Manning},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
title = {On-the-Job Learning with {B}ayesian Decision Theory},
year = {2015},
}
@inproceedings{kuleshov2015calibrated,
author = {Volodymyr Kuleshov and Percy Liang},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
title = {Calibrated Structured Prediction},
year = {2015},
}
@inproceedings{guu2015traversing,
author = {Kelvin Guu and John Miller and Percy Liang},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
title = {Traversing Knowledge Graphs in Vector Space},
year = {2015},
}
@inproceedings{misra2015environment,
author = {Dipendra K. Misra and Kejia Tao and Percy Liang and Ashutosh Saxena},
booktitle = {Association for Computational Linguistics (ACL)},
title = {Environment-Driven Lexicon Induction for High-Level Instructions},
year = {2015},
}
@inproceedings{shi2015sample,
author = {Tianlin Shi and Jacob Steinhardt and Percy Liang},
booktitle = {Artificial Intelligence and Statistics (AISTATS)},
pages = {875--884},
title = {Learning Where To Sample in Structured Prediction},
year = {2015},
}
@inproceedings{kuleshov2015tensor,
author = {Volodymyr Kuleshov and Arun Chaganty and Percy Liang},
booktitle = {Artificial Intelligence and Statistics (AISTATS)},
title = {Tensor factorization via matrix factorization},
year = {2015},
}
@article{steinhardt2014sparse,
author = {Jacob Steinhardt and Stefan Wager and Percy Liang},
journal = {arXiv preprint arXiv:1412.4182},
title = {The Statistics of Streaming Sparse Regression},
year = {2014},
}
@inproceedings{ramanathan2014linking,
author = {Vignesh Ramanathan and Armand Joulin and Percy Liang and Li Fei-Fei},
booktitle = {European Conference on Computer Vision (ECCV)},
title = {Linking people with "their" names using coreference resolution},
year = {2014},
}
@article{liang2014talking,
author = {Percy Liang},
journal = {XRDS: Crossroads, The ACM Magazine for Students},
number = {1},
pages = {18--21},
title = {Talking to computers in natural language},
volume = {21},
year = {2014},
}
@inproceedings{berant2014paraphrasing,
author = {Jonathan Berant and Percy Liang},
booktitle = {Association for Computational Linguistics (ACL)},
title = {Semantic Parsing via Paraphrasing},
year = {2014},
}
@inproceedings{pasupat2014extraction,
author = {Panupong Pasupat and Percy Liang},
booktitle = {Association for Computational Linguistics (ACL)},
title = {Zero-shot Entity Extraction from Web Pages},
year = {2014},
}
@inproceedings{chaganty2014graphical,
author = {Arun Chaganty and Percy Liang},
booktitle = {International Conference on Machine Learning (ICML)},
title = {Estimating Latent-Variable Graphical Models using Moments and Likelihoods},
year = {2014},
}
@inproceedings{steinhardt2014eg,
author = {Jacob Steinhardt and Percy Liang},
booktitle = {International Conference on Machine Learning (ICML)},
title = {Adaptivity and Optimism: An Improved Exponentiated Gradient Algorithm},
year = {2014},
}
@inproceedings{wager2014altitude,
author = {Stefan Wager and Will Fithian and Sida I. Wang and Percy Liang},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
title = {Altitude Training: Strong Bounds for Single-Layer Dropout},
year = {2014},
}
@inproceedings{frostig2014lowrank,
author = {Roy Frostig and Sida I. Wang and Percy Liang and Chris Manning},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
title = {Simple {MAP} inference via low-rank relaxations},
year = {2014},
}
@inproceedings{wang2014iqp,
author = {Sida I. Wang and Roy Frostig and Percy Liang and Chris Manning},
booktitle = {International Conference on Learning Representations Workshop (ICLR)},
title = {Relaxations for inference in restricted {B}oltzmann machines},
year = {2014},
}
@inproceedings{berant2013freebase,
author = {Jonathan Berant and Andrew Chou and Roy Frostig and Percy Liang},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
title = {Semantic Parsing on {F}reebase from Question-Answer Pairs},
year = {2013},
}
@inproceedings{wang2013noising,
author = {Sida I. Wang and Mengqiu Wang and Stefan Wager and Percy Liang and Chris Manning},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
title = {Feature Noising for Log-linear Structured Prediction},
year = {2013},
}
@inproceedings{wager2013dropout,
author = {Stefan Wager and Sida I. Wang and Percy Liang},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
title = {Dropout Training as Adaptive Regularization},
year = {2013},
}
@inproceedings{chaganty13regression,
author = {Arun Chaganty and Percy Liang},
booktitle = {International Conference on Machine Learning (ICML)},
title = {Spectral Experts for Estimating Mixtures of Linear Regressions},
year = {2013},
}
@inproceedings{ramanathan2013event,
author = {Vignesh Ramanathan and Percy Liang and Li Fei-Fei},
booktitle = {International Conference on Computer Vision (ICCV)},
title = {Video Event Understanding using Natural Language Descriptions},
year = {2013},
}
@inproceedings{sharma13algebraic,
author = {Rahul Sharma and Saurabh Gupta and Bharath Hariharan and Alex Aiken and Percy Liang and Aditya V. Nori},
booktitle = {European Symposium on Programming (ESOP)},
title = {A Data Driven Approach for Algebraic Loop Invariants},
year = {2013},
}
This paper explores unsupervised learning of parsing models along two directions. First, which models are identifiable from infinite data? We use a general technique for numerically checking identifiability based on the rank of a Jacobian matrix, and apply it to several standard constituency and dependency parsing models. Second, for identifiable models, how do we estimate the parameters efficiently? EM suffers from local optima, while recent work using spectral methods cannot be directly applied since the topology of the parse tree varies across sentences. We develop a strategy, unmixing, which deals with this additional complexity for restricted classes of parsing models.
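A minimal sketch of that numerical check, under assumed details rather than the paper's actual setup: pick a random interior parameter vector, map it to the induced distribution over observations, and compare the rank of a finite-difference Jacobian to the number of free parameters. The toy model here (a 2-state HMM with binary emissions, observed as length-3 sequences) is purely illustrative.

import itertools
import numpy as np

def hmm_sequence_probs(theta, length=3):
    """Map free parameters to the distribution over binary sequences."""
    pi1, a11, a21, b1, b2 = theta           # all assumed to lie in (0, 1)
    pi = np.array([pi1, 1 - pi1])           # initial state distribution
    A = np.array([[a11, 1 - a11],           # transition matrix
                  [a21, 1 - a21]])
    B = np.array([[b1, 1 - b1],             # emission matrix P(obs | state)
                  [b2, 1 - b2]])
    probs = []
    for obs in itertools.product([0, 1], repeat=length):
        alpha = pi * B[:, obs[0]]           # forward algorithm
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]
        probs.append(alpha.sum())
    return np.array(probs)

def numerical_jacobian(f, theta, eps=1e-6):
    base = f(theta)
    J = np.zeros((base.size, theta.size))
    for j in range(theta.size):
        bumped = theta.copy()
        bumped[j] += eps
        J[:, j] = (f(bumped) - base) / eps
    return J

theta = np.random.default_rng(0).uniform(0.1, 0.9, size=5)   # random generic point
J = numerical_jacobian(hmm_sequence_probs, theta)
print("Jacobian rank", np.linalg.matrix_rank(J, tol=1e-7), "vs", theta.size, "parameters")

If the rank falls short of the parameter count at generic points, the parameters cannot be locally identified from the observed distribution.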
@inproceedings{hsu12identifiability,
author = {Daniel Hsu and Sham M. Kakade and Percy Liang},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
title = {Identifiability and Unmixing of Latent Parse Trees},
year = {2012},
}
Compositional question answering begins by mapping questions to logical forms, but training a semantic parser to perform this mapping typically requires the costly annotation of the target logical forms. In this paper, we learn to map questions to answers via latent logical forms, which are induced automatically from question-answer pairs. In tackling this challenging learning problem, we introduce a new semantic representation which highlights a parallel between dependency syntax and efficient evaluation of logical forms. On two standard semantic parsing benchmarks (GEO and JOBS), our system obtains the highest published accuracies, despite using less supervision than existing systems.
Task: learn to map questions to answers via latent logical forms.
Contribution: new tree-based semantic representation.
Result: surpass state-of-the-art on semantic parsing with less supervision.
@inproceedings{liang11dcs,
author = {Percy Liang and Michael I. Jordan and Dan Klein},
booktitle = {Association for Computational Linguistics (ACL)},
pages = {590--599},
title = {Learning Dependency-Based Compositional Semantics},
year = {2011},
}
Many static analyses do not scale as they are made more precise. For example, increasing the amount of context sensitivity in a k-limited pointer analysis causes the number of contexts to grow exponentially with k. Iterative refinement techniques can mitigate this growth by starting with a coarse abstraction and only refining parts of the abstraction that are deemed relevant with respect to a given client.
In this paper, we introduce a new technique called pruning that uses client feedback in a different way. The basic idea is to use coarse abstractions to prune away parts of the program analysis deemed irrelevant for proving a client query, and then to use finer abstractions on the sliced program analysis. For a k-limited pointer analysis, this approach amounts to adaptively refining and pruning a set of prefix patterns representing the contexts relevant for the client. By pruning, we are able to scale up to much more expensive abstractions than before. We also prove that the pruned analysis is both sound and complete; that is, it yields the same results as an analysis that uses a more expensive abstraction directly without pruning.
Idea: run cheap analysis, use client feedback to prune away irrelevant parts of program analysis (think program slicing); then run expensive analysis.
Theoretical result: pruning is sound and complete.
Empirical result: we can use much richer $k$-object-sensitivity abstractions.
@inproceedings{liang11pruning,
author = {Percy Liang and Mayur Naik},
booktitle = {Programming Language Design and Implementation (PLDI)},
title = {Scaling up Abstraction Refinement via Pruning},
year = {2011},
}
Static analyses are generally parametrized by an abstraction which is chosen from a family of abstractions. We are interested in flexible families of abstractions with many parameters, as these families can allow one to increase precision in ways tailored to the client without sacrificing scalability. For example, we consider k-limited points-to analyses where each call site and allocation site in a program can have a different k value. We then ask a natural question in this paper: What is the minimal (coarsest) abstraction in a given family which is able to prove a set of queries? In addressing this question, we make the following two contributions: (i) We introduce two machine learning algorithms for efficiently finding a minimal abstraction; and (ii) for a static race detector backed by a k-limited points-to analysis, we show empirically that minimal abstractions are actually quite coarse: It suffices to provide context/object sensitivity to a very small fraction (0.4--2.3%) of the sites to yield equally precise results as providing context/object sensitivity uniformly to all sites.
Question: how small is the smallest abstraction needed to prove a query?
Empirical answer: very small (less than 2.5% sites need to be treated context-sensitively for k-limited analyses for race detection).
Found this answer using a new machine learning algorithm that exploits this sparsity.
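A minimal sketch of the coarsening idea, not the paper's two algorithms: it assumes a hypothetical oracle proves(sensitive_sites) that runs the k-limited analysis with only the given sites treated context-sensitively and reports whether the client query is proved. Random groups of sites are dropped, and a coarsening is kept whenever the query is still proved; when the necessary sites are sparse, this settles on a small set after relatively few analysis runs.

import random

def minimize_abstraction(sites, proves, fraction=0.5, rounds=200, seed=0):
    """Shrink the set of context-sensitive sites while the query stays proved.

    sites  -- all call/allocation sites (hypothetical identifiers)
    proves -- hypothetical oracle: runs the analysis, returns True/False
    """
    rng = random.Random(seed)
    sensitive = set(sites)                    # start from the finest abstraction
    assert proves(sensitive), "query not provable even with full sensitivity"
    for _ in range(rounds):
        if len(sensitive) <= 1:
            break
        k = max(1, int(fraction * len(sensitive)))
        drop = set(rng.sample(sorted(sensitive), k))
        if proves(sensitive - drop):          # coarser abstraction still suffices
            sensitive -= drop
    return sensitive                          # a small sufficient set, not provably minimal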
@inproceedings{liang11minimal,
author = {Percy Liang and Omer Tripp and Mayur Naik},
booktitle = {Principles of Programming Languages (POPL)},
title = {Learning Minimal Abstractions},
year = {2011},
}
@inproceedings{golland2010pragmatics,
author = {Dave Golland and Percy Liang and Dan Klein},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
pages = {410--419},
title = {A Game-theoretic Approach to Generating Spatial Descriptions},
year = {2010},
}
We present a simple, robust generation system which performs content selection and surface realization in a unified, domain-independent framework. In our approach, we break up the end-to-end generation process into a sequence of local decisions, arranged hierarchically and each trained discriminatively. We deployed our system in three different domains---Robocup sportscasting, technical weather forecasts, and common weather forecasts, obtaining results comparable to state-of-the-art domain-specific systems both in terms of BLEU scores and human evaluation.
Model natural language generation as a sequence of local decisions, each backed by a log-linear model.
Advantage: can use arbitrary expressive features, works across multiple domains.
@inproceedings{angeli10generation,
author = {Gabor Angeli and Percy Liang and Dan Klein},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
title = {A Simple Domain-Independent Probabilistic Approach to Generation},
year = {2010},
}
The quality of a static analysis of heap-manipulating programs is largely determined by its heap abstraction. Object allocation sites are a commonly-used abstraction, but are too coarse for some clients. The goal of this paper is to investigate how various refinements of allocation sites can improve precision. In particular, we consider abstractions that use call stack, object recency, and heap connectivity information. We measure the precision of these abstractions dynamically for four different clients motivated by concurrency and on nine Java programs chosen from the DaCapo benchmark suite. Our dynamic results shed new light on aspects of heap abstractions that matter for precision, which allows us to more effectively navigate the large space of possible heap abstractions.
Question: what aspects of a heap abstraction matter?
Methodology: run program (9 DaCapo benchmarks) dynamically, compute static heap abstractions (3 dimensions of refinement: context sensitivity, object recency, and shape analysis), answer client queries (4 clients based on concurrency).
@inproceedings{liang10abstraction,
author = {Percy Liang and Omer Tripp and Mayur Naik and Mooly Sagiv},
booktitle = {Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA)},
title = {A Dynamic Evaluation of Static Heap Abstractions},
year = {2010},
}
We are interested in learning programs for multiple related tasks given only a few training examples per task. Since the program for a single task is underdetermined by its data, we introduce a nonparametric hierarchical Bayesian prior over programs which shares statistical strength across multiple tasks. The key challenge is to parametrize this multi-task sharing. For this, we introduce a new representation of programs based on combinatory logic and provide an MCMC algorithm that can perform safe program transformations on this representation to reveal shared inter-program substructures.
Programs are trees, subprograms are subtrees, which can be shared across tasks. Combinators refactor programs to expose the appropriate subprograms.
@inproceedings{liang10programs,
author = {Percy Liang and Michael I. Jordan and Dan Klein},
booktitle = {International Conference on Machine Learning (ICML)},
pages = {639--646},
title = {Learning Programs: A Hierarchical {B}ayesian Approach},
year = {2010},
}
A learning problem might have several measures of complexity (e.g., norm and dimensionality) that affect the generalization error. What is the interaction between these complexities? Dimension-free learning theory bounds and parametric asymptotic analyses each provide a partial picture of the full learning curve. In this paper, we use high-dimensional asymptotics on two classical problems---mean estimation and linear regression---to explore the learning curve more completely. We show that these curves exhibit multiple regimes, where in each regime, the excess risk is controlled by a subset of the problem complexities.
Goal: understand excess risk as a function of sample size and problem complexity. On simple examples, show that asymptotic risk has multiple regimes, each controlled by different complexities.
@inproceedings{liang10regimes,
author = {Percy Liang and Nati Srebro},
booktitle = {International Conference on Machine Learning (ICML)},
title = {On the Interaction between Norm and Dimensionality: Multiple Regimes in Learning},
year = {2010},
}
Most existing algorithms for learning latent-variable models---such as EM and existing Gibbs samplers---are token-based, meaning that they update the variables associated with one sentence at a time. The incremental nature of these methods makes them susceptible to local optima/slow mixing. In this paper, we introduce a type-based sampler, which updates a block of variables, identified by a type, which spans multiple sentences. We show improvements on part-of-speech induction, word segmentation, and learning tree-substitution grammars.
NLP perspective: goal is to avoid local optima by processing all tokens associated with a type at once instead of one token or sentence at a time.
Sampling perspective: new type of block sampling that exploits exchangeability.
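A toy sketch of a single type-based move, in a simplified setting of my own choosing rather than the paper's models: a collapsed two-cluster mixture of unigrams with a Beta prior on the cluster indicator and Dirichlet priors on the emissions. Tokens of one word type are exchangeable under this model, so the block of their assignments is resampled at once by sampling how many go to cluster 1 and then choosing which ones uniformly at random.

import math
import numpy as np

def log_rising(x, k):
    """log of the rising factorial x (x+1) ... (x+k-1)."""
    return sum(math.log(x + i) for i in range(k))

def resample_type_block(m, n1, n0, c1w, c0w, vocab_size, rng,
                        alpha=1.0, beta=0.5):
    """Jointly resample the cluster assignments of the m tokens of one type.

    n1, n0   -- current cluster sizes, excluding this type's tokens
    c1w, c0w -- counts of this word type in each cluster, excluding its tokens
    Returns s, the number of the m tokens assigned to cluster 1.
    """
    logp = np.empty(m + 1)
    for s in range(m + 1):
        logp[s] = (math.lgamma(m + 1) - math.lgamma(s + 1) - math.lgamma(m - s + 1)
                   + log_rising(n1 + alpha, s) + log_rising(n0 + alpha, m - s)
                   + log_rising(c1w + beta, s) - log_rising(n1 + vocab_size * beta, s)
                   + log_rising(c0w + beta, m - s) - log_rising(n0 + vocab_size * beta, m - s))
    p = np.exp(logp - logp.max())
    return rng.choice(m + 1, p=p / p.sum())

rng = np.random.default_rng(0)
print(resample_type_block(m=12, n1=40, n0=60, c1w=0, c0w=0, vocab_size=1000, rng=rng))

A token-level sampler would need many sweeps to move all twelve tokens; here the whole block moves in one step.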
@inproceedings{liang10type,
author = {Percy Liang and Michael I. Jordan and Dan Klein},
booktitle = {North American Association for Computational Linguistics (NAACL)},
title = {Type-Based {MCMC}},
year = {2010},
}
Many types of regularization schemes have been employed in statistical learning, each one motivated by some assumption about the problem domain. In this paper, we present a unified asymptotic analysis of smooth regularizers, which allows us to see how the validity of these assumptions impacts the success of a particular regularizer. In addition, our analysis motivates an algorithm for optimizing regularization parameters, which in turn can be analyzed within our framework. We apply our analysis to several examples, including hybrid generative-discriminative learning and multi-task learning.
Setting: estimator defined by minimizing loss plus regularization.
Question: what is the best regularizer to use?
This is hard to optimize, so use a Taylor expansion instead, yielding an interpretable closed-form solution.
@inproceedings{liang09regularization,
author = {Percy Liang and Francis Bach and Guillaume Bouchard and Michael I. Jordan},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
title = {Asymptotically Optimal Regularization in Smooth Parametric Models},
year = {2009},
}
Probabilistic context-free grammars (PCFGs) have played an important role in the modeling of syntax in natural language processing and other applications, but choosing the proper model complexity is often difficult. We present a nonparametric Bayesian generalization of the PCFG based on the hierarchical Dirichlet process (HDP). In our HDP-PCFG model, the effective complexity of the grammar can grow with increasing data. We describe an efficient variational inference algorithm for our model and present experiments on both a synthetic grammar induction task and a large-scale natural language parsing task.
Details of the EMNLP 2007 paper + general background, empirical intuitions, and derivations for structured mean-field + a small grammar induction experiment.
@incollection{liang09hdppcfg,
author = {Percy Liang and Michael I. Jordan and Dan Klein},
booktitle = {The Oxford Handbook of Applied Bayesian Analysis},
title = {Probabilistic grammars and hierarchical {D}irichlet processes},
year = {2009},
}
A central problem in grounded language acquisition is learning the correspondences between a rich world state and a stream of text which references that world state. To deal with the high degree of ambiguity present in this setting, we present a generative model that simultaneously segments the text into utterances and maps each utterance to a meaning representation grounded in the world state. We show that our model generalizes across three domains of increasing difficulty---Robocup sportscasting, weather forecasts (a new domain), and NFL recaps.
Stuff happens in the world. A text talks about it. Our goal: learn the correspondence between the two.
Approach: probabilistic model capturing identification of entities/events in the world, segmentation of the text, and alignment between the two.
@inproceedings{liang09semantics,
author = {Percy Liang and Michael I. Jordan and Dan Klein},
booktitle = {Association for Computational Linguistics and International Joint Conference on Natural Language Processing (ACL-IJCNLP)},
pages = {91--99},
title = {Learning Semantic Correspondences with Less Supervision},
year = {2009},
}
Given a model family and a set of unlabeled examples, one could either label specific examples or state general constraints---both provide information about the desired model. In general, what is the most cost-effective way to learn? To address this question, we introduce measurements, a general class of mechanisms for providing information about a target model. We present a Bayesian decision-theoretic framework, which allows us to both integrate diverse measurements and choose new measurements to make. We use a variational inference algorithm, which exploits exponential family duality. The merits of our approach are demonstrated on two sequence labeling tasks.
Goal: learning with minimum human effort.
Things human can do: label data, provide constraints---in general, make measurements.
Use Bayesian decision theory to choose optimal measurements.
@inproceedings{liang09measurements,
author = {Percy Liang and Michael I. Jordan and Dan Klein},
booktitle = {International Conference on Machine Learning (ICML)},
title = {Learning from Measurements in Exponential Families},
year = {2009},
}
The (batch) EM algorithm plays an important role in unsupervised induction, but it sometimes suffers from slow convergence. In this paper, we show that online variants (1) provide significant speedups and (2) can even find better solutions than those found by batch EM. We support these findings on four unsupervised tasks: part-of-speech tagging, document classification, word segmentation, and word alignment.
What you'd expect: online is faster than batch.
What you might not expect: online gets better accuracy than batch.
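A minimal sketch of stepwise online EM in an assumed toy setting (a 1-D mixture of two Gaussians rather than the paper's NLP tasks): each example's expected sufficient statistics are folded into a running average with stepsize (k + 2) ** -alpha, and the parameters are re-estimated after every example.

import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 1, 500)])
rng.shuffle(x)

w = np.array([0.5, 0.5])                 # mixture weights
mu = np.array([-1.0, 1.0])               # component means
var = np.array([1.0, 1.0])               # component variances
s = np.stack([w, w * mu, w * (var + mu**2)])   # running sufficient statistics
alpha = 0.7

for k, xi in enumerate(x):
    # E-step on one example: responsibilities under the current parameters
    logp = np.log(w) - 0.5 * np.log(2 * np.pi * var) - (xi - mu)**2 / (2 * var)
    r = np.exp(logp - logp.max())
    r /= r.sum()
    # stepwise interpolation of the sufficient statistics
    eta = (k + 2) ** -alpha
    s = (1 - eta) * s + eta * np.stack([r, r * xi, r * xi**2])
    # M-step: read the parameters off the running statistics
    w = s[0] / s[0].sum()
    mu = s[1] / s[0]
    var = np.maximum(s[2] / s[0] - mu**2, 1e-3)

print("weights", np.round(w, 2), "means", np.round(mu, 2))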
@inproceedings{liang09online,
author = {Percy Liang and Dan Klein},
booktitle = {North American Association for Computational Linguistics (NAACL)},
pages = {611--619},
title = {Online {EM} for Unsupervised Models},
year = {2009},
}
Statistical and computational concerns have motivated parameter estimators based on various forms of likelihood, e.g., joint, conditional, and pseudolikelihood. In this paper, we present a unified framework for studying these estimators, which allows us to compare their relative (statistical) efficiencies. Our asymptotic analysis suggests that modeling more of the data tends to reduce variance, but at the cost of being more sensitive to model misspecification. We present experiments validating our analysis.
Derive general expression for the asymptotic risk of composite likelihood estimators in exponential families.
This allows us to compare the various estimators.
@inproceedings{liang08asymptotics,
author = {Percy Liang and Michael I. Jordan},
booktitle = {International Conference on Machine Learning (ICML)},
pages = {584--591},
title = {An Asymptotic Analysis of Generative, Discriminative, and Pseudolikelihood Estimators},
year = {2008},
}
Structured models often achieve excellent performance but can be slow at test time. We investigate structure compilation, where we replace structure with features, which are often computationally simpler but unfortunately statistically more complex. We analyze this tradeoff theoretically and empirically on three natural language processing tasks. We also introduce a simple method to transfer predictive power from structure to features via unlabeled data, while incurring a minimal statistical penalty.
How much do we lose by throwing out edge features in CRFs and adding node features?
Studies the approximation, estimation, computational aspects of the tradeoff.
@inproceedings{liang08structure,
author = {Percy Liang and Hal {Daum{é} III} and Dan Klein},
booktitle = {International Conference on Machine Learning (ICML)},
title = {Structure Compilation: Trading Structure for Features},
year = {2008},
}
We identify four types of errors that unsupervised induction systems make and study each one in turn. Our contributions include (1) using a meta-model to analyze the incorrect biases of a model in a systematic way, (2) providing an efficient and robust method of measuring distance between two parameter settings of a model, and (3) showing that local optima issues which typically plague EM can be somewhat alleviated by increasing the number of training examples. We conduct our analyses on three models: the HMM, the PCFG, and a simple dependency model.
Error decomposition: approximation, identifiability, estimation, optimization errors.
Used meta-model to analyze approximation error.
Empirically observed that more data reduces optimization error.
@inproceedings{liang08errors,
author = {Percy Liang and Dan Klein},
booktitle = {Human Language Technology and Association for Computational Linguistics (HLT/ACL)},
title = {Analyzing the Errors of Unsupervised Learning},
year = {2008},
}
We present a method for learning bilingual translation lexicons from monolingual corpora. Word types in each language are characterized by purely monolingual features, such as context counts and orthographic substrings. Translations are induced using a generative model based on canonical correlation analysis, which explains the monolingual lexicons in terms of latent matchings. We show that high-precision lexicons can be learned in a variety of language pairs and from a range of corpus types.
By using CCA, can induce translation lexicons without the usual sentence-aligned corpora.
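A simplified sketch of the CCA idea, not the authors' generative model: it assumes a small seed lexicon of known translation pairs (larger than the number of CCA components), fits CCA on the seed's monolingual feature vectors, projects every word type into the shared space, and proposes translations by cosine nearest neighbour. The paper instead learns the matching itself jointly with the CCA model.

import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.preprocessing import normalize

def induce_lexicon(X, Y, seed_pairs, n_components=50, top_k=1):
    """X: (n_src, d_src) monolingual features; Y: (n_tgt, d_tgt) features;
    seed_pairs: list of (src_index, tgt_index) known translations."""
    src_idx, tgt_idx = zip(*seed_pairs)
    cca = CCA(n_components=n_components, max_iter=1000)
    cca.fit(X[list(src_idx)], Y[list(tgt_idx)])
    Xc, Yc = cca.transform(X, Y)                  # shared latent space
    Xc, Yc = normalize(Xc), normalize(Yc)         # unit norm -> cosine similarity
    sims = Xc @ Yc.T
    return np.argsort(-sims, axis=1)[:, :top_k]   # best target word(s) per source word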
@inproceedings{haghighi08lexicon,
author = {Aria Haghighi and Percy Liang and Taylor Berg-Kirkpatrick and Dan Klein},
booktitle = {Human Language Technology and Association for Computational Linguistics (HLT/ACL)},
title = {Learning Bilingual Lexicons from Monolingual Corpora},
year = {2008},
}
The learning of probabilistic models with many hidden variables
and non-decomposable dependencies is an important and challenging problem.
In contrast to traditional approaches based on approximate inference in a single
intractable model, our approach is to train a set of tractable
submodels by encouraging them to agree on the hidden variables. This allows
us to capture non-decomposable aspects of the data while still maintaining
tractability. We propose an objective function for our approach,
derive EM-style algorithms for parameter estimation, and demonstrate their
effectiveness on three challenging real-world learning tasks.
Setting: unsupervised learning.
Alternative to approximate inference: make two tractable models and train them to agree.
Advantage: maintain existing tractable inference procedures as black-boxes.
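A toy instance under my own assumed setup, not the paper's experiments: two diagonal-Gaussian mixtures over two views of the same items, trained to agree on the shared cluster variable. The E-step scores each cluster by the product of the two submodels' joint probabilities, and the M-step refits each submodel from the shared posterior.

import numpy as np

def log_gauss(x, mu, var):
    """Log density of a diagonal Gaussian, summed over dimensions."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var).sum(axis=-1)

def product_em(x1, x2, K=2, iters=50, seed=0):
    """Train two diagonal-Gaussian mixtures (one per view) to agree on z."""
    rng = np.random.default_rng(seed)
    n = len(x1)
    pi1 = np.full(K, 1.0 / K); mu1 = x1[rng.choice(n, K, replace=False)].copy(); var1 = np.ones_like(mu1)
    pi2 = np.full(K, 1.0 / K); mu2 = x2[rng.choice(n, K, replace=False)].copy(); var2 = np.ones_like(mu2)
    for _ in range(iters):
        # E-step: posterior over z proportional to the PRODUCT of both submodels
        logq = (np.log(pi1) + np.stack([log_gauss(x1, mu1[k], var1[k]) for k in range(K)], axis=1)
                + np.log(pi2) + np.stack([log_gauss(x2, mu2[k], var2[k]) for k in range(K)], axis=1))
        q = np.exp(logq - logq.max(axis=1, keepdims=True))
        q /= q.sum(axis=1, keepdims=True)
        # M-step: re-estimate each submodel from the shared posterior q
        for x, pi, mu, var in ((x1, pi1, mu1, var1), (x2, pi2, mu2, var2)):
            Nk = q.sum(axis=0)
            pi[:] = Nk / n
            mu[:] = (q.T @ x) / Nk[:, None]
            var[:] = (q.T @ x ** 2) / Nk[:, None] - mu ** 2 + 1e-6
    return q

rng = np.random.default_rng(1)
z = rng.integers(0, 2, 200)
x1 = rng.normal(z[:, None] * 4.0 - 2.0, 1.0, (200, 3))   # view 1
x2 = rng.normal(z[:, None] * 4.0 - 2.0, 1.0, (200, 2))   # view 2, different features
acc = (product_em(x1, x2).argmax(axis=1) == z).mean()
print(max(acc, 1 - acc))                                  # cluster/label agreement up to swap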
@inproceedings{liang08agreement,
author = {Percy Liang and Dan Klein and Michael I. Jordan},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
title = {Agreement-Based Learning},
year = {2008},
}
We present a probabilistic approach to language change in which word forms are
represented by phoneme sequences that undergo stochastic edits along the
branches of a phylogenetic tree. This framework combines the
advantages of the classical comparative method with the robustness
of corpus-based probabilistic models. We use this framework to
explore the consequences of two different schemes for defining
probabilistic models of phonological change, evaluating these
schemes by reconstructing ancient word forms of Romance languages.
The result is an efficient inference procedure for automatically
inferring ancient word forms from modern languages, which can be
generalized to support inferences about linguistic phylogenies.
Feature-based generative model of phonemes of words in a phylogeny of languages.
@inproceedings{bouchard08language,
author = {Alexandre Bouchard-Côté and Percy Liang and Tom Griffiths and Dan Klein},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
title = {A Probabilistic Approach to Language Change},
year = {2008},
}
@inproceedings{liang07tutorial,
author = {Percy Liang and Dan Klein},
booktitle = {Association for Computational Linguistics (ACL)},
title = {Structured {B}ayesian Nonparametric Models with Variational Inference (tutorial)},
year = {2007},
}
We introduce a new inference algorithm for Dirichlet process mixture
models. While Gibbs sampling and variational methods focus on local
moves, the new algorithm makes more global moves. This is done by
introducing a permutation of the data points as an auxiliary variable.
The algorithm is a blocked sampler which alternates between sampling the
clustering and sampling the permutation. The key to the efficiency of
this approach is that it is possible to use dynamic programming to
consider all exponentially many clusterings consistent with a given
permutation. We also show that random projections can be used to
effectively sample the permutation. The result is a stochastic
hill-climbing algorithm that yields burn-in times significantly
smaller than those of collapsed Gibbs sampling.
Task: clustering.
Idea: conditioned on a permutation of the data points, every clustering consistent with that permutation is a partition into contiguous blocks, and dynamic programming can sum over all of them (sketched below).
Treating the permutation as a random auxiliary variable then yields a blocked sampler that alternates between sampling the clustering and sampling the permutation.
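A minimal sketch of that dynamic program (illustrative code, not the paper's implementation): with the points laid out in a fixed order, all clusterings into contiguous blocks can be summed over using O(n^2) block evaluations. Here block_score(i, j) is an assumed callable returning the unnormalized marginal likelihood of grouping points i..j-1 into one cluster, including any prior weight for opening a new cluster.

    # f[i] = total score of all contiguous-block clusterings of points 0..i-1
    def partition_sum(n, block_score):
        f = [0.0] * (n + 1)
        f[0] = 1.0
        for i in range(1, n + 1):
            f[i] = sum(f[j] * block_score(j, i) for j in range(i))
        return f[n]

Sampling a clustering given the permutation amounts to backtracking through the table f; the outer sampler alternates this with resampling the permutation.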
@inproceedings{liang07permdp,
author = {Percy Liang and Michael I. Jordan and Ben Taskar},
booktitle = {International Conference on Machine Learning (ICML)},
title = {A permutation-augmented sampler for {D}irichlet process mixture models},
year = {2007},
}
We present a nonparametric Bayesian model of tree structures based on the
hierarchical Dirichlet process (HDP). Our HDP-PCFG model allows the complexity
of the grammar to grow as more training data is available. In addition to
presenting a fully Bayesian model for the PCFG, we also develop an efficient
variational inference procedure. On synthetic data, we recover the correct
grammar without having to specify its complexity in advance. We also show that
our techniques can be applied to full-scale parsing applications by
demonstrating their effectiveness in learning state-split grammars.
A PCFG with an infinite number of states.
Learning: variational inference.
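Roughly, and in our notation rather than a faithful reproduction of the paper's, the construction draws an unbounded symbol inventory by stick-breaking:

    \beta \sim \mathrm{GEM}(\alpha) \quad \text{(top-level weights over infinitely many grammar symbols)}
    \phi_z^{E} \sim \mathrm{Dirichlet}, \qquad \phi_z^{B} \sim \mathrm{DP}(\alpha', \beta\beta^{\top}) \quad \text{for each symbol } z

so each symbol's binary-rule distribution over pairs of child symbols is tied back to the shared top-level weights, and variational inference works with a finite truncation of the stick.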
@inproceedings{liang07infpcfg,
author = {Percy Liang and Slav Petrov and Michael I. Jordan and Dan Klein},
booktitle = {Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP/CoNLL)},
title = {The Infinite {PCFG} using Hierarchical {D}irichlet Processes},
year = {2007},
}
We present a probabilistic model of diachronic phonology in which individual
word forms undergo stochastic edits along the branches of a phylogenetic tree.
Our approach allows us to achieve three goals with a single unified
model: (1) reconstruction of both ancient and modern word forms, (2) discovery
of general phonological changes, and (3) selection among different
phylogenies. We learn our model using a Monte Carlo EM algorithm and present
quantitative results validating the model.
Generative model of phonemes of words in a phylogeny of languages.
@inproceedings{bouchard07diachronic,
author = {Alexandre Bouchard-Côté and Percy Liang and Tom Griffiths and Dan Klein},
booktitle = {Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP/CoNLL)},
title = {A Probabilistic Approach to Diachronic Phonology},
year = {2007},
}
We present a perceptron-style discriminative approach to machine
translation in which large feature sets can be exploited. Unlike
discriminative reranking approaches, our system can take advantage of learned
features in all stages of decoding. We first discuss several challenges to
error-driven discriminative approaches. In particular, we explore different
ways of updating parameters given a training example. We find that making
frequent but smaller updates is preferable to making fewer but larger updates.
Then, we discuss an array of features and show both how they quantitatively
increase BLEU score and how they qualitatively interact on specific examples.
One particular feature we investigate is a novel way to introduce learning into
the initial phrase extraction process, which has previously been entirely
heuristic.
Task: machine translation.
Idea: treat machine translation as a structured classification task (learn a mapping from input sentence to output sentence). Use a perceptron-like algorithm: decode, then update towards the maximum-BLEU translation on the n-best list (sketched below).
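A sketch of that update (hypothetical code; the feature and sentence-BLEU functions are stand-ins, not the paper's implementation):

    def dot(weights, feats):
        return sum(weights.get(k, 0.0) * v for k, v in feats.items())

    # One perceptron-style update: move toward the highest-BLEU candidate on
    # the n-best list and away from the current model's best candidate.
    def perceptron_update(weights, features, nbest, bleu, eta=1.0):
        model_best = max(nbest, key=lambda y: dot(weights, features(y)))
        oracle = max(nbest, key=bleu)
        for k, v in features(oracle).items():
            weights[k] = weights.get(k, 0.0) + eta * v
        for k, v in features(model_best).items():
            weights[k] = weights.get(k, 0.0) - eta * v
        return weights

Making many such small per-example updates, rather than a few large ones, is the regime the paper reports as preferable.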
@inproceedings{liang06discrimative,
author = {Percy Liang and Alexandre Bouchard-Côté and Dan Klein and Ben Taskar},
booktitle = {International Conference on Computational Linguistics and Association for Computational Linguistics (COLING/ACL)},
title = {An End-to-End Discriminative Approach to Machine Translation},
year = {2006},
}
We present an unsupervised approach to symmetric
word alignment in which two simple asymmetric models are
trained jointly to maximize a
combination of data likelihood and agreement between the models.
Compared to the standard practice of intersecting predictions of
independently-trained models, joint training provides a 32% reduction
in AER. Moreover, a simple and efficient pair of HMM aligners
provides a 29% reduction in AER over symmetrized IBM model 4
predictions.
Task: unsupervised word alignment.
Idea: Jointly train two HMM models (one in each direction) to encourage agreement. Uses a simple EM-like algorithm for training.
Result: performance competitive with supervised methods (4.9 AER on Hansards).
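A hedged sketch of the kind of objective involved (our notation): with two asymmetric models p_1 and p_2 over alignments z of a sentence pair x, the criterion combines each model's likelihood with a term rewarding agreement of their posteriors,

    \max_{\theta_1, \theta_2} \; \sum_{x} \Big[ \log p_1(x; \theta_1) + \log p_2(x; \theta_2) + \log \sum_{z} p_1(z \mid x; \theta_1)\, p_2(z \mid x; \theta_2) \Big],

and the EM-like algorithm uses an E-step distribution proportional to the product of the two models' posteriors.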
@inproceedings{liang06alignment,
author = {Percy Liang and Ben Taskar and Dan Klein},
booktitle = {North American Association for Computational Linguistics (NAACL)},
pages = {104--111},
title = {Alignment by Agreement},
year = {2006},
}
Task: named-entity recognition and Chinese word segmentation.
Idea: build features from unlabeled data and use them in perceptron learning with Markov or semi-Markov models.
@mastersthesis{liang05meng,
author = {Percy Liang},
school = {Massachusetts Institute of Technology},
title = {Semi-Supervised Learning for Natural Language},
year = {2005},
}
We introduce the first definition of hyperacyclicity for hypergraphs, a generalization of acyclicity in graphs.
We provide a dynamic data structure for maintaining hyperacyclicity, a generalization of Tarjan's Union-Find algorithm.
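For orientation, the ordinary graph case that this generalizes, maintaining acyclicity as edges arrive, is handled by standard Union-Find (a sketch of that baseline only; the hypergraph data structure in the report is more involved):

    # Adding edge (u, v) keeps the graph acyclic iff u and v are in
    # different components; path compression + union by rank.
    class UnionFind:
        def __init__(self, n):
            self.parent = list(range(n))
            self.rank = [0] * n

        def find(self, u):
            while self.parent[u] != u:
                self.parent[u] = self.parent[self.parent[u]]
                u = self.parent[u]
            return u

        def add_edge_if_acyclic(self, u, v):
            ru, rv = self.find(u), self.find(v)
            if ru == rv:
                return False      # edge would close a cycle
            if self.rank[ru] < self.rank[rv]:
                ru, rv = rv, ru
            self.parent[rv] = ru
            if self.rank[ru] == self.rank[rv]:
                self.rank[ru] += 1
            return True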
@techreport{liang05hypercycle,
author = {Percy Liang and Nathan Srebro},
institution = {Massachusetts Institute of Technology},
title = {A Data Structure for Maintaining Acyclicity in Hypergraphs},
year = {2005},
}
@inproceedings{liang05mcmaster,
author = {Percy Liang and Nathan Srebro},
booktitle = {Mathematical Programming for Data Mining and Machine Learning Workshop at McMaster University},
title = {Linear Programming in Bounded Tree-width {M}arkov Networks},
year = {2005},
}
In parsing sequences with dynamic programming, the subproblems are contiguous subsequences (quadratic in the number of terminals). In parsing documents or images, the subproblems would be subsets of the terminals (exponential in the number of terminals). We introduce (and unify) several ways to constrain these subsets using the geometric structure of the terminals.
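The gap in subproblem counts is easy to see concretely (a toy illustration, not taken from the paper):

    n = 20
    num_spans = n * (n + 1) // 2   # contiguous subsequences (CKY-style): 210
    num_subsets = 2 ** n           # unconstrained subsets of terminals: 1,048,576
    print(num_spans, num_subsets)

Geometric constraints aim to cut the second number down toward something tractable.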
@inproceedings{liang05geometric,
author = {Percy Liang and Mukund Narasimhan and Michael Shilman and Paul Viola},
booktitle = {International Conference on Document Analysis and Recognition (ICDAR)},
title = {Efficient Geometric Algorithms for Parsing in Two Dimensions},
year = {2005},
}
Use a greedy procedure to find the maximum-likelihood (or MDL) bounded tree-width Markov network (for tree-width 1, this is equivalent to Chow-Liu maximum spanning trees; sketched below).
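A minimal sketch of the tree-width-1 special case named above (Chow-Liu), assuming discrete data; the plain Kruskal implementation and names are ours:

    from collections import Counter
    from math import log

    def mutual_information(xs, ys):
        n = len(xs)
        px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
        return sum((c / n) * log((c / n) / ((px[a] / n) * (py[b] / n)))
                   for (a, b), c in pxy.items())

    # Chow-Liu: maximum-weight spanning tree under pairwise mutual information.
    def chow_liu_tree(data):
        d = len(data[0])
        cols = list(zip(*data))
        edges = sorted(((mutual_information(cols[i], cols[j]), i, j)
                        for i in range(d) for j in range(i + 1, d)), reverse=True)
        parent = list(range(d))
        def find(u):
            while parent[u] != u:
                parent[u] = parent[parent[u]]
                u = parent[u]
            return u
        tree = []
        for w, i, j in edges:       # greedily add highest-MI edges (Kruskal)
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                tree.append((i, j, w))
        return tree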
@techreport{liang04markov,
author = {Percy Liang and Nathan Srebro},
institution = {Massachusetts Institute of Technology},
title = {Methods and Experiments With Bounded Tree-width {M}arkov Networks},
year = {2004},
}
Use linear programming to find worst-case inputs to a dynamic program in order to explore the tightness of a bound for approximating maximum-weight hypertrees with windmill farms.
@techreport{liang03maxwmfarm,
author = {Percy Liang and Nathan Srebro},
institution = {Massachusetts Institute of Technology},
title = {How Much Of A Hypertree Can Be Captured By Windmills?},
year = {2003},
}