Research Topics (Updated Nov 2023)

Research Summary: The Stanford Trustworthy AI Research (STAIR) group works to develop the principles and practice of trustworthy machine learning. Some recent highlights include (i) robust federated machine learning, and (ii) metric elicitation: selecting more effective machine learning metrics via human interaction, primarily applied to ML fairness. Our applied research spans cognitive neuroimaging, healthcare, and biomedical imaging. Some recent highlights include (i) generative models, and (ii) risk-scoring and prediction models for X-rays and fMRI.

(Robust) Distributed and Federated Machine Learning

Distributed data centers and devices such as smart cars, smartphones, wearable devices, and smart sensors increasingly collect massive and diverse data. Consequently, there is growing interest in training machine learning models jointly across data centers without explicitly sharing data. Along similar lines, there is a trend toward on-device training of machine learning models jointly across edge devices. Despite their clear benefits, distributed training and federated learning create new challenges for private and secure machine learning, as distributed devices are more susceptible to new privacy and security attacks. We are developing novel algorithmic and computational approaches to ensure the privacy and security of federated and distributed machine learning.

CSER: Communication-efficient SGD with Error Reset

Learning with Complex Metrics

Real-world machine learning often requires sophisticated evaluation metrics, many of which are non-decomposable, e.g., AUC and the F-measure. This is in contrast to decomposable metrics such as accuracy, which can be computed as an empirical average over individual examples. Indeed, non-decomposability is the primary source of difficulty in designing efficient algorithms that optimize complex metrics. We study predictive methods from first principles and derive novel, efficient, statistically consistent algorithms that improve empirical performance.

Fairness with Overlapping Groups

Metric Elicitation

What metric (equivalently, cost function or loss function) should a machine learning model optimize? Selecting a suitable metric for real-world machine learning applications remains an open problem, as default metrics such as classification accuracy often do not capture tradeoffs relevant to downstream decision-making. Unfortunately, there is little formal guidance in the machine learning literature on selecting appropriate metrics. We are developing formal interactive strategies by which a practitioner may discover which metric to optimize, such that the elicited metric recovers user or expert preferences. We are particularly interested in applications to ML fairness.

Quadratic Metric Elicitation with Application to Fairness

Probabilistic Graphical Models for Spatio-temporal Data

Spatio-temporal data are ubiquitous in science and engineering applications. We are pursuing a variety of techniques for modeling such datasets, mainly using probabilistic graphical models and other graph-based analyses. We primarily use these tools to enable the scientific study and predictive modeling of brain networks. Of particular interest are novel methods that address robustness issues (e.g., confounding) and novel approaches to distributed computation.

Estimating Differential Latent Variable Graphical Models with Applications to Brain Connectivity

Generative Models for Biological Images

Data in scientific and commercial disciplines are increasingly characterized by high dimensions and relatively few samples. In such settings, a priori knowledge gleaned from experts and experimental evidence is invaluable for recovering meaningful models. Generative models are ideal for these knowledge-driven, low-data settings. We are developing a variety of generative models for biological imaging data and exploring novel applications of these models. We are also developing novel variational inference techniques that enable scalable and accurate inference, particularly for high-dimensional structured problems.

A generative modeling approach for interpreting population-level variability in brain structure

Learning with Aggregated Data

Existing work in spatio-temporal data analysis often assumes that data are available as individual measurements. However, for reasons of privacy or storage, data are often available only as aggregates. Aggregation presents severe mathematical challenges to learning and inference, and naive application of standard techniques is susceptible to the ecological fallacy. We have shown that in some cases aggregation has only a mild effect on model estimates. For the remaining cases, we are developing tools that enable provably accurate predictive modeling with aggregated data while avoiding unnecessary and error-prone reconstruction of individual records.

Aggregation for Sensitive Data

Interpretable Machine Learning

As machine learning methods have become ubiquitous in human decision-making, their transparency and interpretability have grown in importance. Interpretability is particularly important in domains where decisions have significant consequences. Examples abound where interpretable models reveal important but surprising patterns in the data that complex models obscure. We are currently studying exemplar-based interpretable modeling, motivated by studies of human reasoning which suggest that the use of examples (prototypes) is fundamental to developing effective strategies for tactical decision-making. We are also exploring structured sparsity and attention (with deep neural networks) as routes to interpretability.

Interpreting black box predictions using Fisher kernels
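To make the federated-learning discussion above concrete, here is a minimal sketch of a communication-efficient training round in the spirit of error feedback/reset: each client sparsifies its update (top-k) before sending and keeps the unsent residual locally, folding it into later rounds. This is a toy NumPy illustration on a synthetic quadratic objective, not the published CSER algorithm; the learning rate, sparsity level, and objective are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_clients, k, lr = 50, 4, 5, 0.2
targets = [rng.normal(size=d) for _ in range(n_clients)]  # each client's local optimum

def top_k(v, k):
    """Keep the k largest-magnitude entries of v; zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

w = np.zeros(d)
residuals = [np.zeros(d) for _ in range(n_clients)]  # per-client error memory
for _ in range(500):
    updates = []
    for i, t in enumerate(targets):
        grad = w - t                        # gradient of 0.5 * ||w - t||^2
        corrected = grad + residuals[i]     # fold in previously unsent error
        msg = top_k(corrected, k)           # communicate only k coordinates
        residuals[i] = corrected - msg      # error kept locally, "reset" once sent
        updates.append(msg)
    w -= lr * np.mean(updates, axis=0)      # server averages the sparse updates

# with error feedback, w still approaches the consensus optimum (mean of targets)
print(np.linalg.norm(w - np.mean(targets, axis=0)))
```

Without the residual bookkeeping, aggressive sparsification can stall convergence; the error memory is what lets heavy compression coexist with accurate training.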
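The next sketch makes the decomposable/non-decomposable distinction from the Learning with Complex Metrics section concrete: the F-measure below is a function of the whole confusion matrix rather than a per-example average, and a simple plug-in strategy is to threshold estimated class probabilities and tune the threshold for the target metric on held-out data. The synthetic scores stand in for any trained probabilistic classifier.

```python
import numpy as np

rng = np.random.default_rng(1)

def f1(y_true, y_pred):
    """F-measure depends on the whole confusion matrix -- it is not an
    empirical average of per-example scores, i.e., it is non-decomposable."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return 2 * tp / (2 * tp + fp + fn) if tp > 0 else 0.0

# synthetic posterior estimates from some probabilistic classifier, with a
# rare positive class (exactly where accuracy is a poor default metric)
n = 5000
y = (rng.random(n) < 0.1).astype(int)
scores = np.clip(0.15 + 0.55 * y + 0.15 * rng.standard_normal(n), 0.0, 1.0)

# plug-in approach: sweep a threshold on the estimated probabilities; for
# accuracy the optimal threshold is 0.5, but for F-measure it generally is not
thresholds = np.linspace(0.01, 0.99, 99)
best_t = max(thresholds, key=lambda t: f1(y, (scores >= t).astype(int)))
print(f"F1 at 0.5: {f1(y, (scores >= 0.5).astype(int)):.3f}, "
      f"F1 at tuned threshold {best_t:.2f}: "
      f"{f1(y, (scores >= best_t).astype(int)):.3f}")
```

Thresholding an estimated class-probability function is a classic route to statistically consistent optimization of metrics like the F-measure, which is the flavor of result this research direction formalizes.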
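For the Metric Elicitation section, here is a stylized illustration of the interactive idea: the user's tradeoff between false positives and false negatives is hidden inside a pairwise-comparison oracle, and binary search over operating points recovers their preferred classifier without the user ever writing the metric down. The ROC frontier and oracle are synthetic stand-ins, not the group's published procedure.

```python
import numpy as np

# hidden user preference: relative cost of false negatives vs. false positives.
# This is a hypothetical value inside the oracle, never read by the loop below.
TRUE_FN_COST = 0.8

def oracle_prefers(m1, m2):
    """Pairwise oracle: m = (FPR, FNR). The user reports which classifier
    they like better; their implicit metric is a weighted error rate."""
    cost = lambda m: (1 - TRUE_FN_COST) * m[0] + TRUE_FN_COST * m[1]
    return cost(m1) < cost(m2)

def rates(theta):
    """(FPR, FNR) at decision threshold theta; a stylized smooth ROC frontier
    stands in for operating points measured on a validation set."""
    return (1 - theta) ** 2, theta ** 2

# binary search driven only by pairwise feedback: at the user-optimal
# operating point, the oracle is indifferent to tiny moves either way
lo, hi, eps = 0.0, 1.0, 1e-3
for _ in range(40):
    mid = (lo + hi) / 2
    if oracle_prefers(rates(mid + eps), rates(mid - eps)):
        lo = mid      # cost still decreasing: optimum lies to the right
    else:
        hi = mid
theta_star = (lo + hi) / 2
# on this frontier the optimal threshold equals 1 - cost, so the hidden
# tradeoff is recovered from preferences alone
print(f"elicited threshold {theta_star:.3f} -> implied FN cost {1 - theta_star:.3f}")
```

The key point is query efficiency: a logarithmic number of pairwise comparisons pins down the user's implicit tradeoff, which is what makes elicitation practical with human experts.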
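For the spatio-temporal graphical-models section, the following is a minimal sketch of a standard graph-estimation workflow for brain-network data: sparse inverse-covariance (graphical lasso) estimation, where nonzero entries of the precision matrix are read as conditional-dependence edges. It uses scikit-learn's GraphicalLasso; the regularization strength and edge threshold here are arbitrary and would need tuning on real fMRI data.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(2)

# synthetic "fMRI-like" data: T time points over p brain regions, generated
# from a sparse ground-truth precision (conditional-independence) matrix
p, T = 10, 1000
prec = np.eye(p)
prec[0, 1] = prec[1, 0] = 0.4          # one direct functional connection
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(prec), size=T)

# sparse inverse-covariance estimation (graphical lasso); nonzero entries of
# the estimated precision matrix are read as edges of the functional network
model = GraphicalLasso(alpha=0.05).fit(X)
edges = np.abs(model.precision_) > 1e-2
np.fill_diagonal(edges, False)
print("recovered edges:", np.argwhere(edges))   # expect the (0, 1) pair
```

Marginal correlations would connect many region pairs through indirect paths; the precision matrix isolates direct dependencies, which is why it is the natural object for connectivity analysis.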
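The variational inference machinery mentioned in the Generative Models section can be reduced to a one-dimensional toy: fit a Gaussian q(z) to a posterior by stochastic gradient ascent on the ELBO using the reparameterization trick, then compare with the closed-form answer available for this conjugate model. Real applications replace this toy with a deep generative model of images; all constants below are arbitrary demo choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# toy generative model: z ~ N(0, 1), x | z ~ N(z, sigma^2); observe one x
x_obs, sigma = 1.5, 0.5

# variational family q(z) = N(m, s^2); maximize the ELBO
#   E_q[log p(x_obs, z)] + entropy(q)
# with the reparameterization z = m + s * eps, eps ~ N(0, 1)
m, log_s = 0.0, 0.0
for _ in range(3000):
    eps = rng.standard_normal(64)
    s = np.exp(log_s)
    z = m + s * eps
    dlogp = -z + (x_obs - z) / sigma**2          # d/dz log p(x_obs, z)
    grad_m = dlogp.mean()                        # pathwise gradient wrt m
    grad_log_s = (dlogp * eps * s).mean() + 1.0  # + d(entropy)/d(log s) = 1
    m += 0.01 * grad_m
    log_s += 0.01 * grad_log_s

# the exact posterior is Gaussian here, so the answer can be checked
post_var = 1.0 / (1.0 + 1.0 / sigma**2)
print(f"VI: mean {m:.3f}, sd {np.exp(log_s):.3f}; "
      f"exact: mean {post_var * x_obs / sigma**2:.3f}, sd {post_var**0.5:.3f}")
```

The reparameterization trick turns an expectation over q into a differentiable function of (m, s), which is what makes this recipe scale to high-dimensional structured models.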
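For the Learning with Aggregated Data section, this sketch illustrates the "mild effect of aggregation" point: for a linear model, least squares on group averages still identifies the individual-level coefficients, with no attempt to reconstruct individual records. The group structure, sizes, and coefficients are synthetic.

```python
import numpy as np

rng = np.random.default_rng(4)

# individual-level data that is never released
n, d, n_groups = 5000, 3, 50
X = rng.standard_normal((n, d))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + 0.1 * rng.standard_normal(n)
g = rng.integers(0, n_groups, size=n)      # group membership (e.g., region)

# only per-group averages are published, for privacy/storage reasons
Xbar = np.stack([X[g == k].mean(axis=0) for k in range(n_groups)])
ybar = np.array([y[g == k].mean() for k in range(n_groups)])

# for a linear model, E[ybar_k] = Xbar_k @ beta, so least squares on the 50
# aggregate rows still identifies the individual-level coefficients -- no
# reconstruction of individual records is attempted or needed
beta_hat, *_ = np.linalg.lstsq(Xbar, ybar, rcond=None)
print("true:", beta, "from aggregates:", np.round(beta_hat, 2))
```

For nonlinear models this equivalence breaks down, which is precisely the ecological fallacy the section warns about and the regime where new tools are needed.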
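Finally, for the Interpretable Machine Learning section, here is a simplified sketch in the spirit of exemplar-based interpretation with Fisher kernels: per-example score vectors (gradients of the log-likelihood) define a similarity between a test point and training points, and the most similar training exemplars serve as an explanation. This toy logistic-regression version is an illustration of the idea, not the cited paper's exact method; the test point and its label guess are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)

# small two-class problem and a logistic model fit by gradient descent
n, d = 200, 2
X = np.vstack([rng.normal(-1.0, 1.0, (n // 2, d)),
               rng.normal(+1.0, 1.0, (n // 2, d))])
y = np.repeat([0, 1], n // 2)
w = np.zeros(d)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.5 * X.T @ (p - y) / n           # gradient step on mean logistic loss

def score_vectors(X, y, w):
    """Per-example gradients of the log-likelihood at the fitted w."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return (y - p)[:, None] * X

G = score_vectors(X, y, w)
F = G.T @ G / n + 1e-6 * np.eye(d)         # empirical Fisher information

def fisher_similarity(x_new, y_new):
    """Fisher-kernel similarity K(x_new, x_i) = g(x_new)^T F^{-1} g(x_i)."""
    g = score_vectors(x_new[None, :], np.array([y_new]), w)[0]
    return G @ np.linalg.solve(F, g)

# explain a prediction by retrieving its most similar training exemplars
sims = fisher_similarity(np.array([0.9, 1.1]), 1)
print("top exemplars:", np.argsort(sims)[-3:])
```

Because the similarity lives in the model's own gradient space, the retrieved exemplars reflect what the fitted model finds similar, not merely what is close in input space.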
Funding: We gratefully acknowledge generous funding from the National Science Foundation, National Institutes of Health, Google AI, DARPA, Jump ARCHES, Discovery Partners Institute, Digital Transformation Institute, CCBGM, Onmilife, and the Mayo Clinic & Illinois Alliance. We also receive generous computing support from Microsoft Azure, Intel AI, Amazon Web Services, Google Cloud, and NCSA Blue Waters. Our outreach efforts are supported by the Olga G. Nalbandov Lecture Fund, the MacArthur Foundation, and the Rockefeller Foundation.