About Me

I am currently a postdoc at Stanford, working with Chris Ré in the Hazy Research Lab. In August 2019, I graduated with a PhD from the Paul G. Allen School of Computer Science & Engineering at the University of Washington in Seattle, where I was part of the Database Group and was advised by Dan Suciu and Magdalena Balazinska.

For my undergraduate degree, I attended Carleton College in Northfield, MN, a city whose motto is "Cows, Colleges, and Contentment," and graduated in 2013 as a Computer Science and Mathematics double major.

Research and Work Experience

My research interests are broadly at the intersection of machine learning and data management. I focus on how to manage the end-to-end lifecycle of self-supervised embedding pipelines. This includes how to better train, maintain, monitor, and patch embedding models, as well as how they are used downstream.

I am a 2020 winner of the IC Postdoc Research Fellowship Program and was one of the 2015 winners of the NSF GRFP in Computer Science. In the summers of 2016 and 2017, I was a PhD research intern at Microsoft Research, and in the summer of 2015, I interned at Tableau as a software developer. From the summer of 2012 to the spring of 2015, I interned at Sandia National Laboratories, working on high-performance computing and image reconstruction.


Mistral: A Journey Towards Reproducible Language Model Training. Laurel Orr* and Siddharth Karamcheti*.
Team: Jason Bolton, Tianyi Zhang, Karan Goel, Avanika Narayan, Rishi Bommasani, Deepak Narayanan
Advisors: Tatsunori Hashimoto, Dan Jurafsky, Christopher D. Manning, Christopher Potts, Christopher Ré, Percy Liang
[blog], [talk]

Bootleg: Chasing the Tail with Self-Supervised Named Entity Disambiguation. Laurel Orr, Megan Leszczynski, Simran Arora, Neel Guha, Xiao Ling, Sen Wu, and Christopher Ré


Ask Me Anything: A simple strategy for prompting language models. Simran Arora, Avanika Narayan, Mayee F. Chen, Laurel Orr, Neel Guha, Kush Bhatia, Ines Chami, Frederic Sala, Christopher Ré. arXiv 2022.

Can Foundation Models Wrangle Your Data? Avanika Narayan, Ines Chami, Laurel Orr, Christopher Ré. arXiv 2022.

Data Management Opportunities for Foundation Models. Laurel Orr, Karan Goel, Christopher Ré. CIDR 2022.

On the Opportunities and Risks of Foundation Models (Lead of Data Section). Laurel Orr, Simran Arora, Karan Goel, Avanika Narayan, Michael Zhang, Christopher Ré. arXiv 2021.

Cross-Domain Data Integration for Named Entity Disambiguation in Biomedical Text. Maya Varma, Laurel Orr, Sen Wu, Megan Leszczynski, Xiao Ling, Christopher Ré. EMNLP 2021.

Managing ML Pipelines: Feature Stores and the Coming Wave of Embedding Ecosystems (Tutorial). Laurel Orr, Atindriyo Sanyal, Xiao Ling, Karan Goel, Megan Leszczynski. VLDB 2021.
[paper], [slides]

Goodwill Hunting: Analyzing and Repurposing Off-the-Shelf Named Entity Linking Systems. Karan Goel, Laurel Orr, Nazneen Fatema Rajani, Jesse Vig, Christopher Ré. NAACL Industry 2021.

Bootleg: Chasing the Tail with Self-Supervised Named Entity Disambiguation. Laurel Orr*, Megan Leszczynski*, Simran Arora, Sen Wu, Neel Guha, Xiao Ling, Christopher Ré. CIDR 2021.
[paper], [talk]

Mosaic: A Sample-Based Database System for Open World Query Processing. Laurel Orr, Samuel Ainsworth, Walter Cai, Kevin Jamieson, Magdalena Balazinska, Dan Suciu. CIDR 2020.

Sample Debiasing in the Themis Open World Database System. Laurel Orr, Magdalena Balazinska, and Dan Suciu. SIGMOD 2020.

Pushing Data-Induced Predicates Through Joins in Big-Data Clusters. Srikanth Kandula, Laurel Orr, and Surajit Chaudhuri. VLDB 2019.

EntropyDB: A Probabilistic Approach to Approximate Query Processing. Laurel Orr, Magdalena Balazinska, and Dan Suciu. VLDB Journal 2019.

Probabilistic Database Summarization for Interactive Data Exploration. Laurel Orr, Magdalena Balazinska, and Dan Suciu. VLDB 2017.

Explaining Query Answers with Explanation-Ready Databases. Sudeepa Roy, Laurel Orr, and Dan Suciu. VLDB 2015.

Big-Data Management Use-Case: A Cloud Service for Creating and Analyzing Galactic Merger Trees. S. Loebman, J. Ortiz, L. Choo, L. Orr, L. Anderson, D. Halperin, M. Balazinska, T. Quinn, F. Governato. SIGMOD Workshop on Data Analytics in the Cloud (DanaC) 2014.

Cluster-Based Approach to a Multi-GPU CT Reconstruction Algorithm. Laurel J. Orr, Edward S. Jimenez, Kyle R. Thompson. Conference Proceedings for the IEEE Nuclear Science Symposium and Medical Imaging Conference 2014.

Preparing for the 100-Megapixel Detector: Reconstructing a Multi-Terabyte Computed Tomography Dataset. Laurel J. Orr and Edward S. Jimenez. Conference Proceedings for the Penetrating Radiation Systems and Applications XIV Workshop at the SPIE International Symposium on SPIE Optical Engineering + Applications 2013.