Research Topics (Updated Nov 2023)

Research Summary: The Stanford Trustworthy AI Research (STAIR) group works to develop the principles and practice of trustworthy machine learning. Some recent highlights include (i) robust federated machine learning, and (ii) metric elicitation: selecting more effective machine learning metrics via human interaction, primarily applied to ML fairness. Our applied research spans cognitive neuroimaging, healthcare, and biomedical imaging. Some recent highlights include (i) generative models, and (ii) risk-scoring and prediction models for X-rays and fMRI.

(Robust) Distributed and Federated Machine Learning

Distributed data centers and devices such as smart cars, smartphones, wearable devices, and smart sensors increasingly collect massive and diverse data. Consequently, there is growing interest in training machine learning models jointly across data centers without explicitly sharing data. Along similar lines, there is a trend toward on-device training of machine learning models jointly across edge devices. Despite their clear benefits, distributed training and federated learning create new challenges for private and secure machine learning, as distributed devices are more susceptible to new privacy and security attacks. We are developing novel algorithmic and computational approaches to ensure the privacy and security of federated and distributed machine learning.

CSER: Communication-efficient SGD with Error Reset

Learning with Complex Metrics

Real-world machine learning often requires sophisticated evaluation metrics, many of which are non-decomposable, e.g., AUC and the F-measure. This is in contrast to decomposable metrics such as accuracy, which can be computed as an empirical average over individual examples. Indeed, non-decomposability is the primary source of difficulty in designing efficient algorithms that optimize complex metrics. We study predictive methods from first principles and derive novel, efficient, statistically consistent algorithms that improve empirical performance.

Fairness with Overlapping Groups

Metric Elicitation

What metric (equivalently, cost function or loss function) should a machine learning model optimize? Selecting a suitable metric for real-world machine learning applications remains an open problem, as default metrics such as classification accuracy often do not capture tradeoffs relevant to downstream decision-making. Unfortunately, there is little formal guidance in the machine learning literature on selecting appropriate metrics. We are developing formal interactive strategies by which a practitioner may discover which metric to optimize, such that the elicited metric recovers user or expert preferences. We are particularly interested in applications to ML fairness.

Quadratic Metric Elicitation with Application to Fairness

Probabilistic Graphical Models for Spatio-temporal Data

Spatio-temporal data are ubiquitous in science and engineering applications. We are pursuing a variety of techniques for modeling such datasets, mainly using probabilistic graphical models and other graph-based analyses. We primarily use these tools to enable the scientific study and predictive modeling of brain networks. Of particular interest are novel methods that address robustness issues (e.g., confounding) and novel approaches to distributed computation.

Estimating Differential Latent Variable Graphical Models with Applications to Brain Connectivity

Generative Models for Biological Images

Data in scientific and commercial disciplines are increasingly characterized by high dimensions and relatively few samples. In such settings, a priori knowledge gleaned from experts and experimental evidence is invaluable for recovering meaningful models. Generative models are ideal for these knowledge-driven, low-data settings. We are developing a variety of generative models for biological imaging data and exploring novel applications of these models. We are also developing novel variational inference techniques that enable scalable and accurate inference, particularly for high-dimensional structured problems.

A generative modeling approach for interpreting population-level variability in brain structure

Learning with Aggregated Data

Existing work in spatio-temporal data analysis often assumes that data are available as individual measurements. However, for reasons of privacy or storage, data are often available only as aggregates. Aggregation presents severe mathematical challenges to learning and inference, and naive application of standard techniques is susceptible to the ecological fallacy. We have shown that in some cases aggregation has only a mild effect on model estimates. For the remaining cases, we are developing tools that enable provably accurate predictive modeling with aggregated data while avoiding unnecessary and error-prone reconstruction of individual records.

Aggregation for Sensitive Data

Interpretable Machine Learning

As machine learning methods have become ubiquitous in human decision-making, their transparency and interpretability have grown in importance. Interpretability is particularly important in domains where decisions have significant consequences. Examples abound where interpretable models reveal important but surprising patterns in the data that complex models obscure. We are currently studying exemplar-based interpretable modeling, motivated by studies of human reasoning which suggest that the use of examples (prototypes) is fundamental to developing effective strategies for tactical decision-making. We are also exploring structured sparsity and attention (with deep neural networks) as routes to interpretability.

Interpreting black box predictions using Fisher kernels
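To make the federated-learning discussion above concrete, here is a minimal sketch of a communication-efficient training round in the spirit of error feedback/reset: each client sparsifies its update (top-k) before sending and keeps the unsent residual locally, folding it into later rounds. This is a toy NumPy illustration on a synthetic quadratic objective, not the published CSER algorithm; the learning rate, sparsity level, and objective are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_clients, k, lr = 50, 4, 5, 0.2
targets = [rng.normal(size=d) for _ in range(n_clients)]  # each client's local optimum

def top_k(v, k):
    """Keep the k largest-magnitude entries of v; zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

w = np.zeros(d)
residuals = [np.zeros(d) for _ in range(n_clients)]  # per-client error memory
for _ in range(500):
    updates = []
    for i, t in enumerate(targets):
        grad = w - t                        # gradient of 0.5 * ||w - t||^2
        corrected = grad + residuals[i]     # fold in previously unsent error
        msg = top_k(corrected, k)           # communicate only k coordinates
        residuals[i] = corrected - msg      # error kept locally, "reset" once sent
        updates.append(msg)
    w -= lr * np.mean(updates, axis=0)      # server averages the sparse updates

# with error feedback, w still approaches the consensus optimum (mean of targets)
print(np.linalg.norm(w - np.mean(targets, axis=0)))
```

Without the residual bookkeeping, aggressive sparsification can stall convergence; the error memory is what lets heavy compression coexist with accurate training.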
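The next sketch makes the decomposable/non-decomposable distinction from the Learning with Complex Metrics section concrete: the F-measure below is a function of the whole confusion matrix rather than a per-example average, and a simple plug-in strategy is to threshold estimated class probabilities and tune the threshold for the target metric on held-out data. The synthetic scores stand in for any trained probabilistic classifier.

```python
import numpy as np

rng = np.random.default_rng(1)

def f1(y_true, y_pred):
    """F-measure depends on the whole confusion matrix -- it is not an
    empirical average of per-example scores, i.e., it is non-decomposable."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return 2 * tp / (2 * tp + fp + fn) if tp > 0 else 0.0

# synthetic posterior estimates from some probabilistic classifier, with a
# rare positive class (exactly where accuracy is a poor default metric)
n = 5000
y = (rng.random(n) < 0.1).astype(int)
scores = np.clip(0.15 + 0.55 * y + 0.15 * rng.standard_normal(n), 0.0, 1.0)

# plug-in approach: sweep a threshold on the estimated probabilities; for
# accuracy the optimal threshold is 0.5, but for F-measure it generally is not
thresholds = np.linspace(0.01, 0.99, 99)
best_t = max(thresholds, key=lambda t: f1(y, (scores >= t).astype(int)))
print(f"F1 at 0.5: {f1(y, (scores >= 0.5).astype(int)):.3f}, "
      f"F1 at tuned threshold {best_t:.2f}: "
      f"{f1(y, (scores >= best_t).astype(int)):.3f}")
```

Thresholding an estimated class-probability function is a classic route to statistically consistent optimization of metrics like the F-measure, which is the flavor of result this research direction formalizes.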
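For the Metric Elicitation section, here is a stylized illustration of the interactive idea: the user's tradeoff between false positives and false negatives is hidden inside a pairwise-comparison oracle, and binary search over operating points recovers their preferred classifier without the user ever writing the metric down. The ROC frontier and oracle are synthetic stand-ins, not the group's published procedure.

```python
import numpy as np

# hidden user preference: relative cost of false negatives vs. false positives.
# This is a hypothetical value inside the oracle, never read by the loop below.
TRUE_FN_COST = 0.8

def oracle_prefers(m1, m2):
    """Pairwise oracle: m = (FPR, FNR). The user reports which classifier
    they like better; their implicit metric is a weighted error rate."""
    cost = lambda m: (1 - TRUE_FN_COST) * m[0] + TRUE_FN_COST * m[1]
    return cost(m1) < cost(m2)

def rates(theta):
    """(FPR, FNR) at decision threshold theta; a stylized smooth ROC frontier
    stands in for operating points measured on a validation set."""
    return (1 - theta) ** 2, theta ** 2

# binary search driven only by pairwise feedback: at the user-optimal
# operating point, the oracle is indifferent to tiny moves either way
lo, hi, eps = 0.0, 1.0, 1e-3
for _ in range(40):
    mid = (lo + hi) / 2
    if oracle_prefers(rates(mid + eps), rates(mid - eps)):
        lo = mid      # cost still decreasing: optimum lies to the right
    else:
        hi = mid
theta_star = (lo + hi) / 2
# on this frontier the optimal threshold equals 1 - cost, so the hidden
# tradeoff is recovered from preferences alone
print(f"elicited threshold {theta_star:.3f} -> implied FN cost {1 - theta_star:.3f}")
```

The key point is query efficiency: a logarithmic number of pairwise comparisons pins down the user's implicit tradeoff, which is what makes elicitation practical with human experts.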
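For the spatio-temporal graphical-models section, the following is a minimal sketch of a standard graph-estimation workflow for brain-network data: sparse inverse-covariance (graphical lasso) estimation, where nonzero entries of the precision matrix are read as conditional-dependence edges. It uses scikit-learn's GraphicalLasso; the regularization strength and edge threshold here are arbitrary and would need tuning on real fMRI data.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(2)

# synthetic "fMRI-like" data: T time points over p brain regions, generated
# from a sparse ground-truth precision (conditional-independence) matrix
p, T = 10, 1000
prec = np.eye(p)
prec[0, 1] = prec[1, 0] = 0.4          # one direct functional connection
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(prec), size=T)

# sparse inverse-covariance estimation (graphical lasso); nonzero entries of
# the estimated precision matrix are read as edges of the functional network
model = GraphicalLasso(alpha=0.05).fit(X)
edges = np.abs(model.precision_) > 1e-2
np.fill_diagonal(edges, False)
print("recovered edges:", np.argwhere(edges))   # expect the (0, 1) pair
```

Marginal correlations would connect many region pairs through indirect paths; the precision matrix isolates direct dependencies, which is why it is the natural object for connectivity analysis.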
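The variational inference machinery mentioned in the Generative Models section can be reduced to a one-dimensional toy: fit a Gaussian q(z) to a posterior by stochastic gradient ascent on the ELBO using the reparameterization trick, then compare with the closed-form answer available for this conjugate model. Real applications replace this toy with a deep generative model of images; all constants below are arbitrary demo choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# toy generative model: z ~ N(0, 1), x | z ~ N(z, sigma^2); observe one x
x_obs, sigma = 1.5, 0.5

# variational family q(z) = N(m, s^2); maximize the ELBO
#   E_q[log p(x_obs, z)] + entropy(q)
# with the reparameterization z = m + s * eps, eps ~ N(0, 1)
m, log_s = 0.0, 0.0
for _ in range(3000):
    eps = rng.standard_normal(64)
    s = np.exp(log_s)
    z = m + s * eps
    dlogp = -z + (x_obs - z) / sigma**2          # d/dz log p(x_obs, z)
    grad_m = dlogp.mean()                        # pathwise gradient wrt m
    grad_log_s = (dlogp * eps * s).mean() + 1.0  # + d(entropy)/d(log s) = 1
    m += 0.01 * grad_m
    log_s += 0.01 * grad_log_s

# the exact posterior is Gaussian here, so the answer can be checked
post_var = 1.0 / (1.0 + 1.0 / sigma**2)
print(f"VI: mean {m:.3f}, sd {np.exp(log_s):.3f}; "
      f"exact: mean {post_var * x_obs / sigma**2:.3f}, sd {post_var**0.5:.3f}")
```

The reparameterization trick turns an expectation over q into a differentiable function of (m, s), which is what makes this recipe scale to high-dimensional structured models.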
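For the Learning with Aggregated Data section, this sketch illustrates the "mild effect of aggregation" point: for a linear model, least squares on group averages still identifies the individual-level coefficients, with no attempt to reconstruct individual records. The group structure, sizes, and coefficients are synthetic.

```python
import numpy as np

rng = np.random.default_rng(4)

# individual-level data that is never released
n, d, n_groups = 5000, 3, 50
X = rng.standard_normal((n, d))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + 0.1 * rng.standard_normal(n)
g = rng.integers(0, n_groups, size=n)      # group membership (e.g., region)

# only per-group averages are published, for privacy/storage reasons
Xbar = np.stack([X[g == k].mean(axis=0) for k in range(n_groups)])
ybar = np.array([y[g == k].mean() for k in range(n_groups)])

# for a linear model, E[ybar_k] = Xbar_k @ beta, so least squares on the 50
# aggregate rows still identifies the individual-level coefficients -- no
# reconstruction of individual records is attempted or needed
beta_hat, *_ = np.linalg.lstsq(Xbar, ybar, rcond=None)
print("true:", beta, "from aggregates:", np.round(beta_hat, 2))
```

For nonlinear models this equivalence breaks down, which is precisely the ecological fallacy the section warns about and the regime where new tools are needed.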
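Finally, for the Interpretable Machine Learning section, here is a simplified sketch in the spirit of exemplar-based interpretation with Fisher kernels: per-example score vectors (gradients of the log-likelihood) define a similarity between a test point and training points, and the most similar training exemplars serve as an explanation. This toy logistic-regression version is an illustration of the idea, not the cited paper's exact method; the test point and its label guess are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)

# small two-class problem and a logistic model fit by gradient descent
n, d = 200, 2
X = np.vstack([rng.normal(-1.0, 1.0, (n // 2, d)),
               rng.normal(+1.0, 1.0, (n // 2, d))])
y = np.repeat([0, 1], n // 2)
w = np.zeros(d)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.5 * X.T @ (p - y) / n           # gradient step on mean logistic loss

def score_vectors(X, y, w):
    """Per-example gradients of the log-likelihood at the fitted w."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return (y - p)[:, None] * X

G = score_vectors(X, y, w)
F = G.T @ G / n + 1e-6 * np.eye(d)         # empirical Fisher information

def fisher_similarity(x_new, y_new):
    """Fisher-kernel similarity K(x_new, x_i) = g(x_new)^T F^{-1} g(x_i)."""
    g = score_vectors(x_new[None, :], np.array([y_new]), w)[0]
    return G @ np.linalg.solve(F, g)

# explain a prediction by retrieving its most similar training exemplars
sims = fisher_similarity(np.array([0.9, 1.1]), 1)
print("top exemplars:", np.argsort(sims)[-3:])
```

Because the similarity lives in the model's own gradient space, the retrieved exemplars reflect what the fitted model finds similar, not merely what is close in input space.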
Funding: We gratefully acknowledge generous funding from the National Science Foundation, National Institutes of Health, Google AI, DARPA, Jump ARCHES, Discovery Partners Institute, Digital Transformation Institute, CCBGM, Onmilife, and the Mayo Clinic & Illinois Alliance. We also receive generous computing support from Microsoft Azure, Intel AI, Amazon Web Services, Google Cloud, and NCSA Blue Waters. Our outreach efforts are supported by the Olga G. Nalbandov Lecture Fund, the MacArthur Foundation, and the Rockefeller Foundation.