Aaron Mishkin

Preprints

  1. Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation. A. Mishkin, M. Pilanci, M. Schmidt. 2024 [arXiv]

  2. Directional Smoothness and Gradient Methods: Convergence and Adaptivity. A. Mishkin*, A. Khaled*, Y. Wang, A. Defazio, R. M. Gower. 2024 [arXiv]

  3. Level Set Teleportation: An Optimization Perspective. A. Mishkin, A. Bietti, R. M. Gower. 2024 [arXiv]

  4. A Library of Mirrors: Deep Neural Nets in Low Dimensions are Convex Lasso Models with Reflection Features. E. Zeger, Y. Wang, A. Mishkin, T. Ergen, E. Candes, M. Pilanci. 2024 [arXiv]

  5. Analyzing and Improving Greedy 2-Coordinate Updates for Equality-Constrained Optimization via Steepest Descent in the 1-Norm. A. V. Ramesh, A. Mishkin, M. Schmidt, Y. Zhou, J. Lavington, J. She. 2023 [arXiv]

Full Papers

  1. Optimal Sets and Solution Paths of ReLU Networks. A. Mishkin, M. Pilanci. ICML 2023. [arXiv] [code] [video]

  2. Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions. A. Mishkin, A. Sahiner, M. Pilanci. ICML 2022. [arXiv] [code] [video]

  3. Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates. S. Vaswani, A. Mishkin, I. Laradji, M. Schmidt, G. Gidel, S. Lacoste-Julien. NeurIPS 2019. [arXiv] [code] [video]

  4. SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient. A. Mishkin, F. Kunstner, D. Nielsen, M. Schmidt, M. E. Khan. NeurIPS 2018. [arXiv] [code] [video]

Workshop Papers

  1. A Novel Analysis of Gradient Descent under Directional Smoothness. A. Mishkin*, A. Khaled*, A. Defazio, R. M. Gower. OPT2023. [pdf]

  2. Level Set Teleportation: the Good, the Bad, and the Ugly. A. Mishkin, A. Bietti, R. M. Gower. OPT2023. [pdf]

  3. The Solution Path of the Group Lasso. A. Mishkin, M. Pilanci. OPT2022. [pdf]

  4. Fast Convergence of Greedy 2-Coordinate Updates for Optimizing with an Equality Constraint. A. V. Ramesh, A. Mishkin, M. Schmidt. OPT2022. [pdf]

  5. How to Make Your Optimizer Generalize Better. S. Vaswani, R. Babanezhad, J. Gallego, A. Mishkin, S. Lacoste-Julien, N. Le Roux. OPT2020. [arXiv] [workshop]

  6. Web ValueCharts: Analyzing Individual and Group Preferences with Interactive, Web-based Visualizations. A. Mishkin. Review of Undergraduate Computer Science, 2018. [website]

Notes

  1. Strong Duality via Convex Conjugacy. [pdf]

    • This note establishes strong duality and the sufficiency of Slater's constraint qualification using only convex conjugacy and the convex closures of perturbation functions.

  2. Computing Projection Operators using Lagrangian Duality. [pdf]

    • A tutorial-style note on computing projection operators using duality. This was originally written for EE 364B (Convex Optimization II) at Stanford University.

Theses

  1. Interpolation, Growth Conditions, and Stochastic Gradient Descent. A. Mishkin. MSc Thesis, 2020. [pdf] [slides]

Talks

Invited Talks

Talks about Painless SGD:

Talks at the UBC Machine Learning Reading Group (MLRG):

Miscellaneous: