Aaron Mishkin

Preprints

    1. Glocal Smoothness: Line Search can really help! C. Fox, A. Mishkin, S. Vaswani, M. Schmidt. 2025. [arXiv]

    2. (Bugged!) Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation. A. Mishkin, M. Pilanci, M. Schmidt. 2024. [arXiv]

    3. A Library of Mirrors: Deep Neural Nets in Low Dimensions are Convex Lasso Models with Reflection Features. E. Zeger, Y. Wang, A. Mishkin, T. Ergen, E. Candes, M. Pilanci. 2024. [arXiv]

    4. Analyzing and Improving Greedy 2-Coordinate Updates for Equality-Constrained Optimization via Steepest Descent in the 1-Norm. A. V. Ramesh, A. Mishkin, M. Schmidt, Y. Zhou, J. Lavington, J. She. 2023. [arXiv]

Full Papers

    1. Level Set Teleportation: An Optimization Perspective. A. Mishkin, A. Bietti, R. M. Gower. AISTATS 2025. [arXiv]

    2. (Oral!) Exploring the loss landscape of regularized neural networks via convex duality. S. Kim, A. Mishkin, M. Pilanci. ICLR 2025. [arXiv]

    3. Directional Smoothness and Gradient Methods: Convergence and Adaptivity. A. Mishkin*, A. Khaled*, Y. Wang, A. Defazio, R. M. Gower. NeurIPS 2024. [arXiv] [code] [poster]

    4. Optimal Sets and Solution Paths of ReLU Networks. A. Mishkin, M. Pilanci. ICML 2023. [arXiv] [code] [video]

    5. Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions. A. Mishkin, A. Sahiner, M. Pilanci. ICML 2022. [arXiv] [code] [video]

    6. Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates. S. Vaswani, A. Mishkin, I. Laradji, M. Schmidt, G. Gidel, S. Lacoste-Julien. NeurIPS 2019. [arXiv] [code] [video]

    7. SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient. A. Mishkin, F. Kunstner, D. Nielsen, M. Schmidt, M. E. Khan. NeurIPS 2018. [arXiv] [code] [video]

Workshop Papers

    1. A Novel Analysis of Gradient Descent under Directional Smoothness. A. Mishkin*, A. Khaled*, A. Defazio, R. M. Gower. OPT2023. [pdf]

    2. Level Set Teleportation: the Good, the Bad, and the Ugly. A. Mishkin, A. Bietti, R. M. Gower. OPT2023. [pdf]

    3. The Solution Path of the Group Lasso. A. Mishkin, M. Pilanci. OPT2022. [pdf]

    4. Fast Convergence of Greedy 2-Coordinate Updates for Optimizing with an Equality Constraint. A. V. Ramesh, A. Mishkin, M. Schmidt. OPT2022. [pdf]

    5. How to Make Your Optimizer Generalize Better. S. Vaswani, R. Babanezhad, J. Gallego, A. Mishkin, S. Lacoste-Julien, N. Le Roux. OPT2020. [arXiv] [workshop]

    6. Web ValueCharts: Analyzing Individual and Group Preferences with Interactive, Web-based Visualizations. A. Mishkin. Review of Undergraduate Computer Science, 2018. [website]

Notes

    1. Strong Duality via Convex Conjugacy. [pdf]

      • This note establishes strong duality and the sufficiency of Slater's constraint qualification using only convex conjugacy and the convex closures of perturbation functions.

    2. Computing Projection Operators using Lagrangian Duality. [pdf]

      • A tutorial-style note on computing projection operators using duality; a minimal worked example of the technique is sketched below. This note was originally written for EE 364B (Convex Optimization II) at Stanford University.
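      • As a brief illustration of the duality technique (a minimal sketch with notation chosen here for illustration, not excerpted from the note itself), consider projecting a point y onto the hyperplane {x : a^T x = b}:
        \[ \min_x \; \tfrac{1}{2}\|x - y\|_2^2 \quad \text{s.t.} \quad a^T x = b, \qquad L(x, \nu) = \tfrac{1}{2}\|x - y\|_2^2 + \nu\,(a^T x - b). \]
        Minimizing the Lagrangian over x gives x = y - \nu a; substituting into the constraint yields
        \[ \nu = \frac{a^T y - b}{\|a\|_2^2}, \qquad \Pi(y) = y - \frac{a^T y - b}{\|a\|_2^2}\, a. \]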

Theses

    1. Convex Analysis of Non-Convex Neural Networks. A. Mishkin. PhD Thesis, 2025. [slides src] [slides pdf]

    2. Interpolation, Growth Conditions, and Stochastic Gradient Descent. A. Mishkin. MSc Thesis, 2020. [pdf] [slides]

Talks

    Contributed and Invited Talks:

    Talks at the UBC Machine Learning Reading Group (MLRG):

    Miscellaneous: