Aaron Mishkin

Preprints

  1. Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation. A. Mishkin, M. Pilanci, M. Schmidt. 2024 [arXiv]

  2. Directional Smoothness and Gradient Methods: Convergence and Adaptivity. A. Mishkin*, A. Khaled*, Y. Wang, A. Defazio, R. M. Gower. 2024 [arXiv]

  3. Level Set Teleportation: An Optimization Perspective. A. Mishkin, A. Bietti, R. M. Gower. 2024 [arXiv]

  4. A Library of Mirrors: Deep Neural Nets in Low Dimensions are Convex Lasso Models with Reflection Features. E. Zeger, Y. Wang, A. Mishkin, T. Ergen, E. Candes, M. Pilanci. 2024 [arXiv]

  5. Analyzing and Improving Greedy 2-Coordinate Updates for Equality-Constrained Optimization via Steepest Descent in the 1-Norm. A. V. Ramesh, A. Mishkin, M. Schmidt, Y. Zhou, J. Lavington, J. She. 2023 [arXiv]

Full Papers

  1. Optimal Sets and Solution Paths of ReLU Networks. A. Mishkin, M. Pilanci. ICML 2023. [arXiv] [code] [video]

  2. Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions. A. Mishkin, A. Sahiner, M. Pilanci. ICML 2022. [arXiv] [code] [video]

  3. Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates. S. Vaswani, A. Mishkin, I. Laradji, M. Schmidt, G. Gidel, S. Lacoste-Julien. NeurIPS 2019. [arXiv] [code] [video]

  4. SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient. A. Mishkin, F. Kunstner, D. Nielsen, M. Schmidt, M. E. Khan. NeurIPS 2018. [arXiv] [code] [video]

Workshop Papers

  1. A Novel Analysis of Gradient Descent under Directional Smoothness. A. Mishkin*, A. Khaled*, A. Defazio, R. M. Gower. OPT2023. [pdf]

  2. Level Set Teleportation: the Good, the Bad, and the Ugly. A. Mishkin, A. Bietti, R. M. Gower. OPT2023. [pdf]

  3. The Solution Path of the Group Lasso. A. Mishkin, M. Pilanci. OPT2022. [pdf]

  4. Fast Convergence of Greedy 2-Coordinate Updates for Optimizing with an Equality Constraint. A. V. Ramesh, A. Mishkin, M. Schmidt. OPT2022. [pdf]

  5. How to Make Your Optimizer Generalize Better. S. Vaswani, R. Babanezhad, J. Gallego, A. Mishkin, S. Lacoste-Julien, N. Le Roux. OPT2020. [arXiv] [workshop]

  6. Web ValueCharts: Analyzing Individual and Group Preferences with Interactive, Web-based Visualizations. A. Mishkin. Review of Undergraduate Computer Science, 2018. [website]

Notes

  1. Strong Duality via Convex Conjugacy. [pdf]

    • This note establishes strong duality and the sufficiency of Slater's constraint qualification using only convex conjugacy and the convex closures of perturbation functions.

  2. Computing Projection Operators using Lagrangian Duality. [pdf]

    • A tutorial-style note on computing projection operators using duality. This was originally written for EE 364B (Convex Optimization II) at Stanford University.

Theses

  1. Interpolation, Growth Conditions, and Stochastic Gradient Descent. A. Mishkin. MSc Thesis, 2020. [pdf] [slides]

Talks

Invited Talks

Talks about Painless SGD:

Talks at the UBC Machine Learning Reading Group (MLRG):

Miscellaneous: