Aaron Defazio

ORCID: 0000-0002-8764-3986

According to our database, Aaron Defazio authored at least 36 papers between 2012 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four (the shortest co-authorship path to Paul Erdős; see the sketch below).
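
For readers unfamiliar with these metrics: a collaborative distance such as the Erdős number is the length of the shortest path between two researchers in the co-authorship graph, where an edge joins any pair of authors who share a paper. A minimal sketch of that computation follows; the graph and author names are illustrative placeholders, not real collaboration data.

    from collections import deque

    def collaborative_distance(coauthors, source, target):
        """Shortest-path length between two authors in a co-authorship
        graph (an Erdős number when target is Paul Erdős).

        `coauthors` maps each author to the set of people they have
        published with.
        """
        if source == target:
            return 0
        seen = {source}
        queue = deque([(source, 0)])
        while queue:
            author, dist = queue.popleft()
            for peer in coauthors.get(author, ()):
                if peer == target:
                    return dist + 1
                if peer not in seen:
                    seen.add(peer)
                    queue.append((peer, dist + 1))
        return None  # no collaboration path exists

    # Toy chain A - B - C - D - E gives A a distance of four to E,
    # mirroring the "number of four" above.
    graph = {
        "A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"},
        "D": {"C", "E"}, "E": {"D"},
    }
    print(collaborative_distance(graph, "A", "E"))  # prints 4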

Bibliography

2024
Directional Smoothness and Gradient Methods: Convergence and Adaptivity.
CoRR, 2024

2023
When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement.
CoRR, 2023

Prodigy: An Expeditiously Adaptive Parameter-Free Learner.
CoRR, 2023

MoMo: Momentum Models for Adaptive Learning Rates.
CoRR, 2023

Mechanic: A Learning Rate Tuner.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Learning-Rate-Free Learning by D-Adaptation.
Proceedings of the International Conference on Machine Learning, 2023

2022
A scaling calculus for the design and initialization of ReLU networks.
Neural Comput. Appl., 2022

A Momentumized, Adaptive, Dual Averaged Gradient Method.
J. Mach. Learn. Res., 2022

Compressed sensing with a jackknife and a bootstrap.
J. Data Sci. Stat. Vis., 2022

Grad-GradaGrad? A Non-Monotone Adaptive Stochastic Gradient Method.
CoRR, 2022

2021
Stochastic Polyak Stepsize with a Moving Target.
CoRR, 2021

Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization.
CoRR, 2021

Almost sure convergence rates for Stochastic Gradient Descent and Stochastic Heavy Ball.
Proceedings of the Conference on Learning Theory, 2021

The Power of Factorial Powers: New Parameter settings for (Stochastic) Optimization.
Proceedings of the Asian Conference on Machine Learning, 2021

2020
Dual Averaging is Surprisingly Effective for Deep Learning Optimization.
CoRR, 2020

Understanding the Role of Momentum in Non-Convex Optimization: Practical Insights from a Lyapunov Analysis.
CoRR, 2020

On the convergence of the Stochastic Heavy Ball Method.
CoRR, 2020

Factorial Powers for Stochastic Optimization.
CoRR, 2020

Advancing machine learning for MR image reconstruction with an open competition: Overview of the 2019 fastMRI challenge.
CoRR, 2020

MRI Banding Removal via Adversarial Training.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

End-to-End Variational Networks for Accelerated MRI Reconstruction.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2020, 2020

GrappaNet: Combining Parallel Imaging With Deep Learning for Multi-Coil MRI Reconstruction.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Offset Masking Improves Deep Learning based Accelerated MRI Reconstructions.
CoRR, 2019

Scaling Laws for the Principled Design, Initialization and Preconditioning of ReLU Networks.
CoRR, 2019

On the Ineffectiveness of Variance Reduced Optimization for Deep Learning.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

On the Curved Geometry of Accelerated Optimization.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

2018
Controlling Covariate Shift using Equilibrium Normalization of Weights.
CoRR, 2018

fastMRI: An Open Dataset and Benchmarks for Accelerated MRI.
CoRR, 2018

2016
A Simple Practical Accelerated Method for Finite Sums.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

2015
New Optimisation Methods for Machine Learning.
CoRR, 2015

Non-Uniform Stochastic Average Gradient Method for Training Conditional Random Fields.
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015

2014
A Comparison of learning algorithms on the Arcade Learning Environment.
CoRR, 2014

SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Finito: A faster, permutable incremental gradient method for big data problems.
Proceedings of the 31st International Conference on Machine Learning, 2014

2012
A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation.
Proceedings of the Advances in Neural Information Processing Systems 25: Annual Conference on Neural Information Processing Systems 2012, 2012

A Graphical Model Formulation of Collaborative Filtering Neighbourhood Methods with Fast Maximum Entropy Training.
Proceedings of the 29th International Conference on Machine Learning, 2012

