Konstantin Mishchenko

ORCID: 0000-0002-5241-7292

According to our database, Konstantin Mishchenko authored at least 37 papers between 2018 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Super-Universal Regularized Newton Method.
SIAM J. Optim., March, 2024

2023
Regularized Newton Method with Global O(1/k²) Convergence.
SIAM J. Optim., September, 2023

Stochastic distributed learning with gradient quantization and double-variance reduction.
Optim. Methods Softw., January, 2023

When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement.
CoRR, 2023

Adaptive Proximal Gradient Method for Convex Optimization.
CoRR, 2023

Prodigy: An Expeditiously Adaptive Parameter-Free Learner.
CoRR, 2023

Partially Personalized Federated Learning: Breaking the Curse of Data Heterogeneity.
CoRR, 2023

Convergence of First-Order Algorithms for Meta-Learning with Moreau Envelopes.
CoRR, 2023

DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy.
Proceedings of the International Conference on Machine Learning, 2023

Learning-Rate-Free Learning by D-Adaptation.
Proceedings of the International Conference on Machine Learning, 2023

Server-Side Stepsizes and Sampling Without Replacement Provably Help in Federated Optimization.
Proceedings of the 4th International Workshop on Distributed Machine Learning, 2023

2022
Dualize, Split, Randomize: Toward Fast Nonsmooth Optimization Algorithms.
J. Optim. Theory Appl., 2022

Adaptive Learning Rates for Faster Stochastic Gradient Methods.
CoRR, 2022

Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!
Proceedings of the International Conference on Machine Learning, 2022

Proximal and Federated Random Reshuffling.
Proceedings of the International Conference on Machine Learning, 2022

IntSGD: Adaptive Floatless Compression of Stochastic Gradients.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
Regularized Newton Method with Global O(1/k²) Convergence.
CoRR, 2021

IntSGD: Floatless Compression of Stochastic Gradients.
CoRR, 2021

2020
A Distributed Flexible Delay-Tolerant Proximal Gradient Algorithm.
SIAM J. Optim., 2020

Dualize, Split, Randomize: Fast Nonsmooth Optimization Algorithms.
CoRR, 2020

99% of Worker-Master Communication in Distributed Optimization Is Not Needed.
Proceedings of the Thirty-Sixth Conference on Uncertainty in Artificial Intelligence, 2020

Random Reshuffling: Simple Analysis with Vast Improvements.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Adaptive Gradient Descent without Descent.
Proceedings of the 37th International Conference on Machine Learning, 2020

DAve-QN: A Distributed Averaged Quasi-Newton Method with Local Superlinear Convergence Rate.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Revisiting Stochastic Extragradient.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Tighter Theory for Local SGD on Identical and Heterogeneous Data.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

2019
Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates.
CoRR, 2019

Sinkhorn Algorithm as a Special Case of Stochastic Mirror Descent.
CoRR, 2019

Better Communication Complexity for Local SGD.
CoRR, 2019

First Analysis of Local GD on Heterogeneous Data.
CoRR, 2019

A Self-supervised Approach to Hierarchical Forecasting with Applications to Groupwise Synthetic Controls.
CoRR, 2019

99% of Parallel Optimization is Inevitably a Waste of Time.
CoRR, 2019

Distributed Learning with Compressed Gradient Differences.
CoRR, 2019

2018
SEGA: Variance Reduction via Gradient Sketching.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

A Delay-tolerant Proximal-Gradient Algorithm for Distributed Learning.
Proceedings of the 35th International Conference on Machine Learning, 2018
