Konstantin Mishchenko

ORCID: 0000-0002-5241-7292

According to our database, Konstantin Mishchenko authored at least 37 papers between 2018 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Super-Universal Regularized Newton Method.
SIAM J. Optim., March, 2024

2023
Regularized Newton Method with Global O(1/k²) Convergence.
SIAM J. Optim., September, 2023

Stochastic distributed learning with gradient quantization and double-variance reduction.
Optim. Methods Softw., January, 2023

When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement.
CoRR, 2023

Adaptive Proximal Gradient Method for Convex Optimization.
CoRR, 2023

Prodigy: An Expeditiously Adaptive Parameter-Free Learner.
CoRR, 2023

Partially Personalized Federated Learning: Breaking the Curse of Data Heterogeneity.
CoRR, 2023

Convergence of First-Order Algorithms for Meta-Learning with Moreau Envelopes.
CoRR, 2023

DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy.
Proceedings of the International Conference on Machine Learning, 2023

Learning-Rate-Free Learning by D-Adaptation.
Proceedings of the International Conference on Machine Learning, 2023

Server-Side Stepsizes and Sampling Without Replacement Provably Help in Federated Optimization.
Proceedings of the 4th International Workshop on Distributed Machine Learning, 2023

2022
Dualize, Split, Randomize: Toward Fast Nonsmooth Optimization Algorithms.
J. Optim. Theory Appl., 2022

Adaptive Learning Rates for Faster Stochastic Gradient Methods.
CoRR, 2022

Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!
Proceedings of the International Conference on Machine Learning, 2022

Proximal and Federated Random Reshuffling.
Proceedings of the International Conference on Machine Learning, 2022

IntSGD: Adaptive Floatless Compression of Stochastic Gradients.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
Regularized Newton Method with Global O(1/k²) Convergence.
CoRR, 2021

IntSGD: Floatless Compression of Stochastic Gradients.
CoRR, 2021

2020
A Distributed Flexible Delay-Tolerant Proximal Gradient Algorithm.
SIAM J. Optim., 2020

Dualize, Split, Randomize: Fast Nonsmooth Optimization Algorithms.
CoRR, 2020

99% of Worker-Master Communication in Distributed Optimization Is Not Needed.
Proceedings of the Thirty-Sixth Conference on Uncertainty in Artificial Intelligence, 2020

Random Reshuffling: Simple Analysis with Vast Improvements.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Adaptive Gradient Descent without Descent.
Proceedings of the 37th International Conference on Machine Learning, 2020

DAve-QN: A Distributed Averaged Quasi-Newton Method with Local Superlinear Convergence Rate.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Revisiting Stochastic Extragradient.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Tighter Theory for Local SGD on Identical and Heterogeneous Data.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

2019
Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates.
CoRR, 2019

Sinkhorn Algorithm as a Special Case of Stochastic Mirror Descent.
CoRR, 2019

Better Communication Complexity for Local SGD.
CoRR, 2019

First Analysis of Local GD on Heterogeneous Data.
CoRR, 2019

A Self-supervised Approach to Hierarchical Forecasting with Applications to Groupwise Synthetic Controls.
CoRR, 2019

99% of Parallel Optimization is Inevitably a Waste of Time.
CoRR, 2019

Distributed Learning with Compressed Gradient Differences.
CoRR, 2019

2018
SEGA: Variance Reduction via Gradient Sketching.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

A Delay-tolerant Proximal-Gradient Algorithm for Distributed Learning.
Proceedings of the 35th International Conference on Machine Learning, 2018
