Kwangjun Ahn

Orcid: 0000-0001-5516-5775

According to our database, Kwangjun Ahn authored at least 41 papers between 2016 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2025

Dion2: A Simple Method to Shrink Matrix in Muon.

Kwangjun Ahn

Noah Amsel

John Langford

CoRR, December, 2025

Next-Latent Prediction Transformers Learn Compact World Models.

CoRR, November, 2025

Dion: A Communication-Efficient Optimizer for Large Models.

Kwangjun Ahn

Byron Xu

CoRR, April, 2025

Efficient Joint Prediction of Multiple Future Tokens.

Kwangjun Ahn

Alex Lamb

John Langford

CoRR, March, 2025

Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

General framework for online-to-nonconvex conversion: Schedule-free SGD is also effective for nonconvex optimization.

Kwangjun Ahn

Gagik Magakyan

Ashok Cutkosky

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Does SGD really happen in tiny subspaces?

Minhak Song

Kwangjun Ahn

Chulhee Yun

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

The Belief State Transformer.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

Learning to Achieve Goals with Belief State Transformers.

CoRR, 2024

Improved Sample Complexity of Imitation Learning for Barrier Model Predictive Control.

CoRR, 2024

Adam with model exponential moving average is effective for nonconvex optimization.

Kwangjun Ahn

Ashok Cutkosky

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Understanding Adam Optimizer via Online Learning of Updates: Adam is FTRL in Disguise.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

How to Escape Sharp Minima with Random Perturbations.

Kwangjun Ahn

Ali Jadbabaie

Suvrit Sra

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Linear attention is (maybe) all you need (to understand Transformer optimization).

Proceedings of the Twelfth International Conference on Learning Representations, 2024

On the Sample Complexity of Imitation Learning for Smoothed Model Predictive Control.

Proceedings of the 63rd IEEE Conference on Decision and Control, 2024

2023

A Unified Approach to Controlling Implicit Regularization via Mirror Descent.

J. Mach. Learn. Res., 2023

Smooth Model Predictive Control with Applications to Statistical Learning.

CoRR, 2023

How to escape sharp minima.

Kwangjun Ahn

Ali Jadbabaie

Suvrit Sra

CoRR, 2023

Transformers learn to implement preconditioned gradient descent for in-context learning.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Learning threshold neurons via edge of stability.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

The Crucial Role of Normalization in Sharpness-Aware Minimization.

Yan Dai

Kwangjun Ahn

Suvrit Sra

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Model Predictive Control via On-Policy Imitation Learning.

Proceedings of the Learning for Dynamics and Control Conference, 2023

2022

Understanding Nesterov's Acceleration via Proximal Point Method.

Kwangjun Ahn

Suvrit Sra

Proceedings of the 5th Symposium on Simplicity in Algorithms, 2022

Mirror Descent Maximizes Generalized Margin and Can Be Implemented Efficiently.

Haoyuan Sun

Kwangjun Ahn

Christos Thrampoulidis

Navid Azizan

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Reproducibility in Optimization: Theoretical Framework and Limits.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Agnostic Learnability of Halfspaces via Logistic Loss.

Proceedings of the International Conference on Machine Learning, 2022

Understanding the unstable convergence of gradient descent.

Kwangjun Ahn

Jingzhao Zhang

Suvrit Sra

Proceedings of the International Conference on Machine Learning, 2022

One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares.

Youngjae Min

Kwangjun Ahn

Navid Azizan

Proceedings of the 61st IEEE Conference on Decision and Control, 2022

2021

Riemannian Perspective on Matrix Factorization.

Kwangjun Ahn

Felipe Suarez

CoRR, 2021

Efficient constrained sampling via the mirror-Langevin algorithm.

Kwangjun Ahn

Sinho Chewi

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Optimal dimension dependence of the Metropolis-Adjusted Langevin Algorithm.

Proceedings of the Conference on Learning Theory, 2021

2020

From Proximal Point Method to Nesterov's Acceleration.

Kwangjun Ahn

CoRR, 2020

On Tight Convergence Rates of Without-replacement SGD.

Kwangjun Ahn

Suvrit Sra

CoRR, 2020

SGD with shuffling: optimal rates without component convexity and large epoch requirements.

Kwangjun Ahn

Chulhee Yun

Suvrit Sra

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

From Nesterov's Estimate Sequence to Riemannian Acceleration.

Kwangjun Ahn

Suvrit Sra

Proceedings of the Conference on Learning Theory, 2020

A Simpler Strong Refutation of Random k-XOR.

Kwangjun Ahn

Proceedings of the Approximation, 2020

2018

Hypergraph Spectral Clustering in the Weighted Stochastic Block Model.

Kwangjun Ahn

Kangwook Lee

Changho Suh

IEEE J. Sel. Top. Signal Process., 2018

Binary Rating Estimation with Graph Side Information.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

2017

Computing the maximum matching width is NP-hard.

Kwangjun Ahn

Jisu Jeong

CoRR, 2017

Information-theoretic limits of subspace clustering.

Kwangjun Ahn

Kangwook Lee

Changho Suh

Proceedings of the 2017 IEEE International Symposium on Information Theory, 2017

2016

Community recovery in hypergraphs.

Kwangjun Ahn

Kangwook Lee

Changho Suh

Proceedings of the 54th Annual Allerton Conference on Communication, 2016
