Chulhee Yun

CoRR, March, 2026

Implicit Bias and Loss of Plasticity in Matrix Completion: Depth Promotes Low-Rankness.

Baekrok Shin

CoRR, March, 2026

Scaling Laws of SignSGD in Linear Regression: When Does It Outperform SGD?

Jihwan Kim

Dogyoon Song

CoRR, March, 2026

Regularized Online RLHF with Generalized Bilinear Preferences.

CoRR, February, 2026

Uniform Spectral Growth and Convergence of Muon in LoRA-Style Matrix Factorization.

CoRR, February, 2026

2025

Implicit Bias of Per-sample Adam on Separable Data: Departure from the Full-batch Regime.

Beomhan Baek

CoRR, October, 2025

Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

From Linear to Nonlinear: Provable Weak-to-Strong Generalization through Feature Learning.

Junsoo Oh

Jerry Song

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

The Cost of Robustness: Tighter Bounds on Parameter Complexity for Robust Memorization in ReLU Nets.

Yujun Kim

Chaewon Moon

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Understanding Sharpness Dynamics in NN Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More.

Geonhui Yoo

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Incremental Gradient Descent with Small Epoch Counts is Surprisingly Slow on Ill-Conditioned Problems.

Yujun Kim

Jaeyoung Cha

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Provable Benefit of Random Permutations over Uniform Sampling in Stochastic Coordinate Descent.

Donghwa Kim

Jaewook Lee

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Lightweight Dataset Pruning without Full Training via Example Difficulty and Prediction Uncertainty.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Does SGD really happen in tiny subspaces?

Kwangjun Ahn

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Parameter Expanded Stochastic Gradient Markov Chain Monte Carlo.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Convergence and Implicit Bias of Gradient Descent on Continual Linear Classification.

Hyunji Jung

Hanseul Cho

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

Position Coupling: Leveraging Task Structure for Improved Length Generalization of Transformers.

CoRR, 2024

DASH: Warm-Starting Neural Network Training in Stationary Settings without Loss of Plasticity.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Provable Benefit of Cutout and CutMix for Feature Learning.

Junsoo Oh

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Stochastic Extragradient with Flip-Flop Shuffling & Anchoring: Provable Improvements.

Jiseok Chae

Donghwan Kim

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Fundamental Benefit of Alternating Updates in Minimax Optimization.

Jaewook Lee

Hanseul Cho

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Linear attention is (maybe) all you need (to understand Transformer optimization).

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

Large Catapults in Momentum Gradient Descent with Warmup: An Empirical Study.

CoRR, 2023

Enhancing Generalization and Plasticity for Sample Efficient Reinforcement Learning.

CoRR, 2023

Trajectory Alignment: Understanding the Edge of Stability Phenomenon via Bifurcation Theory.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Practical Sharpness-Aware Minimization Cannot Converge All the Way to Optima.

Dongkuk Si

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

PLASTIC: Improving Input and Label Plasticity for Sample Efficient Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

On the Training Instability of Shuffling SGD with Batch Normalization.

David Xing Wu

Proceedings of the International Conference on Machine Learning, 2023

Provable Benefit of Mixup for Finding Optimal Decision Boundaries.

Junsoo Oh

Proceedings of the International Conference on Machine Learning, 2023

Tighter Lower Bounds for Shuffling SGD: Random Permutations and Beyond.

Jaeyoung Cha

Jaewook Lee

Proceedings of the International Conference on Machine Learning, 2023

SGDA with shuffling: faster convergence for nonconvex-PŁ minimax optimization.

Hanseul Cho

Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022

Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond.

Shashank Rajput

Proceedings of the Tenth International Conference on Learning Representations, 2022

2021

Optimization for Deep Learning: Bridging the Theory-Practice Gap.

PhD thesis, 2021

Can Single-Shuffle SGD be Better than Reshuffling SGD and GD?

CoRR, 2021

A unifying view on implicit bias in training linear neural networks.

Shankar Krishnan

Hossein Mobahi

Proceedings of the 9th International Conference on Learning Representations, 2021

Minimum Width for Universal Approximation.

Proceedings of the 9th International Conference on Learning Representations, 2021

Open Problem: Can Single-Shuffle SGD be Better than Reshuffling SGD and GD?

Proceedings of the Conference on Learning Theory, 2021

Provable Memorization via Deep Neural Networks using Sub-linear Parameters.

Proceedings of the Conference on Learning Theory, 2021

2020

O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

SGD with shuffling: optimal rates without component convexity and large epoch requirements.

Kwangjun Ahn

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Low-Rank Bottleneck in Multi-head Attention Models.

Proceedings of the 37th International Conference on Machine Learning, 2020

Are Transformers universal approximators of sequence-to-sequence functions?

Proceedings of the 8th International Conference on Learning Representations, 2020

2019

Are deep ResNets provably better than linear predictors?

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Small nonlinearities in activation functions create bad local minima in neural networks.

Proceedings of the 7th International Conference on Learning Representations, 2019

Efficiently testing local optimality and escaping saddles for ReLU networks.

Proceedings of the 7th International Conference on Learning Representations, 2019

2018

Finite sample expressive power of small-width ReLU networks.

CoRR, 2018

A Critical View of Global Optimality in Deep Learning.

CoRR, 2018

Global Optimality Conditions for Deep Neural Networks.

Proceedings of the 6th International Conference on Learning Representations, 2018

Minimax Bounds on Stochastic Batched Convex Optimization.

John C. Duchi

Feng Ruan

Proceedings of the Conference On Learning Theory, 2018

2015

Face detection using Local Hybrid Patterns.

Donghoon Lee

Chang Dong Yoo

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2013

An implementation of computer vision technique for an edutainment robot with a visual programming language.

Jaegon Ahn

Yeon-Ho Kim

Proceedings of the 10th International Conference on Ubiquitous Robots and Ambient Intelligence, 2013

A fusion of computer vision technique and a visual programming language for edutainment robots.
