Yuan Cao

Affiliations:
  • University of California, Los Angeles, Department of Computer Science, CA, USA
  • Princeton University, Department of Operations Research and Financial Engineering, NJ, USA (PhD)


According to our database, Yuan Cao authored at least 29 papers between 2018 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
Benign Overfitting in Two-Layer ReLU Convolutional Neural Networks for XOR Data.
CoRR, 2023

Per-Example Gradient Regularization Improves Learning Signals from Noisy Data.
CoRR, 2023

Benign Overfitting in Adversarially Robust Linear Classification.
Proceedings of the Uncertainty in Artificial Intelligence, 2023

The Benefits of Mixup for Feature Learning.
Proceedings of the International Conference on Machine Learning, 2023

Understanding Train-Validation Split in Meta-Learning with Neural Networks.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

How Does Semi-supervised Learning with Pseudo-labelers Work? A Case Study.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks.
Proceedings of the Thirty Sixth Annual Conference on Learning Theory, 2023

2022
Benign Overfitting in Two-layer Convolutional Neural Networks.
CoRR, 2022

Benign Overfitting in Two-layer Convolutional Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021
Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Towards Understanding the Spectral Bias of Deep Learning.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise.
Proceedings of the 38th International Conference on Machine Learning, 2021

Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins.
Proceedings of the 38th International Conference on Machine Learning, 2021

How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks?
Proceedings of the 9th International Conference on Learning Representations, 2021

2020
Gradient descent optimizes over-parameterized deep ReLU networks.
Mach. Learn., 2020

Mean-Field Analysis of Two-Layer Neural Networks: Non-Asymptotic Rates and Generalization Bounds.
CoRR, 2020

Agnostic Learning of a Single Neuron with Gradient Descent.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Accelerated Factored Gradient Descent for Low-Rank Matrix Factorization.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Generalization Error Bounds of Gradient Descent for Learning Over-Parameterized Deep ReLU Networks.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
A Generalization Theory of Gradient Descent for Learning Over-parameterized Deep ReLU Networks.
CoRR, 2019

Algorithm-Dependent Generalization Bounds for Overparameterized Deep Residual Networks.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Tight Sample Complexity of Learning One-hidden-layer Convolutional Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

2018
Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks.
CoRR, 2018

On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization.
CoRR, 2018

The Edge Density Barrier: Computational-Statistical Tradeoffs in Combinatorial Inference.
Proceedings of the 35th International Conference on Machine Learning, 2018
