Libin Zhu

Xue Wang

IEEE Trans. Geosci. Remote. Sens., 2025

Emergence in non-neural models: grokking modular arithmetic via average gradient outer product.

Neil Mallinar

Daniel Beaglehole

Parthe Pandit

Proceedings of the Forty-second International Conference on Machine Learning, 2025

2024

Toward Understanding the Dynamics of Over-parameterized Neural Networks.

PhD thesis, 2024

Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Quadratic models for understanding catapult dynamics of neural networks.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

Neural tangent kernel at initialization: linear width suffices.

Arindam Banerjee

Pedro Cisneros-Velarde

Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2023

Restricted Strong Convexity of Deep Learning Models with Smooth Activations.

Arindam Banerjee

Pedro Cisneros-Velarde

Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022

A note on Linear Bottleneck networks and their Transition to Multilinearity.

Parthe Pandit

CoRR, 2022

Quadratic models for understanding neural network dynamics.

CoRR, 2022

Transition to Linearity of General Neural Networks with Directed Acyclic Graph Architecture.

Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022

Transition to Linearity of Wide Neural Networks is an Emerging Property of Assembling Weak Models.

Proceedings of the Tenth International Conference on Learning Representations, 2022

2020

Toward a theory of optimization for over-parameterized systems of non-linear equations: the lessons of deep learning.

CoRR, 2020

On the linearity of large non-linear models: when and why the tangent kernel is constant.
