Atish Agarwala

According to our database, Atish Agarwala authored at least 18 papers between 2020 and 2025.

Bibliography

2025
Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks.
CoRR, July, 2025

Avoiding spurious sharpness minimization broadens applicability of SAM.
CoRR, February, 2025

How far away are truly hyperparameter-free learning algorithms?
Trans. Mach. Learn. Res., 2025

To Clip or not to Clip: the Dynamics of SGD with Gradient Clipping in High-Dimensions.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Feature learning as alignment: a structural property of gradient descent in non-linear neural networks.
Trans. Mach. Learn. Res., 2024

Exact Risk Curves of signSGD in High-Dimensions: Quantifying Preconditioning and Noise-Compression Effects.
CoRR, 2024

A Clipped Trip: the Dynamics of SGD with Gradient Clipping in High-Dimensions.
CoRR, 2024

High dimensional analysis reveals conservative sharpening and a stochastic edge of stability.
CoRR, 2024

Gradient descent induces alignment between weights and the empirical NTK for deep non-linear networks.
CoRR, 2024

Stepping on the Edge: Curvature Aware Learning Rate Tuners.
Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Neglected Hessian component explains mysteries in sharpness regularization.
Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

2023
Temperature check: theory and practice for training models with softmax-cross-entropy losses.
Trans. Mach. Learn. Res., 2023

On the Interplay Between Stepsize Tuning and Progressive Sharpening.
CoRR, 2023

Second-order regression models exhibit progressive sharpening to the edge of stability.
Proceedings of the International Conference on Machine Learning, 2023

SAM operates far from home: eigenvalue regularization as a dynamical phenomenon.
Proceedings of the International Conference on Machine Learning, 2023

2022
Deep equilibrium networks are sensitive to initialization statistics.
Proceedings of the International Conference on Machine Learning, 2022

2021
One Network Fits All? Modular versus Monolithic Task Formulations in Neural Networks.
Proceedings of the 9th International Conference on Learning Representations, 2021

2020
Learning the gravitational force law and other analytic functions.
CoRR, 2020

