Hadi Daneshmand

According to our database, Hadi Daneshmand authored at least 24 papers between 2014 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion.
CoRR, 2023

On the impact of activation and normalization in obtaining isometric embeddings at initialization.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Transformers learn to implement preconditioned gradient descent for in-context learning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

On Bridging the Gap between Mean Field and Finite Width Deep Random Multilayer Perceptron with Batch Normalization.
Proceedings of the International Conference on Machine Learning, 2023

Efficient displacement convex optimization with particle gradient descent.
Proceedings of the International Conference on Machine Learning, 2023

2022
Entropy Maximization with Depth: A Variational Principle for Random Neural Networks.
CoRR, 2022

Polynomial-time sparse measure recovery.
CoRR, 2022

2021
Rethinking the Variational Interpretation of Accelerated Optimization Methods.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Batch Normalization Orthogonalizes Representations in Deep Random Networks.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Revisiting the Role of Euler Numerical Integration on Acceleration and Stability in Convex Optimization.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

2020
Optimization for Neural Networks: Quest for Theoretical Understandings.
PhD thesis, 2020

Theoretical Understanding of Batch-normalization: A Markov Chain Perspective.
CoRR, 2020

Batch normalization provably avoids ranks collapse for randomly initialised deep networks.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

2019
Mixing of Stochastic Accelerated Gradient Descent.
CoRR, 2019

Exponential convergence rates for Batch Normalization: The power of length-direction decoupling in non-convex optimization.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

Local Saddle Point Optimization: A Curvature Exploitation Approach.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

2018
Towards a Theoretical Understanding of Batch Normalization.
CoRR, 2018

Escaping Saddles with Stochastic Gradients.
Proceedings of the 35th International Conference on Machine Learning, 2018

2017
Accelerated Dual Learning by Homotopic Initialization.
CoRR, 2017

2016
Estimating Diffusion Networks: Recovery Conditions, Sample Complexity and Soft-thresholding Algorithm.
J. Mach. Learn. Res., 2016

DynaNewton - Accelerating Newton's Method for Machine Learning.
CoRR, 2016

Adaptive Newton Method for Empirical Risk Minimization to Statistical Accuracy.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Starting Small - Learning with Adaptive Sample Sizes.
Proceedings of the 33rd International Conference on Machine Learning, 2016

2014
Estimating Diffusion Network Structures: Recovery Conditions, Sample Complexity & Soft-thresholding Algorithm.
Proceedings of the 31st International Conference on Machine Learning, 2014
