Antonio Orvieto

CoRR, August, 2025

Enhancing Optimizer Stability: Momentum Adaptation of The NGN Step-size.

CoRR, August, 2025

GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities.

Samira Ebrahimi Kahou

Massimo Caccia

CoRR, July, 2025

Gradient Descent on Logistic Regression: Do Large Step-Sizes Work with Data on the Sphere?

CoRR, July, 2025

(Almost) Free Modality Stitching of Foundation Models.

CoRR, July, 2025

Generalized Linear Mode Connectivity for Transformers.

CoRR, June, 2025

Is your batch size the problem? Revisiting the Adam-SGD gap in language modeling.

Teodora Sreckovic

Jonas Geiping

CoRR, June, 2025

On the Interaction of Noise, Compression Role, and Adaptivity under (L0, L1)-Smoothness: An SDE-based Approach.

Rustem Islamov

Eduard Gorbunov

CoRR, June, 2025

In Search of Adam's Secret Sauce.

Robert Gower

CoRR, May, 2025

Can you Finetune your Binoculars? Embedding Text Watermarks into the Weights of Large Language Models.

CoRR, April, 2025

An accelerated Lyapunov function for Polyak's Heavy-ball on convex quadratics.

Optim. Lett., March, 2025

Fixed-Point RNNs: From Diagonal to Dense in a Few Iterations.

CoRR, March, 2025

Generalized Interpolating Discrete Diffusion.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

When, Where and Why to Average Weights?

Niccolò Ajroldi

Jonas Geiping

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture.

Sajad Movahedi

Seyed-Mohsen Moosavi-Dezfooli

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Adaptive Methods through the Lens of SDEs: Theoretical Insights on the Role of Noise.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

An uncertainty principle for Linear Recurrent Neural Networks.

Alexandre François

Francis R. Bach

Proceedings of the Thirty Eighth Annual Conference on Learning Theory, 2025

2024

NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs.

Nursena Köprücü

Destiny Okpekpe

CoRR, 2024

An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes.

Lin Xiao

CoRR, 2024

Gradient Descent on Logistic Regression with Non-Separable Data and Large Step Sizes.

CoRR, 2024

On the low-shot transferability of [V]-Mamba.

Diganta Misra

Jay Gala

CoRR, 2024

Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning.

CoRR, 2024

Recurrent neural networks: vanishing and exploding gradients are not the end of the story.

Nicolas Zucchet

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Understanding the Differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Super Consistency of Neural Network Landscapes and Learning Rate Transfer.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Loss Landscape Characterization of Neural Networks without Over-Parametrization.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Theoretical Foundations of Deep Selective State-Space Models.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Recurrent Distance Filtering for Graph Representation Learning.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

SDEs for Minimax Optimization.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2024

2023

Recurrent Distance-Encoding Neural Networks for Graph Representation Learning.

CoRR, 2023

On the Universality of Linear Recurrences Followed by Nonlinear Projections.

CoRR, 2023

On the effectiveness of Randomized Signatures as Reservoir for Learning Rough Dynamics.

Proceedings of the International Joint Conference on Neural Networks, 2023

Resurrecting Recurrent Neural Networks for Long Sequences.

Proceedings of the International Conference on Machine Learning, 2023

An SDE for Modeling SAM: Theory and Insights.

Proceedings of the International Conference on Machine Learning, 2023

Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Explicit Regularization in Overparametrized Models via Noise Injection.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023

2022

Randomized Signature Layers for Signal Extraction in Time Series Data.

CoRR, 2022

Dynamics of SGD with Stochastic Polyak Stepsizes: Truly Adaptive Variants and Convergence to Exact Solution.

Simon Lacoste-Julien

Nicolas Loizou

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

On the Theoretical Properties of Noise Correlation in Stochastic Optimization.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Anticorrelated Noise Injection for Improved Generalization.

Proceedings of the International Conference on Machine Learning, 2022

Faster Single-loop Algorithms for Minimax Optimization without Strong Concavity.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

Vanishing Curvature in Randomly Initialized Deep ReLU Networks.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

2021

Vanishing Curvature and the Power of Adaptive Methods in Randomly Initialized Deep Networks.

CoRR, 2021

Rethinking the Variational Interpretation of Accelerated Optimization Methods.

Peiyuan Zhang

Hadi Daneshmand

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

On the Second-order Convergence Properties of Random Search Methods.

Giambattista Parascandolo

Adamos Solomou

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Learning explanations that are hard to vary.

Proceedings of the 9th International Conference on Learning Representations, 2021

Revisiting the Role of Euler Numerical Integration on Acceleration and Stability in Convex Optimization.

Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

Momentum Improves Optimization on Riemannian Manifolds.

Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

2020

Two-Level K-FAC Preconditioning for Deep Learning.

Nikolaos Tselepidis

Jonas Kohler

CoRR, 2020

An Accelerated DFO Algorithm for Finite-sum Convex Functions.

Yuwen Chen

Proceedings of the 37th International Conference on Machine Learning, 2020

A Continuous-time Perspective for Modeling Acceleration in Riemannian Optimization.

Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

2019

The Role of Memory in Stochastic Optimization.

Jonas Kohler

Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence, 2019

Shadowing Properties of Optimization Algorithms.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Continuous-time Models for Stochastic Optimization Algorithms.
