Antonio Orvieto

Orcid: 0000-0002-1914-0367

According to our database, Antonio Orvieto authored at least 56 papers between 2019 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
When recalling in-context, Transformers are not SSMs.
CoRR, August, 2025

Enhancing Optimizer Stability: Momentum Adaptation of The NGN Step-size.
CoRR, August, 2025

GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities.
CoRR, July, 2025

Gradient Descent on Logistic Regression: Do Large Step-Sizes Work with Data on the Sphere?
CoRR, July, 2025

(Almost) Free Modality Stitching of Foundation Models.
CoRR, July, 2025

Generalized Linear Mode Connectivity for Transformers.
CoRR, June, 2025

Is your batch size the problem? Revisiting the Adam-SGD gap in language modeling.
CoRR, June, 2025

On the Interaction of Noise, Compression Role, and Adaptivity under (L0, L1)-Smoothness: An SDE-based Approach.
CoRR, June, 2025

In Search of Adam's Secret Sauce.
CoRR, May, 2025

Can you Finetune your Binoculars? Embedding Text Watermarks into the Weights of Large Language Models.
CoRR, April, 2025

An accelerated Lyapunov function for Polyak's Heavy-ball on convex quadratics.
Optim. Lett., March, 2025

Fixed-Point RNNs: From Diagonal to Dense in a Few Iterations.
CoRR, March, 2025

Generalized Interpolating Discrete Diffusion.
CoRR, March, 2025

When, Where and Why to Average Weights?
CoRR, February, 2025

Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Adaptive Methods through the Lens of SDEs: Theoretical Insights on the Role of Noise.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

An uncertainty principle for Linear Recurrent Neural Networks.
Proceedings of the Thirty Eighth Annual Conference on Learning Theory, 2025

2024
NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs.
CoRR, 2024

An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes.
CoRR, 2024

Gradient Descent on Logistic Regression with Non-Separable Data and Large Step Sizes.
CoRR, 2024

On the low-shot transferability of [V]-Mamba.
CoRR, 2024

Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning.
CoRR, 2024

Recurrent neural networks: vanishing and exploding gradients are not the end of the story.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Understanding the Differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Super Consistency of Neural Network Landscapes and Learning Rate Transfer.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Loss Landscape Characterization of Neural Networks without Over-Parametrization.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Theoretical Foundations of Deep Selective State-Space Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Recurrent Distance Filtering for Graph Representation Learning.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

SDEs for Minimax Optimization.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2024

2023
Recurrent Distance-Encoding Neural Networks for Graph Representation Learning.
CoRR, 2023

On the Universality of Linear Recurrences Followed by Nonlinear Projections.
CoRR, 2023

On the effectiveness of Randomized Signatures as Reservoir for Learning Rough Dynamics.
Proceedings of the International Joint Conference on Neural Networks, 2023

Resurrecting Recurrent Neural Networks for Long Sequences.
Proceedings of the International Conference on Machine Learning, 2023

An SDE for Modeling SAM: Theory and Insights.
Proceedings of the International Conference on Machine Learning, 2023

Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Explicit Regularization in Overparametrized Models via Noise Injection.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023

2022
Randomized Signature Layers for Signal Extraction in Time Series Data.
CoRR, 2022

Dynamics of SGD with Stochastic Polyak Stepsizes: Truly Adaptive Variants and Convergence to Exact Solution.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

On the Theoretical Properties of Noise Correlation in Stochastic Optimization.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Anticorrelated Noise Injection for Improved Generalization.
Proceedings of the International Conference on Machine Learning, 2022

Faster Single-loop Algorithms for Minimax Optimization without Strong Concavity.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

Vanishing Curvature in Randomly Initialized Deep ReLU Networks.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

2021
Vanishing Curvature and the Power of Adaptive Methods in Randomly Initialized Deep Networks.
CoRR, 2021

Rethinking the Variational Interpretation of Accelerated Optimization Methods.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

On the Second-order Convergence Properties of Random Search Methods.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Learning explanations that are hard to vary.
Proceedings of the 9th International Conference on Learning Representations, 2021

Revisiting the Role of Euler Numerical Integration on Acceleration and Stability in Convex Optimization.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

Momentum Improves Optimization on Riemannian Manifolds.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

2020
Two-Level K-FAC Preconditioning for Deep Learning.
CoRR, 2020

An Accelerated DFO Algorithm for Finite-sum Convex Functions.
Proceedings of the 37th International Conference on Machine Learning, 2020

A Continuous-time Perspective for Modeling Acceleration in Riemannian Optimization.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

2019
The Role of Memory in Stochastic Optimization.
Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence, 2019

Shadowing Properties of Optimization Algorithms.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Continuous-time Models for Stochastic Optimization Algorithms.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

