Samy Jelassi

CoRR, March, 2026

2025

Let's (not) just put things in Context: Test-Time Training for Long-Context LLMs.

CoRR, December, 2025

Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining.

CoRR, April, 2025

To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning.

CoRR, April, 2025

The Role of Sparsity for Length Generalization in Transformers.

CoRR, February, 2025

Let Me Think! A Long Chain of Thought Can Be Worth Exponentially Many Short Ones.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Universal Length Generalization with Turing Programs.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

The Role of Sparsity for Length Generalization in LLMs.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Mixture of Parrots: Experts improve memorization more than reasoning.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks.

Proceedings of the 31st International Conference on Computational Linguistics, 2025

2024

Collective Model Intelligence Requires Compatible Specialization.

Jyothish Pari

Pulkit Agrawal

CoRR, 2024

How Does Overparameterization Affect Features?

Ahmet Cagri Duzgun

CoRR, 2024

Repeat After Me: Transformers are Better than State Space Models at Copying.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Q-Probe: A Lightweight Approach to Reward Maximization for Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023

Algorithmic and architectural implicit biases in deep learning.

PhD thesis, 2023

Length Generalization in Arithmetic Transformers.

Stéphane d'Ascoli

Yuhuai Wu

François Charton

CoRR, 2023

Depth Dependence of μP Learning Rates in ReLU MLPs.

CoRR, 2023

2022

Depth separation beyond radial functions.

J. Mach. Learn. Res., 2022

A Momentumized, Adaptive, Dual Averaged Gradient Method.

Aaron Defazio

J. Mach. Learn. Res., 2022

Dissecting adaptive methods in GANs.

CoRR, 2022

Vision Transformers provably learn spatial structure.

Michael E. Sander

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Towards understanding how momentum improves generalization in deep learning.

Proceedings of the International Conference on Machine Learning, 2022

2021

Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization.

Aaron Defazio

CoRR, 2021

Auction Learning as a Two-Player Game.

Jad Rahme

S. Matthew Weinberg

Proceedings of the 9th International Conference on Learning Representations, 2021

A Permutation-Equivariant Neural Network Architecture For Auction Design.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Dual Averaging is Surprisingly Effective for Deep Learning Optimization.

Aaron Defazio

CoRR, 2020

A mean-field analysis of two-player zero-sum games.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Extra-gradient with player sampling for faster convergence in n-player games.

Damien Scieur

Arthur Mensch

Joan Bruna

Proceedings of the 37th International Conference on Machine Learning, 2020

2019

Extra-gradient with player sampling for provable fast convergence in n-player games.

Damien Scieur

Arthur Mensch

Joan Bruna

CoRR, 2019

Global convergence of neuron birth-death dynamics.

CoRR, 2019

Towards closing the gap between the theory and practice of SVRG.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Neuron birth-death dynamics accelerates gradient descent and converges asymptotically.

Proceedings of the 36th International Conference on Machine Learning, 2019

2018

Smoothed analysis of the low-rank approach for smooth semidefinite programs.

Thomas Pumir