Andrew M. Saxe

Orcid: 0000-0002-9831-8812

Affiliations:

University College London, UK

According to our database, Andrew M. Saxe authored at least 68 papers between 2006 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

Optimal Representation Size: High-Dimensional Analysis of Pretraining and Linear Probing.

Valentina Njaradi

,

Clémentine C. J. Dominé

,

,

,

CoRR, May, 2026

A Theory of How Pretraining Shapes Inductive Bias in Fine-Tuning.

Nicolas Anguita

,

Francesco Locatello

,

,

,

,

,

Clémentine C. J. Dominé

CoRR, February, 2026

Optimal Learning Rate Schedule for Balancing Effort and Performance.

Valentina Njaradi

,

Rodrigo Carrasco-Davis

,

Peter E. Latham

,

CoRR, January, 2026

2025

Saddle-to-Saddle Dynamics Explains A Simplicity Bias Across Neural Network Architectures.

,

,

Peter E. Latham

CoRR, December, 2025

Softmax ≥ Linear: Transformers may learn to classify in-context by kernel gradient descent.

Sara Dragutinovic

,

,

Aaditya K. Singh

CoRR, October, 2025

Distinct Computations Emerge From Compositional Curricula in In-Context Learning.

,

Andrew K. Lampinen

,

Aaditya K. Singh

,

CoRR, June, 2025

Revisiting the Role of Relearning in Semantic Dementia.

,

,

,

Benjamin Rosman

,

CoRR, March, 2025

When Are Bias-Free ReLU Networks Effectively Linear Networks?

,

,

Peter E. Latham

Trans. Mach. Learn. Res., 2025

Memory by accident: a theory of learning as a byproduct of network stabilization.

Basile Confavreux

,

William Dorrell

,

,

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Training Dynamics of In-Context Learning in Linear Attention.

,

Aaditya K. Singh

,

Peter E. Latham

,

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Strategy Coopetition Explains the Emergence and Transience of In-Context Learning.

Aaditya K. Singh

,

,

Sara Dragutinovic

,

,

Stephanie C. Y. Chan

,

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Algorithm Development in Neural Networks: Insights from the Streaming Parity Task.

Loek van Rossem

,

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Not all solutions are created equal: An analytical dissociation of functional and representational similarity in deep linear neural networks.

,

,

Proceedings of the Forty-second International Conference on Machine Learning, 2025

A Theory of Initialisation's Impact on Specialisation.

,

,

Clémentine Carla Juliette Dominé

,

,

Stefano Sarao Mannelli

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Make Haste Slowly: A Theory of Emergent Structured Mixed Selectivity in Feature Learning ReLU Networks.

,

,

Benjamin Rosman

,

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks.

Clémentine Carla Juliette Dominé

,

Nicolas Anguita

,

Alexandra Maria Proca

,

,

,

Pedro A. M. Mediano

,

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

Abrupt and spontaneous strategy switches emerge in simple regularised neural networks.

,

,

Paul S. Muhle-Karbe

,

,

Christopher Summerfield

,

Nicolas W. Schuck

PLoS Comput. Biol., 2024

Early learning of the optimal constant solution in neural networks and humans.

,

,

,

Christopher Summerfield

CoRR, 2024

When Are Bias-Free ReLU Networks Like Linear Networks?

,

,

Peter E. Latham

CoRR, 2024

Flexible task abstractions emerge in linear networks with fast and bounded units.

,

,

Alexandra Maria Proca

,

,

Christopher Summerfield

,

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Nonlinear dynamics of localization in neural receptive fields.

,

,

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning.

,

Allan Raventós

,

Clémentine C. J. Dominé

,

,

David A. Klindt

,

,

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Understanding Unimodal Bias in Multimodal Deep Linear Networks.

,

Peter E. Latham

,

Proceedings of the Forty-first International Conference on Machine Learning, 2024

What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation.

Aaditya K. Singh

,

,

,

Stephanie C. Y. Chan

,

Proceedings of the Forty-first International Conference on Machine Learning, 2024

When Representations Align: Universality in Representation Learning Dynamics.

Loek van Rossem

,

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Tilting the Odds at the Lottery: the Interplay of Overparameterisation and Curricula in Neural Networks.

Stefano Sarao Mannelli

,

Yaraslau Ivashinka

,

,

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Why Do Animals Need Shaping? A Theory of Task Composition and Curriculum Learning.

,

Stefano Sarao Mannelli

,

Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023

Modelling continual learning in humans with Hebbian context gating and exponentially decaying task signals.

,

,

,

Christopher Summerfield

PLoS Comput. Biol., January, 2023

A Theory of Unimodal Bias in Multimodal Learning.

,

Peter E. Latham

,

CoRR, 2023

Meta-Learning Strategies through Value Maximization in Neural Networks.

Rodrigo Carrasco-Davis

,

Javier Alejandro Masís

,

CoRR, 2023

The RL Perceptron: Generalisation Dynamics of Policy Learning in High Dimensions.

,

,

Stefano Sarao Mannelli

,

Sebastian Goldt

,

CoRR, 2023

Regularised neural networks mimic human insight.

,

,

Paul S. Muhle-Karbe

,

,

Christopher Summerfield

,

Nicolas W. Schuck

CoRR, 2023

The Transient Nature of Emergent In-Context Learning in Transformers.

Aaditya K. Singh

,

Stephanie C. Y. Chan

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

On The Specialization of Neural Modules.

,

,

Benjamin Rosman

,

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Know your audience: specializing grounded language models with listener subtraction.

Aaditya K. Singh

,

,

,

,

Andrew K. Lampinen

Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

2022

Probing transfer learning with a model of synthetic correlated datasets.

Federica Gerace

,

,

Stefano Sarao Mannelli

,

,

Lenka Zdeborová

Mach. Learn. Sci. Technol., 2022

Continual task learning in natural and artificial agents.

,

,

Christopher Summerfield

CoRR, 2022

Know your audience: specializing grounded language models with the game of Dixit.

Aaditya K. Singh

,

,

,

,

Andrew K. Lampinen

CoRR, 2022

An Analytical Theory of Curriculum Learning in Teacher-Student Networks.

,

Stefano Sarao Mannelli

,

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Exact learning dynamics of deep linear networks with prior knowledge.

,

Clémentine C. J. Dominé

,

James Fitzgerald

,

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

The Neural Race Reduction: Dynamics of Abstraction in Gated Networks.

,

,

Sam Jay Lewallen

Proceedings of the International Conference on Machine Learning, 2022

Maslow's Hammer in Catastrophic Forgetting: Node Re-Use vs. Node Activation.

,

Stefano Sarao Mannelli

,

Claudia Clopath

,

Sebastian Goldt

,

Proceedings of the International Conference on Machine Learning, 2022

2021

Continual Learning in the Teacher-Student Setup: Impact of Task Similarity.

,

Sebastian Goldt

,

Proceedings of the 38th International Conference on Machine Learning, 2021

Inferring Actions, Intentions, and Causal Relations in a Deep Neural Network.

,

Proceedings of the 43rd Annual Meeting of the Cognitive Science Society, 2021

2020

High-dimensional dynamics of generalization error in neural networks.

Madhu S. Advani

,

,

Haim Sompolinsky

Neural Networks, 2020

Characterizing emergent representations in a space of candidate learning rules for deep networks.

,

Christopher Summerfield

,

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

2019

Generalisation dynamics of online learning in over-parameterised neural networks.

Sebastian Goldt

,

Madhu S. Advani

,

,

Florent Krzakala

,

Lenka Zdeborová

CoRR, 2019

Dynamics of stochastic gradient descent for two-layer neural networks in the teacher-student setup.

Sebastian Goldt

,

,

,

Florent Krzakala

,

Lenka Zdeborová

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

2018

A mathematical theory of semantic development in deep neural networks.

,

James L. McClelland

,

CoRR, 2018

Minnorm training: an algorithm for training over-parameterized deep neural networks.

,

,

,

CoRR, 2018

Energy-entropy competition and the effectiveness of stochastic gradient descent in machine learning.

,

,

Madhu S. Advani

,

CoRR, 2018

On the Information Bottleneck Theory of Deep Learning.

,

,

,

,

Artemy Kolchinsky

,

Brendan D. Tracey

,

Proceedings of the 6th International Conference on Learning Representations, 2018

Hierarchical Subtask Discovery with Non-Negative Matrix Factorization.

Adam Christopher Earle

,

,

Benjamin Rosman

Proceedings of the 6th International Conference on Learning Representations, 2018

2017

High-dimensional dynamics of generalization error in neural networks.

Madhu S. Advani

,

CoRR, 2017

Hierarchy Through Composition with Multitask LMDPs.

,

Adam Christopher Earle

,

Benjamin Rosman

Proceedings of the 34th International Conference on Machine Learning, 2017

2016

Hierarchy through Composition with Linearly Solvable Markov Decision Processes.

,

Adam Christopher Earle

,

Benjamin Rosman

CoRR, 2016

Active Long Term Memory Networks.

Tommaso Furlanello

,

,

,

,

CoRR, 2016

Tensor Switching Networks.

Chuan-Yung Tsai

,

,

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Tutorial Workshop on Contemporary Deep Neural Network Models.

James L. McClelland

,

Steven Stenberg Hansen

,

Proceedings of the 38th Annual Meeting of the Cognitive Science Society, 2016

2014

Exact solutions to the nonlinear dynamics of learning in deep linear neural networks.

,

James L. McClelland

,

Proceedings of the 2nd International Conference on Learning Representations, 2014

Multitask model-free reinforcement learning.

Proceedings of the 36th Annual Meeting of the Cognitive Science Society, 2014

Deep Learning and the Brain.

Proceedings of the 36th Annual Meeting of the Cognitive Science Society, 2014

Modeling Perceptual Learning with Deep Networks.

,

Proceedings of the 36th Annual Meeting of the Cognitive Science Society, 2014

2013

Learning hierarchical categories in deep neural networks.

,

James L. McClelland

,

Proceedings of the 35th Annual Meeting of the Cognitive Science Society, 2013

2011

Unsupervised learning models of primary cortical receptive fields and receptive field plasticity.

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

On Random Weights and Unsupervised Feature Learning.

,

,

,

,

,

Proceedings of the 28th International Conference on Machine Learning, 2011

2009

Measuring Invariances in Deep Networks.

Ian J. Goodfellow

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

2006

Prospect Eleven: Princeton University's entry in the 2005 DARPA Grand Challenge.

Anand R. Atreya

,

Bryan C. Cattle

,

Brendan M. Collins

,

Benjamin Essenburg

,

Gordon H. Franken

,

,

Scott N. Schiffres

,

Alain L. Kornhauser

J. Field Robotics, 2006
