Thomas McGrath

Orcid: 0000-0003-2349-0439

Affiliations:

Goodfire Inc.

According to our database, Thomas McGrath authored at least 20 papers between 2019 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space.

CoRR, May, 2026

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior.

CoRR, May, 2026

Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts.

CoRR, May, 2026

Shifting the Gradient: Understanding How Defensive Training Methods Protect Language Model Integrity.

CoRR, April, 2026

Features as Rewards: Scalable Supervision for Open-Ended Tasks via Interpretability.

Aaditya Vikram Prasad

CoRR, February, 2026

The Shape of Beliefs: Geometry, Dynamics, and Interventions along Representation Manifolds of Language Models' Posteriors.

CoRR, February, 2026

2025

Understanding sparse autoencoder scaling in the presence of feature manifolds.

Eric J. Michaud

Liv Gorton

Tom McGrath

CoRR, September, 2025

Competitive secretary problem.

Tom McGrath

Marc Schröder

Int. J. Game Theory, June, 2025

Open Problems in Mechanistic Interpretability.

Trans. Mach. Learn. Res., 2025

2023

Bridging the Human-AI Knowledge Gap: Concept Discovery and Transfer in AlphaZero.

CoRR, 2023

Copy Suppression: Comprehensively Understanding an Attention Head.

CoRR, 2023

The Hydra Effect: Emergent Self-repair in Language Model Computations.

CoRR, 2023

Tracr: Compiled Transformers as a Laboratory for Interpretability.

CoRR, 2023

Tracr: Compiled Transformers as a Laboratory for Interpretability.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2021

Resonant-Tunnelling Diodes as PUF Building Blocks.

Ramón Bernardo Gavito

Robert J. Young

Utz Roedig

IEEE Trans. Emerg. Top. Comput., 2021

Acquisition of Chess Knowledge in AlphaZero.

CoRR, 2021

Causal Analysis of Agent Behavior for AI Safety.

CoRR, 2021

2020

Algorithms for Causal Reasoning in Probability Trees.

CoRR, 2020

Meta-trained agents implement Bayes-optimal agents.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

2019

Meta-learning of Sequential Strategies.

CoRR, 2019
