Thomas McGrath

Orcid: 0000-0003-2349-0439

Affiliations:
  • Goodfire Inc.


According to our database, Thomas McGrath authored at least 20 papers between 2019 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space.
CoRR, May, 2026

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior.
CoRR, May, 2026

Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts.
CoRR, May, 2026

Shifting the Gradient: Understanding How Defensive Training Methods Protect Language Model Integrity.
CoRR, April, 2026

Features as Rewards: Scalable Supervision for Open-Ended Tasks via Interpretability.
CoRR, February, 2026

The Shape of Beliefs: Geometry, Dynamics, and Interventions along Representation Manifolds of Language Models' Posteriors.
CoRR, February, 2026

2025
Understanding sparse autoencoder scaling in the presence of feature manifolds.
CoRR, September, 2025

Competitive secretary problem.
Int. J. Game Theory, June, 2025

Open Problems in Mechanistic Interpretability.
Trans. Mach. Learn. Res., 2025

2023
Bridging the Human-AI Knowledge Gap: Concept Discovery and Transfer in AlphaZero.
CoRR, 2023

Copy Suppression: Comprehensively Understanding an Attention Head.
CoRR, 2023

The Hydra Effect: Emergent Self-repair in Language Model Computations.
CoRR, 2023

Tracr: Compiled Transformers as a Laboratory for Interpretability.
CoRR, 2023

Tracr: Compiled Transformers as a Laboratory for Interpretability.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2021
Resonant-Tunnelling Diodes as PUF Building Blocks.
IEEE Trans. Emerg. Top. Comput., 2021

Acquisition of Chess Knowledge in AlphaZero.
CoRR, 2021

Causal Analysis of Agent Behavior for AI Safety.
CoRR, 2021

2020
Algorithms for Causal Reasoning in Probability Trees.
CoRR, 2020

Meta-trained agents implement Bayes-optimal agents.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

2019
Meta-learning of Sequential Strategies.
CoRR, 2019

