Stefan Heimersheim

According to our database, Stefan Heimersheim authored at least 14 papers between 2023 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Benchmarking Deception Probes via Black-to-White Performance Boosts.
CoRR, July, 2025

Transformers Don't Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and the Implications for Mechanistic Interpretability.
CoRR, July, 2025

Detecting Strategic Deception Using Linear Probes.
CoRR, February, 2025

Open Problems in Mechanistic Interpretability.
CoRR, January, 2025

Interpretability in Parameter Space: Minimizing Mechanistic Description Length with Attribution-based Parameter Decomposition.
CoRR, January, 2025

2024
Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs.
CoRR, 2024

Evolution of SAE Features Across Layers in LLMs.
CoRR, 2024

Characterizing stable regions in the residual stream of LLMs.
CoRR, 2024

Evaluating Synthetic Activations composed of SAE Latents in GPT-2.
CoRR, 2024

You can remove GPT2's LayerNorm by fine-tuning.
CoRR, 2024

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks.
CoRR, 2024

Using Degeneracy in the Loss Landscape for Mechanistic Interpretability.
CoRR, 2024

How to use and interpret activation patching.
CoRR, 2024

2023
Towards Automated Circuit Discovery for Mechanistic Interpretability.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
