Can Rager
According to our database, Can Rager authored at least 14 papers between 2023 and 2025.
Bibliography
2025
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability.
CoRR, March, 2025
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
2024
The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability.
CoRR, 2024
Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
2023
CoRR, 2023
Proceedings of UniReps: the First Workshop on Unifying Representations in Neural Models, 2023