Adam Karvonen

According to our database¹, Adam Karvonen authored at least 10 papers between 2024 and 2025.

Collaborative distances:

Dijkstra number² of five.
Erdős number³ of four.

Bibliography

2025

Verifying LLM Inference to Prevent Model Weight Exfiltration.

CoRR, November, 2025

Automatically Finding Rule-Based Neurons in OthelloGPT.

Aditya Singh

Zihang Wen

Srujananjali Medicherla

Adam Karvonen

Can Rager

CoRR, November, 2025

Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning.

Senthooran Rajamanoharan

Neel Nanda

CoRR, July, 2025

Robustly Improving LLM Fairness in Realistic Settings via Interpretability.

Adam Karvonen

Samuel Marks

CoRR, June, 2025

Learning Multi-Level Features with Matryoshka Sparse Autoencoders.

CoRR, March, 2025

Revisiting End-To-End Sparse Autoencoder Training: A Short Finetune Is All You Need.

Adam Karvonen

CoRR, March, 2025

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability.

CoRR, March, 2025

2024

Evaluating Sparse Autoencoders on Targeted Concept Erasure Tasks.

CoRR, 2024

Emergent World Models and Latent Variable Estimation in Chess-Playing Language Models.

Adam Karvonen

CoRR, 2024

Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models.

Claudio Mayrink Verdun

David Bau

Samuel Marks

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
