Adam Karvonen

According to our database, Adam Karvonen authored at least 14 papers between 2024 and 2026.

Collaborative distances:

Dijkstra number of five.
Erdős number of four.


Bibliography

2026

Negation Neglect: When models fail to learn negations in training.


CoRR, May, 2026

2025

Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers.


CoRR, December, 2025

DiFR: Inference Verification Despite Nondeterminism.


CoRR, November, 2025

Verifying LLM Inference to Prevent Model Weight Exfiltration.


CoRR, November, 2025

Automatically Finding Rule-Based Neurons in OthelloGPT.


Aditya Singh

Zihang Wen

Srujananjali Medicherla

Adam Karvonen

Can Rager

CoRR, November, 2025

Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning.


Senthooran Rajamanoharan

Neel Nanda

CoRR, July, 2025

Robustly Improving LLM Fairness in Realistic Settings via Interpretability.


Adam Karvonen

Samuel Marks

CoRR, June, 2025

Revisiting End-To-End Sparse Autoencoder Training: A Short Finetune Is All You Need.


Adam Karvonen

CoRR, March, 2025

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability.


CoRR, March, 2025

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability.


Proceedings of the Forty-second International Conference on Machine Learning, 2025

Learning Multi-Level Features with Matryoshka Sparse Autoencoders.


Proceedings of the Forty-second International Conference on Machine Learning, 2025

2024

Evaluating Sparse Autoencoders on Targeted Concept Erasure Tasks.


CoRR, 2024

Emergent World Models and Latent Variable Estimation in Chess-Playing Language Models.


Adam Karvonen

CoRR, 2024

Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models.


Claudio Mayrink Verdun

David Bau

Samuel Marks

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024
