Adam Karvonen

According to our database, Adam Karvonen authored at least 8 papers between 2024 and 2025.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2025
Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning.
CoRR, July, 2025

Robustly Improving LLM Fairness in Realistic Settings via Interpretability.
CoRR, June, 2025

Learning Multi-Level Features with Matryoshka Sparse Autoencoders.
CoRR, March, 2025

Revisiting End-To-End Sparse Autoencoder Training: A Short Finetune Is All You Need.
CoRR, March, 2025

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability.
CoRR, March, 2025

2024
Evaluating Sparse Autoencoders on Targeted Concept Erasure Tasks.
CoRR, 2024

Emergent World Models and Latent Variable Estimation in Chess-Playing Language Models.
CoRR, 2024

Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

