Tom Lieberum

According to our database¹, Tom Lieberum authored at least 10 papers between 2021 and 2024.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2024

Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2.

Tom Lieberum

Senthooran Rajamanoharan

CoRR, 2024

Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders.

Senthooran Rajamanoharan

CoRR, 2024

Improving Dictionary Learning with Gated Sparse Autoencoders.

Senthooran Rajamanoharan

CoRR, 2024

Evaluating Frontier Models for Dangerous Capabilities.

CoRR, 2024

AtP*: An efficient and scalable method for localizing LLM behaviour to components.

CoRR, 2024

Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders.

Senthooran Rajamanoharan

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

2023

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla.

CoRR, 2023

Progress measures for grokking via mechanistic interpretability.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022

Retrospective on the 2021 BASALT Competition on Learning from Human Feedback.

Nicholas R. Waytowich

CoRR, 2022

2021

Retrospective on the 2021 MineRL BASALT Competition on Learning from Human Feedback.

Nicholas R. Waytowich

Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, 2021