Tom Lieberum

According to our database, Tom Lieberum authored at least 6 papers between 2021 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Evaluating Frontier Models for Dangerous Capabilities.
CoRR, 2024

AtP*: An efficient and scalable method for localizing LLM behaviour to components.
CoRR, 2024

2023
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla.
CoRR, 2023

Progress measures for grokking via mechanistic interpretability.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022
Retrospective on the 2021 BASALT Competition on Learning from Human Feedback.
CoRR, 2022

2021
Retrospective on the 2021 MineRL BASALT Competition on Learning from Human Feedback.
Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, 2021

