Lee Sharkey

According to our database, Lee Sharkey authored at least 7 papers between 2021 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Black-Box Access is Insufficient for Rigorous AI Audits.
CoRR, 2024

2023
Sparse Autoencoders Find Highly Interpretable Features in Language Models.
CoRR, 2023

A technical note on bilinear layers for interpretability.
CoRR, 2023

2022
Circumventing interpretability: How to defeat mind-readers.
CoRR, 2022

Interpreting Neural Networks through the Polytope Lens.
CoRR, 2022

Goal Misgeneralization in Deep Reinforcement Learning.
Proceedings of the International Conference on Machine Learning, 2022

2021
Objective Robustness in Deep Reinforcement Learning.
CoRR, 2021