Nicholas Goldowsky-Dill

According to our database, Nicholas Goldowsky-Dill authored at least 9 papers between 2023 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

Stress Testing Deliberative Alignment for Anti-Scheming Training.

Bronson Schoen

Evgenia Nitishinskaya

Nicholas Goldowsky-Dill

CoRR, September, 2025

Detecting Strategic Deception Using Linear Probes.

Nicholas Goldowsky-Dill

Bilal Chughtai

Stefan Heimersheim

Marius Hobbhahn

CoRR, February, 2025

Open Problems in Mechanistic Interpretability.

Trans. Mach. Learn. Res., 2025

Detecting Strategic Deception with Linear Probes.

Nicholas Goldowsky-Dill

Bilal Chughtai

Stefan Heimersheim

Marius Hobbhahn

Proceedings of the Forty-second International Conference on Machine Learning, 2025

2024

Towards evaluations-based safety cases for AI scheming.

Nicholas Goldowsky-Dill

CoRR, 2024

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks.

Lucius Bushnaq

Stefan Heimersheim

Nicholas Goldowsky-Dill

CoRR, 2024

Using Degeneracy in the Loss Landscape for Mechanistic Interpretability.

Nicholas Goldowsky-Dill

Kaarel Hänni

Cindy Wu

Marius Hobbhahn

CoRR, 2024

Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning.

Dan Braun

Jordan Taylor

Nicholas Goldowsky-Dill

Lee Sharkey

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

2023

Localizing Model Behavior with Path Patching.

Nicholas Goldowsky-Dill

Chris MacLeod

Lucas Sato

Aryaman Arora

CoRR, 2023
