Bilal Chughtai

According to our database¹, Bilal Chughtai authored at least 15 papers between 2018 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

Training on Documents About Monitoring Leads to CoT Obfuscation.

Reilly Haskins

Bilal Chughtai

Joshua Engels

CoRR, May, 2026

Building Production-Ready Probes For Gemini.

CoRR, January, 2026

2025

Difficulties with Evaluating a Deception Detector for AIs.

Lewis Smith

Bilal Chughtai

Neel Nanda

CoRR, November, 2025

Detecting Strategic Deception Using Linear Probes.

Nicholas Goldowsky-Dill

Bilal Chughtai

Stefan Heimersheim

Marius Hobbhahn

CoRR, February, 2025

Open Problems in Mechanistic Interpretability.

Trans. Mach. Learn. Res., 2025

Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities.

Dylan Hadfield-Menell

Trans. Mach. Learn. Res., 2025

Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Detecting Strategic Deception with Linear Probes.

Nicholas Goldowsky-Dill

Bilal Chughtai

Stefan Heimersheim

Marius Hobbhahn

Proceedings of the Forty-second International Conference on Machine Learning, 2025

2024

Towards evaluations-based safety cases for AI scheming.

Nicholas Goldowsky-Dill

CoRR, 2024

Transformer Circuit Faithfulness Metrics are not Robust.

Joseph Miller

Bilal Chughtai

William Saunders

CoRR, 2024

Can Language Models Explain Their Own Classification Behavior?

Dane Sherburn

Bilal Chughtai

Owain Evans

CoRR, 2024

Summing Up the Facts: Additive Mechanisms Behind Factual Recall in LLMs.

Bilal Chughtai

Alan Cooney

Neel Nanda

CoRR, 2024

Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

2023

A Toy Model of Universality: Reverse Engineering how Networks Learn Group Operations.

Bilal Chughtai

Lawrence Chan

Neel Nanda

Proceedings of the International Conference on Machine Learning, 2023

2018

Variable Selection for Chronic Disease Outcome Prediction Using a Causal Inference Technique: A Preliminary Study.

John Richard Lee

Bilal Chughtai

Rema Padman

Proceedings of the IEEE International Conference on Healthcare Informatics, 2018
