Bilal Chughtai

According to our database, Bilal Chughtai authored at least 15 papers between 2018 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Training on Documents About Monitoring Leads to CoT Obfuscation.
CoRR, May, 2026

Building Production-Ready Probes For Gemini.
CoRR, January, 2026

2025
Difficulties with Evaluating a Deception Detector for AIs.
CoRR, November, 2025

Detecting Strategic Deception Using Linear Probes.
CoRR, February, 2025

Open Problems in Mechanistic Interpretability.
Trans. Mach. Learn. Res., 2025

Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities.
Trans. Mach. Learn. Res., 2025

Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Detecting Strategic Deception with Linear Probes.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

2024
Towards evaluations-based safety cases for AI scheming.
CoRR, 2024

Transformer Circuit Faithfulness Metrics are not Robust.
CoRR, 2024

Can Language Models Explain Their Own Classification Behavior?
CoRR, 2024

Summing Up the Facts: Additive Mechanisms Behind Factual Recall in LLMs.
CoRR, 2024

Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

2023
A Toy Model of Universality: Reverse Engineering how Networks Learn Group Operations.
Proceedings of the International Conference on Machine Learning, 2023

2018
Variable Selection for Chronic Disease Outcome Prediction Using a Causal Inference Technique: A Preliminary Study.
Proceedings of the IEEE International Conference on Healthcare Informatics, 2018

