Fabien Roger

According to our database, Fabien Roger authored at least 13 papers between 2023 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety.
CoRR, July, 2025

Why Do Some Language Models Fake Alignment While Others Don't?
CoRR, June, 2025

Reasoning Models Don't Always Say What They Think.
CoRR, May, 2025

Auditing language models for hidden objectives.
CoRR, March, 2025

A Frontier AI Risk Management Framework: Bridging the Gap Between Current AI Practices and Established Risk Management.
CoRR, February, 2025

2024
Language Models Are Better Than Humans at Next-token Prediction.
Trans. Mach. Learn. Res., 2024

Alignment faking in large language models.
CoRR, 2024

Do Unlearning Methods Remove Information from Language Model Weights?
CoRR, 2024

Stress-Testing Capability Elicitation With Password-Locked Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

AI Control: Improving Safety Despite Intentional Subversion.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023
Preventing Language Models From Hiding Their Reasoning.
CoRR, 2023

Measurement Tampering Detection Benchmark.
CoRR, 2023

Large Language Models Sometimes Generate Purely Negatively-Reinforced Text.
CoRR, 2023

