Sören Mindermann

ORCID: 0000-0002-0315-9821

According to our database, Sören Mindermann authored at least 30 papers between 2017 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

Introspection Adapters: Training LLMs to Report Their Learned Behaviors.

CoRR, April, 2026

Frontier AI Auditing: Toward Rigorous Third-Party Assessment of Safety and Security Practices at Leading AI Companies.

CoRR, January, 2026

Open Technical Problems in Open-Weight AI Model Risk Management.

Trans. Mach. Learn. Res., 2026

2025

International AI Safety Report 2025: Second Key Update: Technical Safeguards and Risk Management.

Yoshua Bengio

Stephen Clare

Carina Prunkl

Maksym Andriushchenko

Fahad Albalawi Noora AlMalek

Oleksii Molchanovskyi

José Ramón López Portillo

CoRR, November, 2025

International AI Safety Report 2025: First Key Update: Capabilities and Risk Implications.

Maksym Andriushchenko

Fahad Albalawi Noora AlMalek

Christian Busch

André C. P. L. F. de Carvalho

Oleksii Molchanovskyi

José Ramón López Portillo

CoRR, October, 2025

Agentic Misalignment: How LLMs Could Be Insider Threats.

CoRR, October, 2025

The Singapore Consensus on Global AI Safety Research Priorities.

Vidhisha Balachandran

Bryan Low Kian Hsiang

CoRR, June, 2025

Bare Minimum Mitigations for Autonomous AI Development.

CoRR, April, 2025

In Which Areas of Technical AI Safety Could Geopolitical Rivals Cooperate?

CoRR, April, 2025

Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

Pierre-Luc St-Charles

David Williams-King

CoRR, February, 2025

International AI Safety Report.

Inioluwa Deborah Raji

Pierre-Olivier Gourinchas

André Carlos Ponce de Leon Ferreira de Carvalho

Dominic Vincent Ligot

Oleksii Molchanovskyi

José Ramón López Portillo

CoRR, January, 2025

Open Problems in Machine Unlearning for AI Safety.

José Hernández-Orallo

Mor Geva

Yarin Gal

CoRR, January, 2025

In Which Areas of Technical AI Safety Could Geopolitical Rivals Cooperate?

Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency, 2025

2024

Alignment faking in large language models.

CoRR, 2024

International Scientific Report on the Safety of Advanced AI (Interim Report).

CoRR, 2024

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training.

CoRR, 2024

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

The Alignment Problem from a Deep Learning Perspective.

Richard Ngo

Lawrence Chan

Sören Mindermann

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

Managing AI Risks in an Era of Rapid Progress.

CoRR, 2023

Specific versus General Principles for Constitutional AI.

CoRR, 2023

2022

Seasonal variation in SARS-CoV-2 transmission in temperate climates: A Bayesian modelling study in 143 European regions.

Tomas Gavenciak

Joshua Teperowski Monrad

PLoS Comput. Biol., 2022

Prioritized Training on Points that are Learnable, Worth Learning, and not yet Learnt.

Proceedings of the International Conference on Machine Learning, 2022

2021

Prioritized training on points that are learnable, worth learning, and not yet learned.

CoRR, 2021

Quantifying Ignorance in Individual-Level Causal-Effect Estimates under Hidden Confounding.

Proceedings of the 38th International Conference on Machine Learning, 2021

2020

On the robustness of effectiveness estimation of nonpharmaceutical interventions against COVID-19 transmission.

CoRR, 2020

How Robust are the Estimated Effects of Nonpharmaceutical Interventions against COVID-19?

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Identifying Causal-Effect Inference Failure with Uncertainty-Aware Models.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

2018

Active Inverse Reward Design.

Sören Mindermann

Rohin Shah

Adam Gleave

Dylan Hadfield-Menell

CoRR, 2018

Occam's razor is insufficient to infer the preferences of irrational agents.

Stuart Armstrong

Sören Mindermann

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

2017

Impossibility of deducing preferences and rationality from human policy.

Stuart Armstrong

Sören Mindermann

CoRR, 2017
