Sören Mindermann

Orcid: 0000-0002-0315-9821

According to our database, Sören Mindermann authored at least 24 papers between 2017 and 2025.

Bibliography

2025
The Singapore Consensus on Global AI Safety Research Priorities.
CoRR, June, 2025

Bare Minimum Mitigations for Autonomous AI Development.
CoRR, April, 2025

In Which Areas of Technical AI Safety Could Geopolitical Rivals Cooperate?
CoRR, April, 2025

Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?
CoRR, February, 2025

International AI Safety Report.
CoRR, January, 2025

Open Problems in Machine Unlearning for AI Safety.
CoRR, January, 2025

In Which Areas of Technical AI Safety Could Geopolitical Rivals Cooperate?
Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency, 2025

2024
Alignment faking in large language models.
CoRR, 2024

International Scientific Report on the Safety of Advanced AI (Interim Report).
CoRR, 2024

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training.
CoRR, 2024

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

The Alignment Problem from a Deep Learning Perspective.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Managing AI Risks in an Era of Rapid Progress.
CoRR, 2023

Specific versus General Principles for Constitutional AI.
CoRR, 2023

2022
Seasonal variation in SARS-CoV-2 transmission in temperate climates: A Bayesian modelling study in 143 European regions.
PLoS Comput. Biol., 2022

Prioritized Training on Points that are Learnable, Worth Learning, and not yet Learnt.
Proceedings of the International Conference on Machine Learning, 2022

2021
Prioritized training on points that are learnable, worth learning, and not yet learned.
CoRR, 2021

Quantifying Ignorance in Individual-Level Causal-Effect Estimates under Hidden Confounding.
Proceedings of the 38th International Conference on Machine Learning, 2021

2020
On the robustness of effectiveness estimation of nonpharmaceutical interventions against COVID-19 transmission.
CoRR, 2020

How Robust are the Estimated Effects of Nonpharmaceutical Interventions against COVID-19?
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Identifying Causal-Effect Inference Failure with Uncertainty-Aware Models.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

2018
Active Inverse Reward Design.
CoRR, 2018

Occam's razor is insufficient to infer the preferences of irrational agents.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

2017
Impossibility of deducing preferences and rationality from human policy.
CoRR, 2017

