Sören Mindermann

ORCID: 0000-0002-0315-9821

According to our database, Sören Mindermann authored at least 14 papers between 2017 and 2024.

Bibliography

2024
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training.
CoRR, 2024

2023
Managing AI Risks in an Era of Rapid Progress.
CoRR, 2023

Specific versus General Principles for Constitutional AI.
CoRR, 2023

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions.
CoRR, 2023

2022
Seasonal variation in SARS-CoV-2 transmission in temperate climates: A Bayesian modelling study in 143 European regions.
PLoS Comput. Biol., 2022

Prioritized Training on Points that are Learnable, Worth Learning, and not yet Learnt.
Proceedings of the International Conference on Machine Learning, 2022

2021
Prioritized training on points that are learnable, worth learning, and not yet learned.
CoRR, 2021

Quantifying Ignorance in Individual-Level Causal-Effect Estimates under Hidden Confounding.
Proceedings of the 38th International Conference on Machine Learning, 2021

2020
On the robustness of effectiveness estimation of nonpharmaceutical interventions against COVID-19 transmission.
CoRR, 2020

How Robust are the Estimated Effects of Nonpharmaceutical Interventions against COVID-19?
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Identifying Causal-Effect Inference Failure with Uncertainty-Aware Models.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

2018
Active Inverse Reward Design.
CoRR, 2018

Occam's razor is insufficient to infer the preferences of irrational agents.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

2017
Impossibility of deducing preferences and rationality from human policy.
CoRR, 2017
