David Lindner

ORCID: 0000-0001-7051-7433

According to our database, David Lindner authored at least 31 papers between 2019 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety.
CoRR, July, 2025

Early Signs of Steganographic Capabilities in Frontier LLMs.
CoRR, July, 2025

Large language models can learn and generalize steganographic chain-of-thought under process supervision.
CoRR, June, 2025

Evaluating Frontier Models for Stealth and Situational Awareness.
CoRR, May, 2025

An Approach to Technical AGI Safety and Security.
CoRR, April, 2025

MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking.
CoRR, January, 2025

2024
MISR: Measuring Instrumental Self-Reasoning in Frontier Models.
CoRR, 2024

ViSTa Dataset: Do vision-language models understand sequential tasks?
CoRR, 2024

Mapping out the Space of Human Feedback for Reinforcement Learning: A Conceptual Framework.
CoRR, 2024

Towards evaluations-based safety cases for AI scheming.
CoRR, 2024

Evaluating Frontier Models for Dangerous Capabilities.
CoRR, 2024

On scalable oversight with weak LLMs judging strong LLMs.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Learning Safety Constraints from Demonstrations with Unknown Rewards.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2024

2023
GoSafeOpt: Scalable safe exploration for global optimization of dynamical systems.
Artif. Intell., July, 2023

Algorithmic Foundations for Safe and Efficient Reinforcement Learning from Human Feedback.
PhD thesis, 2023

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback.
Trans. Mach. Learn. Res., 2023

RLHF-Blender: A Configurable Interactive Interface for Learning from Diverse Human Feedback.
CoRR, 2023

Tracr: Compiled Transformers as a Laboratory for Interpretability.
CoRR, 2023

Tracr: Compiled Transformers as a Laboratory for Interpretability.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022
Red-Teaming the Stable Diffusion Safety Filter.
CoRR, 2022

Humans are not Boltzmann Distributions: Challenges and Opportunities for Modelling Human Feedback and Interaction in Reinforcement Learning.
CoRR, 2022

Scalable Safe Exploration for Global Optimization of Dynamical Systems.
CoRR, 2022

Active Exploration for Inverse Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Interactively Learning Preference Constraints in Linear Bandits.
Proceedings of the International Conference on Machine Learning, 2022

2021
Information Directed Reward Learning for Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Addressing the Long-term Impact of ML Decisions via Policy Regret.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

Learning What To Do by Simulating the Past.
Proceedings of the 9th International Conference on Learning Representations, 2021

Challenges for Using Impact Regularizers to Avoid Negative Side Effects.
Proceedings of the Workshop on Artificial Intelligence Safety 2021 (SafeAI 2021) co-located with the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2021), 2021

2019
Sensing Social Media Signals for Cryptocurrency News.
Proceedings of the Companion of The 2019 World Wide Web Conference, 2019

Detecting Spiky Corruption in Markov Decision Processes.
Proceedings of the Workshop on Artificial Intelligence Safety 2019 co-located with the 28th International Joint Conference on Artificial Intelligence, 2019

