Thilo Hagendorff

Evaluation Awareness in Language Models Has Limited Effect on Behaviour.

Amelie Knecht, Lucas Florin, Thilo Hagendorff

CoRR, May, 2026

"Dark Triad" Model Organisms of Misalignment: Narrow Fine-Tuning Mirrors Human Antisocial Behavior.

CoRR, March, 2026

Emergently Misaligned Language Models Show Behavioral Self-Awareness That Shifts With Subsequent Realignment.

Laurène Vaugrante, Anietta Weckauff, Thilo Hagendorff

CoRR, February, 2026

Compromising Honesty and Harmlessness in Language Models via Covert Deception Attacks.

Trans. Mach. Learn. Res., 2026

Speciesism in AI: Evaluating Discrimination Against Animals in Large Language Models.

CoRR, August, 2025

Large Reasoning Models Are Autonomous Jailbreak Agents.

Thilo Hagendorff, Erik Derner, Nuria Oliver

CoRR, August, 2025

On the Inevitability of Left-Leaning Political Bias in Aligned Language Models.

Thilo Hagendorff

CoRR, July, 2025

PRIDE - Parameter-Efficient Reduction of Identity Discrimination for Equality in LLMs.

Maluna Menke, Thilo Hagendorff

CoRR, July, 2025

Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models.

Thilo Hagendorff, Sarah Fabi

CoRR, April, 2025

Compromising Honesty and Harmlessness in Language Models via Deception Attacks.

CoRR, February, 2025

Prompt Engineering Techniques for Language Model Reasoning Lack Replicability.

Laurène Vaugrante, Mathias Niepert, Thilo Hagendorff

Trans. Mach. Learn. Res., 2025

Mapping the Ethics of Generative AI: A Comprehensive Scoping Review.

Thilo Hagendorff

Minds Mach., December, 2024

Why we need biased AI: How including cognitive biases can enhance AI systems.

Thilo Hagendorff, Sarah Fabi

J. Exp. Theor. Artif. Intell., November, 2024

When Image Generation Goes Wrong: A Safety Analysis of Stable Diffusion Models.

Matthias Schneider, Thilo Hagendorff

CoRR, 2024

A Looming Replication Crisis in Evaluating Behavior in Language Models? Evidence and Solutions.

Laurène Vaugrante, Mathias Niepert, Thilo Hagendorff

CoRR, 2024

Lessons Learned from Assessing Trustworthy AI in Practice.

Digit. Soc., December, 2023

Speciesist bias in AI: a reply to Arandjelović.

AI Ethics, November, 2023

Ethical and methodological challenges in building morally informed AI systems.

Thilo Hagendorff, David Danks

AI Ethics, May, 2023

Ethical considerations and statistical analysis of industry involvement in machine learning research.

Thilo Hagendorff, Kristof Meding

AI Soc., February, 2023

AI ethics and its pitfalls: not living up to its own standards?

Thilo Hagendorff

AI Ethics, February, 2023

Human-like intuitive behavior and reasoning biases emerged in large language models but disappeared in ChatGPT.

Thilo Hagendorff, Sarah Fabi, Michal Kosinski

Nat. Comput. Sci., 2023

Fairness Hacking: The Malicious Practice of Shrouding Unfairness in Algorithms.

Kristof Meding, Thilo Hagendorff

CoRR, 2023

Deception Abilities Emerged in Large Language Models.

Thilo Hagendorff

CoRR, 2023

Machine Psychology: Investigating Emergent Capabilities and Behavior in Large Language Models Using Psychological Methods.

Thilo Hagendorff

CoRR, 2023

Methodological reflections for AI alignment research using human feedback.

Thilo Hagendorff, Sarah Fabi

CoRR, 2023

Speciesist bias in AI: how AI applications perpetuate discrimination and unfair outcomes against animals.

AI Ethics, 2023

Machine intuition: Uncovering human-like intuitive decision-making in GPT-3.5.

Thilo Hagendorff, Sarah Fabi, Michal Kosinski

CoRR, 2022

How to Assess Trustworthy AI in Practice.

CoRR, 2022

Why we need biased AI - How including cognitive and ethical machine biases can enhance AI systems.

Sarah Fabi, Thilo Hagendorff

CoRR, 2022

Blind spots in AI ethics.

Thilo Hagendorff

AI Ethics, 2022

Linking Human And Machine Behavior: A New Approach to Evaluate Training Data Quality for Beneficial Machine Learning.

Thilo Hagendorff

Minds Mach., 2021

Forbidden knowledge in machine learning - reflections on the limits of research and publication.

Thilo Hagendorff

AI Soc., 2021

Publisher Correction to: The Ethics of AI Ethics: An Evaluation of Guidelines.

Thilo Hagendorff

Minds Mach., 2020

The Ethics of AI Ethics: An Evaluation of Guidelines.

Thilo Hagendorff

Minds Mach., 2020

AI virtues - The missing link in putting AI ethics into practice.

Thilo Hagendorff

CoRR, 2020

Ethical behavior in humans and machines - Evaluating training data quality for beneficial machine learning.

Thilo Hagendorff

CoRR, 2020

The Big Picture: Ethical Considerations and Statistical Analysis of Industry Involvement in Machine Learning Research.

Thilo Hagendorff, Kristof Meding

CoRR, 2020

15 challenges for AI: or what AI (currently) can't do.

Thilo Hagendorff, Katharina Wezel

AI Soc., 2020

From privacy to anti-discrimination in times of machine learning.

Thilo Hagendorff

Ethics Inf. Technol., 2019

Artificial Intelligence Governance and Ethics: Global Perspectives.

CoRR, 2019

Animal Rights and Robot Ethics.

Thilo Hagendorff

Int. J. Technoethics, 2017
