Jeffrey Ladish

According to our database, Jeffrey Ladish authored at least 13 papers between 2016 and 2026.

Bibliography

2026
Language Models Can Autonomously Hack and Self-Replicate.
CoRR, May, 2026

Incomplete Tasks Induce Shutdown Resistance in Some Frontier LLMs.
Trans. Mach. Learn. Res., 2026

2025
Shutdown Resistance in Large Language Models.
CoRR, September, 2025

The Singapore Consensus on Global AI Safety Research Priorities.
CoRR, June, 2025

Demonstrating specification gaming in reasoning models.
CoRR, February, 2025

Open Problems in Technical AI Governance.
Trans. Mach. Learn. Res., 2025

2024
Open Problems in Technical AI Governance.
CoRR, 2024

Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits.
CoRR, 2024

2023
BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B.
CoRR, 2023

LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B.
CoRR, 2023

2022
Constitutional AI: Harmlessness from AI Feedback.
CoRR, 2022

Measuring Progress on Scalable Oversight for Large Language Models.
CoRR, 2022

2016
Hands-on cybersecurity exercises for introductory classes: tutorial presentation.
J. Comput. Sci. Coll., 2016

