Jeffrey Ladish

According to our database, Jeffrey Ladish authored at least 10 papers between 2016 and 2025.

Bibliography

2025
The Singapore Consensus on Global AI Safety Research Priorities.
CoRR, June, 2025

Demonstrating specification gaming in reasoning models.
CoRR, February, 2025

Open Problems in Technical AI Governance.
Trans. Mach. Learn. Res., 2025

2024
Open Problems in Technical AI Governance.
CoRR, 2024

Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits.
CoRR, 2024

2023
BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B.
CoRR, 2023

LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B.
CoRR, 2023

2022
Constitutional AI: Harmlessness from AI Feedback.
CoRR, 2022

Measuring Progress on Scalable Oversight for Large Language Models.
CoRR, 2022

2016
Hands-on cybersecurity exercises for introductory classes: tutorial presentation.
J. Comput. Sci. Coll., 2016

