Adam Gleave

ORCID: 0000-0002-3467-528X

According to our database, Adam Gleave authored at least 38 papers between 2016 and 2025.


Bibliography

2025
Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility.
CoRR, July, 2025

The Safety Gap Toolkit: Evaluating Hidden Dangers of Open-Source Models.
CoRR, July, 2025

STACK: Adversarial Attacks on LLM Safeguard Pipelines.
CoRR, June, 2025

The Singapore Consensus on Global AI Safety Research Priorities.
CoRR, June, 2025

Interpreting learned search: finding a transition model and value function in an RNN that plays Sokoban.
CoRR, June, 2025

It's the Thought that Counts: Evaluating the Attempts of Frontier LLMs to Persuade on Harmful Topics.
CoRR, June, 2025

Preference Learning with Lie Detectors can Induce Honesty or Evasion.
CoRR, May, 2025

AI Companies Should Report Pre- and Post-Mitigation Safety Evaluations.
CoRR, March, 2025

Multi-Agent Risks from Advanced AI.
CoRR, February, 2025

Can Go AIs Be Adversarially Robust?
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-25), February 2025

Scaling Trends for Data Poisoning in LLMs.
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-25), February 2025

2024
Scaling Laws for Data Poisoning in LLMs.
CoRR, 2024

Exploring Scaling Trends in LLM Robustness.
CoRR, 2024

Planning behavior in a recurrent neural network that plays Sokoban.
CoRR, 2024

Uncovering Latent Human Wellbeing in Language Model Embeddings.
CoRR, 2024

STARC: A General Framework For Quantifying Differences Between Reward Functions.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Exploiting Novel GPT-4 APIs.
CoRR, 2023

On The Fragility of Learned Reward Functions.
CoRR, 2023

Adversarial Policies Beat Superhuman Go AIs.
Proceedings of the International Conference on Machine Learning, 2023

Invariance in Policy Optimisation and Partial Identifiability in Reward Learning.
Proceedings of the International Conference on Machine Learning, 2023

2022
Towards Trustworthy Machine Learning.
PhD thesis, 2022

imitation: Clean Imitation Learning Implementations.
CoRR, 2022

Adversarial Policies Beat Professional-Level Go AIs.
CoRR, 2022

Calculus on MDPs: Potential Shaping as a Gradient.
CoRR, 2022

Reducing Exploitability with Population Based Training.
CoRR, 2022

Preprocessing Reward Functions for Interpretability.
CoRR, 2022

A Primer on Maximum Causal Entropy Inverse Reinforcement Learning.
CoRR, 2022

Uncertainty Estimation for Language Reward Models.
CoRR, 2022

2021
Stable-Baselines3: Reliable Reinforcement Learning Implementations.
J. Mach. Learn. Res., 2021

Quantifying Differences in Reward Functions.
Proceedings of the 9th International Conference on Learning Representations, 2021

2020
Understanding Learned Reward Functions.
CoRR, 2020

DERAIL: Diagnostic Environments for Reward And Imitation Learning.
CoRR, 2020

Adversarial Policies: Attacking Deep Reinforcement Learning.
Proceedings of the 8th International Conference on Learning Representations, 2020

2018
Inverse reinforcement learning for video games.
CoRR, 2018

Active Inverse Reward Design.
CoRR, 2018

Multi-task Maximum Entropy Inverse Reinforcement Learning.
CoRR, 2018

2017
Making Compression Algorithms for Unicode Text.
Proceedings of the 2017 Data Compression Conference, 2017

2016
Firmament: Fast, Centralized Cluster Scheduling at Scale.
Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, 2016
