Xander Davies

According to our database, Xander Davies authored at least 12 papers between 2023 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition.
CoRR, July, 2025

STACK: Adversarial Attacks on LLM Safeguard Pipelines.
CoRR, June, 2025

Existing Large Language Model Unlearning Evaluations Are Inconclusive.
CoRR, June, 2025

An Example Safety Case for Safeguards Against Misuse.
CoRR, May, 2025

Fundamental Limitations in Defending LLM Finetuning APIs.
CoRR, February, 2025

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents.
CoRR, 2024

2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback.
Trans. Mach. Learn. Res., 2023

Circuit Breaking: Removing Model Behaviors with Targeted Ablation.
CoRR, 2023

Discovering Variable Binding Circuitry with Desiderata.
CoRR, 2023

Unifying Grokking and Double Descent.
CoRR, 2023

Sparse Distributed Memory is a Continual Learner.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

