Alexandra Souly

According to our database, Alexandra Souly authored at least 17 papers between 2021 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Evaluating whether AI models would sabotage AI safety research.
CoRR, April, 2026

Seven simple steps for log analysis in AI systems.
CoRR, April, 2026

UK AISI Alignment Evaluation Case-Study.
CoRR, April, 2026

When Do LLM Preferences Predict Downstream Behavior?
CoRR, February, 2026

2025
Breaking Agent Backbones: Evaluating the Security of Backbone LLMs in AI Agents.
CoRR, October, 2025

Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples.
CoRR, October, 2025

Fundamental Limitations in Defending LLM Finetuning APIs.
CoRR, February, 2025

Fundamental Limitations in Pointwise Defences of LLM Finetuning APIs.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents.
CoRR, 2024

A StrongREJECT for Empty Jailbreaks.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

JaxMARL: Multi-Agent RL Environments and Algorithms in JAX.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024


2023
Leading the Pack: N-player Opponent Shaping.
CoRR, 2023

JaxMARL: Multi-Agent RL Environments in JAX.
CoRR, 2023

2022
Retrospective on the 2021 BASALT Competition on Learning from Human Feedback.
CoRR, 2022

2021
Retrospective on the 2021 MineRL BASALT Competition on Learning from Human Feedback.
Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, 2021

