Sophie Xhonneux

According to our database, Sophie Xhonneux authored at least 8 papers between 2024 and 2025.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2025
Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs.
CoRR, July, 2025

LLM-Safety Evaluations Lack Robustness.
CoRR, March, 2025

A generative approach to LLM harmfulness detection with special red flag tokens.
CoRR, February, 2025

Adversarial Alignment for LLMs Requires Simpler, Reproducible, and More Measurable Objectives.
CoRR, February, 2025

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
In-Context Learning Can Re-learn Forbidden Tasks.
CoRR, 2024

Efficient Adversarial Training in LLMs with Continuous Attacks.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

