Aidan Ewart

According to our database, Aidan Ewart authored at least 7 papers between 2024 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
AuditBench: Evaluating Alignment Auditing Techniques on Models with Hidden Behaviors.
CoRR, February, 2026

2025
Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs.
Trans. Mach. Learn. Res., 2025

Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities.
Trans. Mach. Learn. Res., 2025

Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

2024
Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs.
CoRR, 2024

Eight Methods to Evaluate Robust Unlearning in LLMs.
CoRR, 2024

Sparse Autoencoders Find Highly Interpretable Features in Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
