Aidan Ewart

According to our database, Aidan Ewart authored at least 6 papers between 2024 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs.
Trans. Mach. Learn. Res., 2025

Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities.
Trans. Mach. Learn. Res., 2025

2024
Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization.
CoRR, 2024

Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs.
CoRR, 2024

Eight Methods to Evaluate Robust Unlearning in LLMs.
CoRR, 2024

Sparse Autoencoders Find Highly Interpretable Features in Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

