Abhay Sheshadri

According to our database, Abhay Sheshadri authored at least 6 papers between 2024 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Why Do Some Language Models Fake Alignment While Others Don't?
CoRR, June, 2025

Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs.
Trans. Mach. Learn. Res., 2025

2024
Obfuscated Activations Bypass LLM Latent-Space Defenses.
CoRR, 2024

Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization.
CoRR, 2024

Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs.
CoRR, 2024

A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

