Henry Sleight
According to our database, Henry Sleight authored at least 22 papers between 2024 and 2025.
Bibliography
2025
All Code, No Thought: Current Language Models Struggle to Reason in Ciphered Language.
CoRR, October, 2025
CoRR, October, 2025
Inoculation Prompting: Instructing LLMs to misbehave at train-time improves test-time alignment.
CoRR, October, 2025
CoRR, September, 2025
CoRR, July, 2025
Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs.
Trans. Mach. Learn. Res., 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
2024
Jailbreak Defense in a Narrow Domain: Limitations of Existing Methods and a New Transcript-Classifier Approach.
CoRR, 2024
CoRR, 2024
Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs.
CoRR, 2024
CoRR, 2024
Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data.
CoRR, 2024