Hoagy Cunningham

According to our database, Hoagy Cunningham authored at least 5 papers between 2024 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Segment-Level Coherence for Robust Harmful Intent Probing in LLMs.
CoRR, April, 2026

Constitutional Classifiers++: Efficient Production-Grade Defenses against Universal Jailbreaks.
CoRR, January, 2026

2025
Auditing language models for hidden objectives.
CoRR, March, 2025

Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming.
CoRR, January, 2025

2024
Sparse Autoencoders Find Highly Interpretable Features in Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

