Vivek Hebbar

According to our database, Vivek Hebbar authored at least 5 papers between 2024 and 2026.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2026
Removing Sandbagging in LLMs by Training with Weak Supervision.
CoRR, April, 2026

Auditing Sabotage Bench: A Benchmark for Detecting and Fixing Research Sabotage in ML Codebases.
CoRR, April, 2026

2025
Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs.
Trans. Mach. Learn. Res., 2025

Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs.
CoRR, 2024

