Nathan Helm-Burger

According to our database, Nathan Helm-Burger authored at least 6 papers between 2023 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring.
CoRR, May, 2025

Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

2024
Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models.
CoRR, 2024

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning.
CoRR, 2024


2023
Will releasing the weights of future large language models grant widespread access to pandemic agents?
CoRR, 2023

