Thomas Kwa

According to our database, Thomas Kwa authored at least 8 papers between 2020 and 2025.

Bibliography

2025
The Singapore Consensus on Global AI Safety Research Priorities.
CoRR, June, 2025

HCAST: Human-Calibrated Autonomy Software Tasks.
CoRR, March, 2025

Measuring AI Ability to Complete Long Tasks.
CoRR, March, 2025

Measuring AI Ability to Complete Long Software Tasks.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

2024
Catastrophic Goodhart: regularizing RLHF with KL divergence does not mitigate heavy-tailed reward misspecification.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Compact Proofs of Model Performance via Mechanistic Interpretability.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

2020
Securing Smart Home Edge Devices against Compromised Cloud Servers.
CoRR, 2020
