Thomas Kwa

According to our database, Thomas Kwa authored at least 7 papers between 2020 and 2025.

Bibliography

2025
The Singapore Consensus on Global AI Safety Research Priorities.
CoRR, June, 2025

HCAST: Human-Calibrated Autonomy Software Tasks.
CoRR, March, 2025

Measuring AI Ability to Complete Long Tasks.
CoRR, March, 2025

2024
Catastrophic Goodhart: regularizing RLHF with KL divergence does not mitigate heavy-tailed reward misspecification.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Compact Proofs of Model Performance via Mechanistic Interpretability.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

2020
Securing Smart Home Edge Devices against Compromised Cloud Servers.
CoRR, 2020

