Thomas Kwa

According to our database¹, Thomas Kwa authored at least 8 papers between 2020 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2025

The Singapore Consensus on Global AI Safety Research Priorities.

Vidhisha Balachandran

Bryan Low Kian Hsiang

CoRR, June 2025

HCAST: Human-Calibrated Autonomy Software Tasks.

CoRR, March 2025

Measuring AI Ability to Complete Long Tasks.

CoRR, March 2025

Measuring AI Ability to Complete Long Software Tasks.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

2024

Catastrophic Goodhart: regularizing RLHF with KL divergence does not mitigate heavy-tailed reward misspecification.

Thomas Kwa

Drake Thomas

Adrià Garriga-Alonso

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques.

Rohan Gupta

Iván Arcuschin Moreno

Thomas Kwa

Adrià Garriga-Alonso

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Compact Proofs of Model Performance via Mechanistic Interpretability.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

2020

Securing Smart Home Edge Devices against Compromised Cloud Servers.

CoRR, 2020