Kai Williams

According to our database¹, Kai Williams authored at least 5 papers between 2024 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2025

Password-Activated Shutdown Protocols for Misaligned Frontier Agents.

Kai Williams, Rohan Subramani, Francis Rhys Ward

CoRR, December, 2025

2024

Representation noising effectively prevents harmful fine-tuning on LLMs.

CoRR, 2024

Immunization against harmful fine-tuning attacks.

CoRR, 2024

Representation Noising: A Defence Mechanism Against Harmful Finetuning.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Immunization against harmful fine-tuning attacks.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
