Jan Betley

Orcid: 0009-0008-3518-191X

According to our database¹, Jan Betley authored at least 9 papers between 2018 and 2025.

Collaborative distances:

Dijkstra number² of five.
Erdős number³ of four.

Bibliography

2025

School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs.

CoRR, August, 2025

Subliminal Learning: Language models transmit behavioral traits via hidden signals in data.

CoRR, July, 2025

Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models.

CoRR, June, 2025

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Tell me about yourself: LLMs are aware of their learned behaviors.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

2023

BridgeHand2Vec Bridge Hand Representation.

CoRR, 2023

2018

Predicting winrate of Hearthstone decks using their archetypes.

Anna Sztyber, Jan Betley, Adam Witkowski

Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, 2018
