Jan Betley

ORCID: 0009-0008-3518-191X

According to our database, Jan Betley authored at least 13 papers between 2018 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Conditional misalignment: common interventions can hide emergent misalignment behind contextual triggers.
CoRR, April, 2026

The Consciousness Cluster: Emergent preferences of Models that Claim to be Conscious.
CoRR, April, 2026

Training large language models on narrow tasks can lead to broad misalignment.
Nature, 2026

2025
Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs.
CoRR, December, 2025

School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs.
CoRR, August, 2025

Subliminal Learning: Language models transmit behavioral traits via hidden signals in data.
CoRR, July, 2025

Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models.
CoRR, June, 2025

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

Tell me about yourself: LLMs are aware of their learned behaviors.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

2023
BridgeHand2Vec Bridge Hand Representation.
CoRR, 2023

2018
Predicting winrate of Hearthstone decks using their archetypes.
Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, 2018
