Jan Betley
ORCID: 0009-0008-3518-191X
According to our database, Jan Betley authored at least 13 papers between 2018 and 2026.
Bibliography
2026
Conditional misalignment: common interventions can hide emergent misalignment behind contextual triggers.
CoRR, April, 2026
The Consciousness Cluster: Emergent preferences of Models that Claim to be Conscious.
CoRR, April, 2026
Nat., 2026
2025
CoRR, December, 2025
School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs.
CoRR, August, 2025
Subliminal Learning: Language models transmit behavioral traits via hidden signals in data.
CoRR, July, 2025
CoRR, June, 2025
Proceedings of the Forty-second International Conference on Machine Learning, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
2024
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024
2023
2018
Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, 2018