Jan Betley
Orcid: 0009-0008-3518-191X
According to our database, Jan Betley authored at least 9 papers between 2018 and 2025.
Bibliography
2025
School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs.
CoRR, August, 2025
Subliminal Learning: Language models transmit behavioral traits via hidden signals in data.
CoRR, July, 2025
CoRR, June, 2025
Proceedings of the Forty-second International Conference on Machine Learning, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
2024
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
2023
2018
Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, 2018