Johannes Treutlein

According to our database, Johannes Treutlein authored at least 12 papers between 2021 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs.
CoRR, August, 2025

Auditing language models for hidden objectives.
CoRR, March, 2025

2024
Alignment faking in large language models.
CoRR, 2024

Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

2023
Conditioning Predictive Models: Risks and Strategies.
CoRR, 2023

Incentivizing honest performative predictions with proper scoring rules.
Proceedings of the Uncertainty in Artificial Intelligence, 2023

Similarity-based cooperative equilibrium.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022
Similarity-based Cooperation.
CoRR, 2022

Path Independent Equilibrium Models Can Better Exploit Test-Time Computation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

COLA: Consistent Learning with Opponent-Learning Awareness.
Proceedings of the International Conference on Machine Learning, 2022

2021
Normative Disagreement as a Challenge for Cooperative AI.
CoRR, 2021

A New Formalism, Method and Open Issues for Zero-Shot Coordination.
Proceedings of the 38th International Conference on Machine Learning, 2021

