Erik Jenner

According to our database, Erik Jenner authored at least 18 papers between 2021 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Frontier Models Can Take Actions at Low Probabilities.
CoRR, March, 2026

2025
Can Reasoning Models Obfuscate Reasoning? Stress-Testing Chain-of-Thought Monitorability.
CoRR, October, 2025

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety.
CoRR, July, 2025

When Chain of Thought is Necessary, Language Models Struggle to Evade Monitors.
CoRR, July, 2025

RL-Obfuscation: Can Language Models Learn to Evade Latent-Space Monitors?
CoRR, June, 2025

Diffusion On Syntax Trees For Program Synthesis.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Foundational Challenges in Assuring Alignment and Safety of Large Language Models.
Trans. Mach. Learn. Res., 2024

Obfuscated Activations Bypass LLM Latent-Space Defenses.
CoRR, 2024

Foundational Challenges in Assuring Alignment and Safety of Large Language Models.
CoRR, 2024

When Your AIs Deceive You: Challenges with Partial Observability of Human Evaluators in Reward Learning.
CoRR, 2024

When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Evidence of Learned Look-Ahead in a Chess-Playing Neural Network.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

STARC: A General Framework For Quantifying Differences Between Reward Functions.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2022
imitation: Clean Imitation Learning Implementations.
CoRR, 2022

Calculus on MDPs: Potential Shaping as a Gradient.
CoRR, 2022

Preprocessing Reward Functions for Interpretability.
CoRR, 2022

Steerable Partial Differential Operators for Equivariant Neural Networks.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
Extensions of Karger's Algorithm: Why They Fail in Theory and How They Are Useful in Practice.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

