Erik Jenner

According to our database, Erik Jenner authored at least 18 papers between 2021 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

Frontier Models Can Take Actions at Low Probabilities.

CoRR, March, 2026

2025

Can Reasoning Models Obfuscate Reasoning? Stress-Testing Chain-of-Thought Monitorability.

CoRR, October, 2025

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety.

CoRR, July, 2025

When Chain of Thought is Necessary, Language Models Struggle to Evade Monitors.

Senthooran Rajamanoharan

Heng Chen

Irhum Shafkat

Rohin Shah

CoRR, July, 2025

RL-Obfuscation: Can Language Models Learn to Evade Latent-Space Monitors?

Rohan Gupta

Erik Jenner

CoRR, June, 2025

Diffusion On Syntax Trees For Program Synthesis.

Shreyas Kapur

Erik Jenner

Stuart Russell

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

Foundational Challenges in Assuring Alignment and Safety of Large Language Models.

Trans. Mach. Learn. Res., 2024

Obfuscated Activations Bypass LLM Latent-Space Defenses.

CoRR, 2024

Foundational Challenges in Assuring Alignment and Safety of Large Language Models.

CoRR, 2024

When Your AIs Deceive You: Challenges with Partial Observability of Human Evaluators in Reward Learning.

CoRR, 2024

When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Evidence of Learned Look-Ahead in a Chess-Playing Neural Network.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

STARC: A General Framework For Quantifying Differences Between Reward Functions.

Joar Max Viktor Skalse

Lucy Farnik

Sumeet Ramesh Motwani

Erik Jenner

Adam Gleave

Alessandro Abate

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2022

imitation: Clean Imitation Learning Implementations.

CoRR, 2022

Calculus on MDPs: Potential Shaping as a Gradient.

Erik Jenner

Herke van Hoof

Adam Gleave

CoRR, 2022

Preprocessing Reward Functions for Interpretability.

Erik Jenner

Adam Gleave

CoRR, 2022

Steerable Partial Differential Operators for Equivariant Neural Networks.

Erik Jenner

Maurice Weiler

Proceedings of the Tenth International Conference on Learning Representations, 2022

2021

Extensions of Karger's Algorithm: Why They Fail in Theory and How They Are Useful in Practice.

Erik Jenner

Enrique Fita Sanmartín

Fred A. Hamprecht

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
