Joe Benton

According to our database, Joe Benton authored at least 20 papers between 2022 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

Evaluating Control Protocols for Untrusted AI Agents.

CoRR, November, 2025

Optimizing AI Agent Attacks With Synthetic Data.

CoRR, November, 2025

Inverse Scaling in Test-Time Compute.

Jacob Goldman-Wetzler

CoRR, July, 2025

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety.

CoRR, July, 2025

Teaching Models to Verbalize Reward Hacking in Chain-of-Thought Reasoning.

CoRR, June, 2025

SHADE-Arena: Evaluating Sabotage and Monitoring in LLM Agents.

CoRR, June, 2025

Reasoning Models Don't Always Say What They Think.

CoRR, May, 2025

Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming.

CoRR, January, 2025

Failures to Find Transferable Image Jailbreaks Between Vision-Language Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

Error Bounds for Flow Matching Methods.

Joe Benton

George Deligiannidis

Arnaud Doucet

Trans. Mach. Learn. Res., 2024

Sabotage Evaluations for Frontier Models.

CoRR, 2024

When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?

CoRR, 2024

Many-shot Jailbreaking.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Nearly d-Linear Convergence Bounds for Diffusion Models via Stochastic Localization.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

Alpha-divergence Variational Inference Meets Importance Weighted Auto-Encoders: Methodology and Asymptotics.

J. Mach. Learn. Res., 2023

Measuring Feature Sparsity in Language Models.

Mingyang Deng

Lucas Tao

Joe Benton

CoRR, 2023

Linear Convergence Bounds for Diffusion Models via Stochastic Localization.

CoRR, 2023

2022

From Denoising Diffusions to Denoising Markov Models.

CoRR, 2022

Polysemanticity and Capacity in Neural Networks.

CoRR, 2022

A Continuous Time Framework for Discrete Denoising Models.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
