Joe Benton

According to our database, Joe Benton authored at least 18 papers between 2022 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Inverse Scaling in Test-Time Compute.
CoRR, July, 2025

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety.
CoRR, July, 2025

Teaching Models to Verbalize Reward Hacking in Chain-of-Thought Reasoning.
CoRR, June, 2025

SHADE-Arena: Evaluating Sabotage and Monitoring in LLM Agents.
CoRR, June, 2025

Reasoning Models Don't Always Say What They Think.
CoRR, May, 2025

Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming.
CoRR, January, 2025

Failures to Find Transferable Image Jailbreaks Between Vision-Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Error Bounds for Flow Matching Methods.
Trans. Mach. Learn. Res., 2024

Sabotage Evaluations for Frontier Models.
CoRR, 2024

When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?
CoRR, 2024


Nearly d-Linear Convergence Bounds for Diffusion Models via Stochastic Localization.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Alpha-divergence Variational Inference Meets Importance Weighted Auto-Encoders: Methodology and Asymptotics.
J. Mach. Learn. Res., 2023

Measuring Feature Sparsity in Language Models.
CoRR, 2023

Linear Convergence Bounds for Diffusion Models via Stochastic Localization.
CoRR, 2023

2022
From Denoising Diffusions to Denoising Markov Models.
CoRR, 2022

Polysemanticity and Capacity in Neural Networks.
CoRR, 2022

A Continuous Time Framework for Discrete Denoising Models.
Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

