Scott Emmons

Orcid: 0000-0002-7946-7046

According to our database, Scott Emmons authored at least 34 papers between 2016 and 2026.

Bibliography

2026
Exploration Hacking: Can LLMs Learn to Resist RL Training?
CoRR, April, 2026

Frontier AI Auditing: Toward Rigorous Third-Party Assessment of Safety and Security Practices at Leading AI Companies.
CoRR, January, 2026

2025
Neural Chameleons: Language Models Can Learn to Hide Their Thoughts from Unseen Activation Monitors.
CoRR, December, 2025

A Pragmatic Way to Measure Chain-of-Thought Monitorability.
CoRR, October, 2025

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety.
CoRR, July, 2025

When Chain of Thought is Necessary, Language Models Struggle to Evade Monitors.
CoRR, July, 2025

An Approach to Technical AGI Safety and Security.
CoRR, April, 2025

Observation Interference in Partially Observable Assistance Games.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

Failures to Find Transferable Image Jailbreaks Between Vision-Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

The Partially Observable Off-Switch Game.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
The Alignment Problem Under Partial Observability.
PhD thesis, 2024

Obfuscated Activations Bypass LLM Latent-Space Defenses.
CoRR, 2024

Will an AI with Private Information Allow Itself to Be Switched Off?
CoRR, 2024

When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?
CoRR, 2024

When Your AIs Deceive You: Challenges with Partial Observability of Human Evaluators in Reward Learning.
CoRR, 2024

Uncovering Latent Human Wellbeing in Language Model Embeddings.
CoRR, 2024

A StrongREJECT for Empty Jailbreaks.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Evidence of Learned Look-Ahead in a Chess-Playing Neural Network.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Image Hijacks: Adversarial Images can Control Generative Models at Runtime.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023
ALMANACS: A Simulatability Benchmark for Language Model Explainability.
CoRR, 2023

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark.
CoRR, 2023

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the Machiavelli Benchmark.
Proceedings of the International Conference on Machine Learning, 2023

2022
imitation: Clean Imitation Learning Implementations.
CoRR, 2022

An Empirical Investigation of Representation Learning for Imitation.
CoRR, 2022

For Learning in Symmetric Teams, Local Optima are Global Nash Equilibria.
Proceedings of the International Conference on Machine Learning, 2022

RvS: What is Essential for Offline RL via Supervised Learning?
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
An Empirical Investigation of Representation Learning for Imitation.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

2020
Sparse Graphical Memory for Robust Planning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

2018
Global Redundancy Resolution via Continuous Pseudoinversion of the Forward Kinematic Map.
IEEE Trans. Autom. Sci. Eng., 2018

A Map Equation with Metadata: Varying the Role of Attributes in Community Detection.
CoRR, 2018

2017
MOOC visual analytics: Empowering students, teachers, researchers, and platform developers of massively open online courses.
J. Assoc. Inf. Sci. Technol., 2017

Post-Processing Partitions to Identify Domains of Modularity Optimization.
Algorithms, 2017

2016
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale.
CoRR, 2016

