Scott Emmons

ORCID: 0000-0002-7946-7046

According to our database, Scott Emmons authored at least 30 papers between 2016 and 2025.

Bibliography

2025
Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety.
CoRR, July, 2025

When Chain of Thought is Necessary, Language Models Struggle to Evade Monitors.
CoRR, July, 2025

An Approach to Technical AGI Safety and Security.
CoRR, April, 2025

Failures to Find Transferable Image Jailbreaks Between Vision-Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

The Partially Observable Off-Switch Game.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
The Alignment Problem Under Partial Observability.
PhD thesis, 2024

Observation Interference in Partially Observable Assistance Games.
CoRR, 2024

Obfuscated Activations Bypass LLM Latent-Space Defenses.
CoRR, 2024

Will an AI with Private Information Allow Itself to Be Switched Off?
CoRR, 2024

When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?
CoRR, 2024

When Your AIs Deceive You: Challenges with Partial Observability of Human Evaluators in Reward Learning.
CoRR, 2024

Uncovering Latent Human Wellbeing in Language Model Embeddings.
CoRR, 2024

A StrongREJECT for Empty Jailbreaks.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Evidence of Learned Look-Ahead in a Chess-Playing Neural Network.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Image Hijacks: Adversarial Images can Control Generative Models at Runtime.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023
ALMANACS: A Simulatability Benchmark for Language Model Explainability.
CoRR, 2023

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark.
CoRR, 2023

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the Machiavelli Benchmark.
Proceedings of the International Conference on Machine Learning, 2023

2022
imitation: Clean Imitation Learning Implementations.
CoRR, 2022

An Empirical Investigation of Representation Learning for Imitation.
CoRR, 2022

For Learning in Symmetric Teams, Local Optima are Global Nash Equilibria.
Proceedings of the International Conference on Machine Learning, 2022

RvS: What is Essential for Offline RL via Supervised Learning?
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
An Empirical Investigation of Representation Learning for Imitation.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

2020
Sparse Graphical Memory for Robust Planning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

2018
Global Redundancy Resolution via Continuous Pseudoinversion of the Forward Kinematic Map.
IEEE Trans Autom. Sci. Eng., 2018

A Map Equation with Metadata: Varying the Role of Attributes in Community Detection.
CoRR, 2018

2017
MOOC visual analytics: Empowering students, teachers, researchers, and platform developers of massively open online courses.
J. Assoc. Inf. Sci. Technol., 2017

Post-Processing Partitions to Identify Domains of Modularity Optimization.
Algorithms, 2017

2016
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale.
CoRR, 2016

