Scott Emmons

Orcid: 0000-0002-7946-7046

According to our database, Scott Emmons authored at least 34 papers between 2016 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

Exploration Hacking: Can LLMs Learn to Resist RL Training?

CoRR, April, 2026

Frontier AI Auditing: Toward Rigorous Third-Party Assessment of Safety and Security Practices at Leading AI Companies.

CoRR, January, 2026

2025

Neural Chameleons: Language Models Can Learn to Hide Their Thoughts from Unseen Activation Monitors.

CoRR, December, 2025

A Pragmatic Way to Measure Chain-of-Thought Monitorability.

CoRR, October, 2025

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety.

CoRR, July, 2025

When Chain of Thought is Necessary, Language Models Struggle to Evade Monitors.

Senthooran Rajamanoharan

Heng Chen

Irhum Shafkat

Rohin Shah

CoRR, July, 2025

An Approach to Technical AGI Safety and Security.

CoRR, April, 2025

Observation Interference in Partially Observable Assistance Games.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Failures to Find Transferable Image Jailbreaks Between Vision-Language Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

The Partially Observable Off-Switch Game.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

The Alignment Problem Under Partial Observability.

Scott Emmons

PhD thesis, 2024

Obfuscated Activations Bypass LLM Latent-Space Defenses.

CoRR, 2024

Will an AI with Private Information Allow Itself to Be Switched Off?

CoRR, 2024

When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?

CoRR, 2024

When Your AIs Deceive You: Challenges with Partial Observability of Human Evaluators in Reward Learning.

CoRR, 2024

Uncovering Latent Human Wellbeing in Language Model Embeddings.

CoRR, 2024

A StrongREJECT for Empty Jailbreaks.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Evidence of Learned Look-Ahead in a Chess-Playing Neural Network.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Image Hijacks: Adversarial Images can Control Generative Models at Runtime.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023

ALMANACS: A Simulatability Benchmark for Language Model Explainability.

CoRR, 2023

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark.

CoRR, 2023

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the Machiavelli Benchmark.

Proceedings of the International Conference on Machine Learning, 2023

2022

imitation: Clean Imitation Learning Implementations.

CoRR, 2022

An Empirical Investigation of Representation Learning for Imitation.

CoRR, 2022

For Learning in Symmetric Teams, Local Optima are Global Nash Equilibria.

Proceedings of the International Conference on Machine Learning, 2022

RvS: What is Essential for Offline RL via Supervised Learning?

Proceedings of the Tenth International Conference on Learning Representations, 2022

2021

An Empirical Investigation of Representation Learning for Imitation.

Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

2020

Sparse Graphical Memory for Robust Planning.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

2018

Global Redundancy Resolution via Continuous Pseudoinversion of the Forward Kinematic Map.

Kris Hauser

Scott Emmons

IEEE Trans. Autom. Sci. Eng., 2018

A Map Equation with Metadata: Varying the Role of Attributes in Community Detection.

Scott Emmons

Peter J. Mucha

CoRR, 2018

2017

MOOC visual analytics: Empowering students, teachers, researchers, and platform developers of massively open online courses.

Scott Emmons

Robert P. Light

Katy Börner

J. Assoc. Inf. Sci. Technol., 2017

Post-Processing Partitions to Identify Domains of Modularity Optimization.

Algorithms, 2017

2016

Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale.

CoRR, 2016
