Alexander Matt Turner

Affiliations:
  • Oregon State University, Corvallis, OR, USA


According to our database, Alexander Matt Turner authored at least 15 papers between 2019 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Consistency Training Helps Stop Sycophancy and Jailbreaks.
CoRR, October, 2025

Distillation Robustifies Unlearning.
CoRR, June, 2025

An Approach to Technical AGI Safety and Security.
CoRR, April, 2025

2024
Gradient Routing: Masking Gradients to Localize Computation in Neural Networks.
CoRR, 2024

Steering Llama 2 via Contrastive Activation Addition.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Understanding and Controlling a Maze-Solving Policy Network.
CoRR, 2023

Activation Addition: Steering Language Models Without Optimization.
CoRR, 2023

2022
On Avoiding Power-Seeking by Artificial Intelligence.
CoRR, 2022

Formalizing the Problem of Side Effect Regularization.
CoRR, 2022

Parametrically Retargetable Decision-Makers Tend To Seek Power.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021
Optimal Policies Tend To Seek Power.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

2020
Avoiding Side Effects in Complex Environments.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Conservative Agency via Attainable Utility Preservation.
Proceedings of the AIES '20: AAAI/ACM Conference on AI, Ethics, and Society, 2020

2019
Optimal Farsighted Agents Tend to Seek Power.
CoRR, 2019

Conservative Agency.
Proceedings of the Workshop on Artificial Intelligence Safety 2019 co-located with the 28th International Joint Conference on Artificial Intelligence, 2019

