Tom Everitt

ORCID: 0000-0003-1210-9866

Affiliations:
  • Australian National University


According to our database, Tom Everitt authored at least 42 papers between 2014 and 2024.

Bibliography

2024
Robust agents learn causal world models.
CoRR, 2024

The Reasons that Agents Act: Intention and Instrumental Goals.
CoRR, 2024

Discovering Agents (Abstract Reprint).
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Reasoning about Causality in Games (Abstract Reprint).
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Discovering agents.
Artif. Intell., September, 2023

Reasoning about causality in games.
Artif. Intell., July, 2023

Honesty Is the Best Policy: Defining and Mitigating AI Deception.
CoRR, 2023

Characterising Decision Theories with Mechanised Causal Graphs.
CoRR, 2023

Human Control: Definitions and Algorithms.
Proceedings of the Uncertainty in Artificial Intelligence, 2023

Honesty Is the Best Policy: Defining and Mitigating AI Deception.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022
A Complete Criterion for Value of Information in Soluble Influence Diagrams.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Path-Specific Objectives for Safer Agent Incentives.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Why Fair Labels Can Yield Unfair Predictions: Graphical Conditions for Introduced Unfairness.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Reward tampering problems and solutions in reinforcement learning: a causal influence diagram perspective.
Synth., 2021

Shaking the foundations: delusions in sequence models for interaction and control.
CoRR, 2021

Alignment of Language Agents.
CoRR, 2021

PyCID: A Python Library for Causal Influence Diagrams.
Proceedings of the 20th Python in Science Conference 2021 (SciPy 2021), Virtual Conference, July 12, 2021

Equilibrium Refinements for Multi-Agent Influence Diagrams: Theory and Practice.
Proceedings of the AAMAS '21: 20th International Conference on Autonomous Agents and Multiagent Systems, 2021

How RL Agents Behave When Their Actions Are Modified.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Agent Incentives: A Causal Perspective.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Avoiding Tampering Incentives in Deep RL via Decoupled Approval.
CoRR, 2020

REALab: An Embedded Perspective on Tampering.
CoRR, 2020

The Incentives that Shape Behaviour.
CoRR, 2020

2019
Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective.
CoRR, 2019

Understanding Agent Incentives using Causal Influence Diagrams. Part I: Single Action Settings.
CoRR, 2019

Modeling AGI Safety Frameworks with Causal Influence Diagrams.
Proceedings of the Workshop on Artificial Intelligence Safety 2019 co-located with the 28th International Joint Conference on Artificial Intelligence, 2019

2018
Scalable agent alignment via reward modeling: a research direction.
CoRR, 2018

AGI Safety Literature Review.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

2017
AI Safety Gridworlds.
CoRR, 2017

Reinforcement Learning with a Corrupted Reward Channel.
CoRR, 2017

Count-Based Exploration in Feature Space for Reinforcement Learning.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Reinforcement Learning with a Corrupted Reward Channel.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

A Game-Theoretic Analysis of the Off-Switch Game.
Proceedings of the Artificial General Intelligence - 10th International Conference, 2017

2016
Death and Suicide in Universal Artificial Intelligence.
Proceedings of the Artificial General Intelligence - 9th International Conference, 2016

Avoiding Wireheading with Value Reinforcement Learning.
Proceedings of the Artificial General Intelligence - 9th International Conference, 2016

Self-Modification of Policy and Utility Function in Rational Agents.
Proceedings of the Artificial General Intelligence - 9th International Conference, 2016

2015
A Topological Approach to Meta-heuristics: Analytical Results on the BFS vs. DFS Algorithm Selection Problem.
CoRR, 2015

Analytical Results on the BFS vs. DFS Algorithm Selection Problem: Part II: Graph Search.
Proceedings of the AI 2015: Advances in Artificial Intelligence, 2015

Analytical Results on the BFS vs. DFS Algorithm Selection Problem. Part I: Tree Search.
Proceedings of the AI 2015: Advances in Artificial Intelligence, 2015

Sequential Extensions of Causal and Evidential Decision Theory.
Proceedings of the Algorithmic Decision Theory - 4th International Conference, 2015

2014
Can we measure the difficulty of an optimization problem?
Proceedings of the 2014 IEEE Information Theory Workshop, 2014

Free Lunch for optimisation under the universal distribution.
Proceedings of the IEEE Congress on Evolutionary Computation, 2014

