Owain Evans

According to our database, Owain Evans authored at least 19 papers between 2009 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
Tell, don't show: Declarative facts influence how LLMs generalize.
CoRR, 2023

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions.
CoRR, 2023

The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A".
CoRR, 2023

Taken out of context: On measuring situational awareness in LLMs.
CoRR, 2023

2022
Teaching Models to Express Their Uncertainty in Words.
Trans. Mach. Learn. Res., 2022

Forecasting Future World Events With Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

TruthfulQA: Measuring How Models Mimic Human Falsehoods.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Truthful AI: Developing and governing AI that does not lie.
CoRR, 2021

2020
Active Reinforcement Learning: Observing Rewards at a Cost.
CoRR, 2020

2019
Sensory Optimization: Neural Networks as a Model for Understanding and Creating Art.
CoRR, 2019

Generalizing from a few environments in safety-critical reinforcement learning.
CoRR, 2019

2018
Viewpoint: When Will AI Exceed Human Performance? Evidence from AI Experts.
J. Artif. Intell. Res., 2018

Active Reinforcement Learning with Monte-Carlo Tree Search.
CoRR, 2018

The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation.
CoRR, 2018

Trial without Error: Towards Safe Reinforcement Learning via Human Intervention.
Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 2018

2017
When Will AI Exceed Human Performance? Evidence from AI Experts.
CoRR, 2017

Agent-Agnostic Human-in-the-Loop Reinforcement Learning.
CoRR, 2017

2016
Learning the Preferences of Ignorant, Inconsistent Agents.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2009
Help or Hinder: Bayesian Models of Social Goal Inference.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

