We stand with Ukraine

David Silver

ORCID: 0000-0002-5197-2892

Affiliations:

Google DeepMind, London, UK
University College London, UK
University of Alberta, Edmonton, Canada (PhD 2009)

According to our database¹, David Silver authored at least 111 papers between 2005 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2025

DataRater: Meta-Learned Dataset Curation.

Co-authors include Gregory Farquhar, Luisa M. Zintgraf, András György, and Hado van Hasselt.

CoRR, May, 2025

2023

Faster sorting algorithms discovered using deep reinforcement learning.

Nat., 2023

2022

Figure Data for the paper "Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning".

Dataset, October, 2022

Deep learning, reinforcement learning, and world models.

Co-authors include Masashi Sugiyama.

Neural Networks, 2022

Discovering faster matrix multiplication algorithms with reinforcement learning.

Co-authors include Alhussein Fawzi, Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Francisco J. R. Ruiz, Julian Schrittwieser, and Grzegorz Swirszcz.

Nat., 2022

Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning.

CoRR, 2022

Learning by Directional Gradient Descent.

Co-authors include Hado van Hasselt.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Bootstrapped Meta-Learning.

Co-authors include Sebastian Flennerhag, Yannick Schroecker, and Hado van Hasselt.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Policy improvement by planning with Gumbel.

Co-authors include Julian Schrittwieser.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Planning in Stochastic Environments with a Learned Model.

Co-authors include Ioannis Antonoglou, Julian Schrittwieser, and Thomas K. Hubert.

Proceedings of the Tenth International Conference on Learning Representations, 2022

2021

Reward is enough.

Co-authors include Richard S. Sutton.

Artif. Intell., 2021

Discovery of Options via Meta-Learned Subgoals.

Co-authors include Hado van Hasselt.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Online and Offline Reinforcement Learning by Planning with a Learned Model.

Co-authors include Julian Schrittwieser, Mohammadamin Barekatain, and Ioannis Antonoglou.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Proper Value Equivalence.

Co-authors include Christopher Grimm and Gregory Farquhar.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Self-Consistent Models and Values.

Co-authors include Gregory Farquhar and Hado Philip van Hasselt.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Learning and Planning in Complex Action Spaces.

Co-authors include Julian Schrittwieser, Ioannis Antonoglou, and Mohammadamin Barekatain.

Proceedings of the 38th International Conference on Machine Learning, 2021

Muesli: Combining Improvements in Policy Optimization.

Co-authors include Theophane Weber and Hado van Hasselt.

Proceedings of the 38th International Conference on Machine Learning, 2021

Expected Eligibility Traces.

Co-authors include Hado van Hasselt and Sephora Madjiheurem.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

The Value-Improvement Path: Towards Better Representations for Reinforcement Learning.

Co-authors include Marc G. Bellemare.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Fast reinforcement learning with generalized policy updates.

Proc. Natl. Acad. Sci. USA, 2020

Improved protein structure prediction using potentials from deep learning.

Co-authors include Andrew W. Senior, James Kirkpatrick, Augustin Zídek, Alexander W. R. Nelson, and Koray Kavukcuoglu.

Nat., 2020

Mastering Atari, Go, chess and shogi by planning with a learned model.

Co-authors include Julian Schrittwieser, Ioannis Antonoglou, Edward Lockhart, and Timothy P. Lillicrap.

Nat., 2020

Self-Tuning Deep Reinforcement Learning.

Co-authors include Hado van Hasselt.

CoRR, 2020

A Self-Tuning Actor-Critic Algorithm.

Co-authors include Hado van Hasselt.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Meta-Gradient Reinforcement Learning with an Objective Discovered Online.

Co-authors include Hado Philip van Hasselt.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Discovering Reinforcement Learning Algorithms.

Co-authors include Wojciech M. Czarnecki and Hado van Hasselt.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Value-driven Hindsight Modelling.

Co-authors include Theophane Weber and Steven Kapturowski.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

The Value Equivalence Principle for Model-Based Reinforcement Learning.

Co-authors include Christopher Grimm.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

What Can Learned Intrinsic Rewards Capture?

Co-authors include Hado van Hasselt.

Proceedings of the 37th International Conference on Machine Learning, 2020

Behaviour Suite for Reinforcement Learning.

Co-authors include Katrina McKinney, Csaba Szepesvári, Benjamin Van Roy, Richard S. Sutton, and Hado van Hasselt.

Proceedings of the 8th International Conference on Learning Representations, 2020

2019

Grandmaster level in StarCraft II using multi-agent reinforcement learning.

Nat., 2019

On Inductive Biases in Deep Reinforcement Learning.

Co-authors include Hado van Hasselt.

CoRR, 2019

Discovery of Useful Questions as Auxiliary Tasks.

Co-authors include Janarthanan Rajendran, Richard L. Lewis, and Hado van Hasselt.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

The Option Keyboard: Combining Skills in Reinforcement Learning.

Co-authors include Gheorghe Comanici and Jonathan J. Hunt.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

An Investigation of Model-Free Planning.

Co-authors include Sébastien Racanière, Theophane Weber, and Timothy P. Lillicrap.

Proceedings of the 36th International Conference on Machine Learning, 2019

Universal Successor Features Approximators.

Co-authors include Daniel J. Mankowitz and Hado van Hasselt.

Proceedings of the 7th International Conference on Learning Representations, 2019

Credit Assignment Techniques in Stochastic Computation Graphs.

Co-authors include Théophane Weber.

Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

2018

Introduction to the special issue on deep reinforcement learning: An editorial.

Co-authors include Guang-Bin Huang.

Neural Networks, 2018

Bayesian Optimization in AlphaGo.

Co-authors include Ioannis Antonoglou, Julian Schrittwieser, and Nando de Freitas.

CoRR, 2018

Human-level performance in first-person multiplayer games with population-based deep reinforcement learning.

Co-authors include Wojciech M. Czarnecki, Antonio García Castañeda, Charles Beattie, Neil C. Rabinowitz, Avraham Ruderman, Nicolas Sonnerat, and Koray Kavukcuoglu.

CoRR, 2018

Unsupervised Predictive Memory in a Goal-Directed Agent.

CoRR, 2018

Unicorn: Continual Learning with a Universal, Off-policy Agent.

Co-authors include Daniel J. Mankowitz, Augustin Zídek, and Hado van Hasselt.

CoRR, 2018

Meta-Gradient Reinforcement Learning.

Co-authors include Hado van Hasselt.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Learning to Search with MCTSnets.

Co-authors include Theophane Weber and Ioannis Antonoglou.

Proceedings of the 35th International Conference on Machine Learning, 2018

Implicit Quantile Networks for Distributional Reinforcement Learning.

Co-authors include Georg Ostrovski.

Proceedings of the 35th International Conference on Machine Learning, 2018

Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement.

Co-authors include Daniel J. Mankowitz and Augustin Zídek.

Proceedings of the 35th International Conference on Machine Learning, 2018

Distributed Prioritized Experience Replay.

Co-authors include Gabriel Barth-Maron and Hado van Hasselt.

Proceedings of the 6th International Conference on Learning Representations, 2018

Rainbow: Combining Improvements in Deep Reinforcement Learning.

Co-authors include Hado van Hasselt, Georg Ostrovski, and Mohammad Gheshlaghi Azar.

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017

Mastering the game of Go without human knowledge.

Co-authors include Julian Schrittwieser, Ioannis Antonoglou, Timothy P. Lillicrap, and George van den Driessche.

Nat., 2017

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm.

Co-authors include Julian Schrittwieser, Ioannis Antonoglou, Dharshan Kumaran, and Timothy P. Lillicrap.

CoRR, 2017

StarCraft II: A New Challenge for Reinforcement Learning.

CoRR, 2017

Imagination-Augmented Agents for Deep Reinforcement Learning.

Co-authors include Theophane Weber, Sébastien Racanière, David P. Reichert, Danilo Jimenez Rezende, Adrià Puigdomènech Badia, and Peter W. Battaglia.

CoRR, 2017

Emergence of Locomotion Behaviours in Rich Environments.

Co-authors include Srinivasan Sriram, S. M. Ali Eslami, and Martin A. Riedmiller.

CoRR, 2017

Technical perspective: Solving imperfect information games.

Commun. ACM, 2017

Natural Value Approximators: Learning when to Trust Past Estimates.

Co-authors include Hado van Hasselt.

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Imagination-Augmented Agents for Deep Reinforcement Learning.

Co-authors include Sébastien Racanière, Theophane Weber, David P. Reichert, Danilo Jimenez Rezende, Adrià Puigdomènech Badia, and Peter W. Battaglia.

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning.

Co-authors include Vinícius Flores Zambaldi, Audrunas Gruslys, Angeliki Lazaridou, and Julien Pérolat.

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Successor Features for Transfer in Reinforcement Learning.

Co-authors include Jonathan J. Hunt and Hado van Hasselt.

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

FeUdal Networks for Hierarchical Reinforcement Learning.

Co-authors include Alexander Sasha Vezhnevets and Koray Kavukcuoglu.

Proceedings of the 34th International Conference on Machine Learning, 2017

The Predictron: End-To-End Learning and Planning.

Co-authors include Hado van Hasselt, Gabriel Dulac-Arnold, David P. Reichert, and Neil C. Rabinowitz.

Proceedings of the 34th International Conference on Machine Learning, 2017

Decoupled Neural Interfaces using Synthetic Gradients.

Co-authors include Wojciech Marian Czarnecki and Koray Kavukcuoglu.

Proceedings of the 34th International Conference on Machine Learning, 2017

Reinforcement Learning with Unsupervised Auxiliary Tasks.

Co-authors include Wojciech Marian Czarnecki and Koray Kavukcuoglu.

Proceedings of the 5th International Conference on Learning Representations, 2017

2016

Mastering the game of Go with deep neural networks and tree search.

Nat., 2016

Prioritized Experience Replay.

Co-authors include Ioannis Antonoglou.

Proceedings of the 4th International Conference on Learning Representations, 2016

Continuous control with deep reinforcement learning.

Co-authors include Timothy P. Lillicrap, Jonathan J. Hunt, and Alexander Pritzel.

Proceedings of the 4th International Conference on Learning Representations, 2016

Deep Reinforcement Learning from Self-Play in Imperfect-Information Games.

Co-authors include Johannes Heinrich.

CoRR, 2016

Learning and Transfer of Modulated Locomotor Controllers.

Co-authors include Timothy P. Lillicrap and Martin A. Riedmiller.

CoRR, 2016

Learning functions across many orders of magnitudes.

Co-authors include Hado van Hasselt.

CoRR, 2016

Successor Features for Transfer in Reinforcement Learning.

CoRR, 2016

Learning values across many orders of magnitude.

Co-authors include Hado van Hasselt.

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Asynchronous Methods for Deep Reinforcement Learning.

Co-authors include Adrià Puigdomènech Badia, Timothy P. Lillicrap, and Koray Kavukcuoglu.

Proceedings of the 33rd International Conference on Machine Learning, 2016

Deep Reinforcement Learning with Double Q-Learning.

Co-authors include Hado van Hasselt.

Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015

Human-level control through deep reinforcement learning.

Co-authors include Koray Kavukcuoglu, Marc G. Bellemare, Martin A. Riedmiller, Andreas Fidjeland, Georg Ostrovski, Charles Beattie, Ioannis Antonoglou, and Dharshan Kumaran.

Nat., 2015

Massively Parallel Methods for Deep Reinforcement Learning.

Co-authors include Praveen Srinivasan, Alessandro De Maria, Vedavyas Panneershelvam, Mustafa Suleyman, Charles Beattie, and Koray Kavukcuoglu.

CoRR, 2015

Move Evaluation in Go Using Deep Convolutional Neural Networks.

Co-authors include Chris J. Maddison.

Proceedings of the 3rd International Conference on Learning Representations, 2015

Memory-based control with recurrent neural networks.

Co-authors include Jonathan J. Hunt and Timothy P. Lillicrap.

CoRR, 2015

Value Iteration with Options and State Aggregation.

CoRR, 2015

Learning Continuous Control Policies by Stochastic Value Gradients.

Co-authors include Timothy P. Lillicrap.

Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Smooth UCT Search in Computer Poker.

Co-authors include Johannes Heinrich.

Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

Universal Value Function Approximators.

Proceedings of the 32nd International Conference on Machine Learning, 2015

Fictitious Self-Play in Extensive-Form Games.

Co-authors include Johannes Heinrich.

Proceedings of the 32nd International Conference on Machine Learning, 2015

2014

Unit Tests for Stochastic Optimization.

Co-authors include Ioannis Antonoglou.

Proceedings of the 2nd International Conference on Learning Representations, 2014

Better Optimism By Bayes: Adaptive Planning with Rich Models.

CoRR, 2014

Bayes-Adaptive Simulation-based Search with Value Function Approximation.

Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Deterministic Policy Gradient Algorithms.

Co-authors include Martin A. Riedmiller.

Proceedings of the 31st International Conference on Machine Learning, 2014

2013

Scalable and Efficient Bayes-Adaptive Reinforcement Learning Based on Monte-Carlo Tree Search.

J. Artif. Intell. Res., 2013

Playing Atari with Deep Reinforcement Learning.

Co-authors include Koray Kavukcuoglu, Ioannis Antonoglou, and Martin A. Riedmiller.

CoRR, 2013

Concurrent Reinforcement Learning from Customer Interactions.

Co-authors include Leonard Newnham.

Proceedings of the 30th International Conference on Machine Learning, 2013

Temporal-Difference Search in Computer Go.

Co-authors include Richard S. Sutton.

Proceedings of the Twenty-Third International Conference on Automated Planning and Scheduling, 2013

2012

The grand challenge of computer Go: Monte Carlo tree search and extensions.

Co-authors include Marc Schoenauer, Csaba Szepesvári, and Olivier Teytaud.

Commun. ACM, 2012

Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search.

Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012

Compositional Planning Using Optimal Option Models.

Proceedings of the 29th International Conference on Machine Learning, 2012

Gradient Temporal Difference Networks.

Proceedings of the Tenth European Workshop on Reinforcement Learning, 2012

Actor-Critic Reinforcement Learning with Energy-Based Policies.

Proceedings of the Tenth European Workshop on Reinforcement Learning, 2012

2011

A Monte-Carlo AIXI Approximation.

Co-authors include William T. B. Uther.

J. Artif. Intell. Res., 2011

Monte-Carlo tree search and rapid action value estimation in computer Go.

Artif. Intell., 2011

Non-Linear Monte-Carlo Search in Civilization II.

Co-authors include S. R. K. Branavan and Regina Barzilay.

Proceedings of IJCAI 2011, 2011

Learning to Win by Reading Manuals in a Monte-Carlo Framework.

Co-authors include S. R. K. Branavan and Regina Barzilay.

Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011

2010

Monte-Carlo Planning in Large POMDPs.

Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

Reinforcement Learning via AIXI Approximation.

Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, 2010

2009

A Monte Carlo AIXI Approximation.

CoRR, 2009

Bootstrapping from Game Tree Search.

Co-authors include William T. B. Uther.

Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation.

Co-authors include Hamid Reza Maei, Csaba Szepesvári, Shalabh Bhatnagar, and Richard S. Sutton.

Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

Fast gradient-descent methods for temporal-difference learning with linear function approximation.

Co-authors include Richard S. Sutton, Hamid Reza Maei, Shalabh Bhatnagar, and Csaba Szepesvári.

Proceedings of the 26th Annual International Conference on Machine Learning, 2009

Monte-Carlo simulation balancing.

Proceedings of the 26th Annual International Conference on Machine Learning, 2009

2008

Sample-based learning and search with permanent and transient memories.

Co-authors include Richard S. Sutton.

Proceedings of the Twenty-Fifth International Conference on Machine Learning, 2008

Achieving Master Level Play in 9 x 9 Computer Go.

Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, 2008

2007

Reinforcement Learning of Local Shape in the Game of Go.

Co-authors include Richard S. Sutton.

Proceedings of IJCAI 2007, 2007

On the role of tracking in stationary environments.

Co-authors include Richard S. Sutton.

Proceedings of the Twenty-Fourth International Conference on Machine Learning, 2007

Combining online and offline knowledge in UCT.

Proceedings of the Twenty-Fourth International Conference on Machine Learning, 2007

2005

Cooperative Pathfinding.

Proceedings of the First Artificial Intelligence and Interactive Digital Entertainment Conference, 2005
