Zhaohan Guo

Yunhao Tang

Daniel Guo

Daniele Calandriello

CoRR, 2024

Understanding the performance gap between online and offline alignment algorithms.

CoRR, 2024

Generalized Preference Optimization: A Unified Approach to Offline Alignment.

Michal Valko

Bernardo Ávila Pires

Bilal Piot

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Nash Learning from Human Feedback.

Rémi Munos

Michal Valko

Daniele Calandriello

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Human Alignment of Large Language Models through Online Preference Optimisation.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

A General Theoretical Paradigm to Understand Learning from Human Preferences.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2024

2023

Nash Learning from Human Feedback.

Rémi Munos

Michal Valko

Daniele Calandriello

CoRR, 2023

Understanding Self-Predictive Learning for Reinforcement Learning.

Yunhao Tang

Proceedings of the International Conference on Machine Learning, 2023

Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition.

Proceedings of the International Conference on Machine Learning, 2023

2022

BYOL-Explore: Exploration by Bootstrapped Prediction.

Bilal Piot

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021

Geometric Entropic Exploration.

CoRR, 2021

2020

Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning.

Proceedings of the 37th International Conference on Machine Learning, 2020

Agent57: Outperforming the Atari Human Benchmark.

Adrià Puigdomènech Badia

Proceedings of the 37th International Conference on Machine Learning, 2020

Never Give Up: Learning Directed Exploration Strategies.

Adrià Puigdomènech Badia

Proceedings of the 8th International Conference on Learning Representations, 2020

2019

Directed Exploration for Reinforcement Learning.

CoRR, 2019

2018

Neural Predictive Belief Representations.

CoRR, 2018

2017

Using Options for Long-Horizon Off-Policy Evaluation.

Philip S. Thomas

CoRR, 2017

Sample Efficient Feature Selection for Factored MDPs.

CoRR, 2017

Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation.

Zhaohan Guo

Philip S. Thomas

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

2016

PAC Continuous State Online Multitask Reinforcement Learning with Identification.

Yao Liu

Zhaohan Guo

Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, 2016

A PAC RL Algorithm for Episodic POMDPs.

Shayan Doroudi

Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 2016

2015

Concurrent PAC RL.

Zhaohan Guo