Bernardo Ávila Pires

Yunhao Tang

Daniel Guo

Daniele Calandriello

CoRR, 2024

Understanding the performance gap between online and offline alignment algorithms.

[BibTeX]

[DOI]

CoRR, 2024

Off-policy Distributional Q(λ): Distributional RL without Importance Sampling.

[BibTeX]

[DOI]

CoRR, 2024

Generalized Preference Optimization: A Unified Approach to Offline Alignment.

[BibTeX]

[DOI]

Michal Valko

Bilal Piot

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Human Alignment of Large Language Models through Online Preference Optimisation.

[BibTeX]

[DOI]

Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023

Hierarchical Reinforcement Learning in Complex 3D Environments.

[BibTeX]

[DOI]

Feryal M. P. Behbahani

CoRR, 2023

DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm.

[BibTeX]

[DOI]

Proceedings of the International Conference on Machine Learning, 2023

Understanding Self-Predictive Learning for Reinforcement Learning.

[BibTeX]

[DOI]

Yunhao Tang

Zhaohan Daniel Guo

Proceedings of the International Conference on Machine Learning, 2023

Understanding Plasticity in Neural Networks.

[BibTeX]

[DOI]

Proceedings of the International Conference on Machine Learning, 2023

2022

The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning.

[BibTeX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

BYOL-Explore: Exploration by Bootstrapped Prediction.

[BibTeX]

[DOI]

Bilal Piot

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021

Neural Recursive Belief States in Multi-Agent Reinforcement Learning.

[BibTeX]

[DOI]

CoRR, 2021

Geometric Entropic Exploration.

[BibTeX]

[DOI]

Zhaohan Daniel Guo

CoRR, 2021

2020

Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning.

[BibTeX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning.

[BibTeX]

[DOI]

Proceedings of the 37th International Conference on Machine Learning, 2020

2019

World Discovery Models.

[BibTeX]

[DOI]

CoRR, 2019

2018

Neural Predictive Belief Representations.

[BibTeX]

[DOI]

Zhaohan Daniel Guo

CoRR, 2018

2016

Multiclass Classification Calibration Functions.

[BibTeX]

[DOI]

CoRR, 2016

Policy Error Bounds for Model-Based Reinforcement Learning with Factored Linear Models.

[BibTeX]

[DOI]

CoRR, 2016

Policy Error Bounds for Model-Based Reinforcement Learning with Factored Linear Models.

[BibTeX]

[DOI]

Proceedings of the 29th Conference on Learning Theory, 2016

2015

Pathological Effects of Variance on Classification-Based Policy Iteration.

[BibTeX]

[DOI]

Proceedings of the Learning for General Competency in Video Games, 2015

2014

Pseudo-MDPs and factored linear action models.

[BibTeX]

[DOI]

Proceedings of the 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2014

2013

Cost-sensitive Multiclass Classification Risk Bounds.

[BibTeX]

[DOI]

Mohammad Ghavamzadeh

Proceedings of the 30th International Conference on Machine Learning, 2013

2012

Statistical linear estimation with penalized estimators: an application to reinforcement learning.

[BibTeX]

[DOI]