Yunhao Tang

CoRR, 2024

A Distributional Analogue to the Successor Representation.

CoRR, 2024

Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model.

CoRR, 2024

Off-policy Distributional Q(λ): Distributional RL without Importance Sampling.

CoRR, 2024

Generalized Preference Optimization: A Unified Approach to Offline Alignment.

Michal Valko

Bernardo Ávila Pires

Bilal Piot

CoRR, 2024

Learning Uncertainty-Aware Temporally-Extended Actions.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Nash Learning from Human Feedback.

Michal Valko

Daniele Calandriello

CoRR, 2023

An Analysis of Quantile Temporal-Difference Learning.

Mark Rowland

CoRR, 2023

Fast Rates for Maximum Entropy Exploration.

Proceedings of the International Conference on Machine Learning, 2023

VA-learning as a more efficient alternative to Q-learning.

Proceedings of the International Conference on Machine Learning, 2023

Towards a better understanding of representation dynamics under TD-learning.

Proceedings of the International Conference on Machine Learning, 2023

DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm.

Proceedings of the International Conference on Machine Learning, 2023

Understanding Self-Predictive Learning for Reinforcement Learning.

Zhaohan Daniel Guo

Proceedings of the International Conference on Machine Learning, 2023

The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation.

Proceedings of the International Conference on Machine Learning, 2023

The Edge of Orthogonality: A Simple View of What Makes BYOL Tick.

Proceedings of the International Conference on Machine Learning, 2023

Quantile Credit Assignment.

Proceedings of the International Conference on Machine Learning, 2023

Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice.

Proceedings of the International Conference on Machine Learning, 2023

Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition.

Proceedings of the International Conference on Machine Learning, 2023

2022

KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal.

CoRR, 2022

The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

BYOL-Explore: Exploration by Bootstrapped Prediction.

Bilal Piot

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses.

Proceedings of the International Conference on Machine Learning, 2022

Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning.

Proceedings of the International Conference on Machine Learning, 2022

Marginalized Operators for Off-policy Reinforcement Learning.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

2021

Reinforcement Learning: New Algorithms and An Application for Integer Programming.

PhD thesis, 2021

Unlocking Pixels for Reinforcement Learning via Implicit Attention.

Deepali Jain

Valerii Likhosherstov

CoRR, 2021

ES-ENAS: Combining Evolution Strategies with Neural Architecture Search at No Extra Cost for Reinforcement Learning.

CoRR, 2021

Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Taylor Expansion of Discount Factors.

Proceedings of the 38th International Conference on Machine Learning, 2021

Revisiting Peng's Q(λ) for Modern Reinforcement Learning.

Proceedings of the 38th International Conference on Machine Learning, 2021

Guiding Evolutionary Strategies with Off-Policy Actor-Critic.

Proceedings of the AAMAS '21: 20th International Conference on Autonomous Agents and Multiagent Systems, 2021

Hindsight Expectation Maximization for Goal-conditioned Reinforcement Learning.

Alp Kucukelbir

Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

2020

Online Hyper-parameter Tuning in Off-policy Learning via Evolutionary Strategies.

CoRR, 2020

Discrete Action On-Policy Learning with Action-Value Critic.

CoRR, 2020

Self-Imitation Learning via Generalized Lower Bound Q-learning.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Taylor Expansion Policy Optimization.

Michal Valko

Proceedings of the 37th International Conference on Machine Learning, 2020

Reinforcement Learning for Integer Programming: Learning to Cut.

Yuri Faenza

Proceedings of the 37th International Conference on Machine Learning, 2020

Learning to Score Behaviors for Guided Policy Optimization.

Anna Choromanska

Michael I. Jordan

Proceedings of the 37th International Conference on Machine Learning, 2020

Monte-Carlo Tree Search as Regularized Policy Optimization.

Proceedings of the 37th International Conference on Machine Learning, 2020

ES-MAML: Simple Hessian-Free Meta Learning.

Wenbo Gao

Yuxiang Yang

Proceedings of the 8th International Conference on Learning Representations, 2020

Discrete Action On-Policy Learning with Action-Value Critic.

Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Variance Reduction for Evolution Strategies via Structured Control Variates.

Alp Kucukelbir

Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Practical Nonisotropic Monte Carlo Sampling in High Dimensions via Determinantal Point Processes.

Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Discretizing Continuous Action Space for On-Policy Optimization.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Reinforcement Learning with Chromatic Networks.

CoRR, 2019

Wasserstein Reinforcement Learning.

Michael I. Jordan

CoRR, 2019

Structured Monte Carlo Sampling for Nonisotropic Distributions via Determinantal Point Processes.

CoRR, 2019

Augment-Reinforce-Merge Policy Gradient for Binary Stochastic Policy.

Mingzhang Yin

Mingyuan Zhou

CoRR, 2019

Adaptive Sample-Efficient Blackbox Optimization via ES-active Subspaces.

CoRR, 2019

From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Provably Robust Blackbox Optimization for Reinforcement Learning.

Proceedings of the 3rd Annual Conference on Robot Learning, 2019

Orthogonal Estimation of Wasserstein Distances.

Mark Rowland

Jiri Hron

Tamás Sarlós

Adrian Weller

Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

KAMA-NNs: Low-dimensional Rotation Based Neural Networks.

Jeffrey Pennington

Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

2018

Boosting Trust Region Policy Optimization by Normalizing Flows Policy.

CoRR, 2018

Implicit Policy for Reinforcement Learning.

CoRR, 2018

Exploration by Distributional Reinforcement Learning.

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

2017

Variational Deep Q Network.
