Rémi Munos

According to our database, Rémi Munos authored at least 176 papers between 1996 and 2018.

Bibliography

2018
Optimistic planning with an adaptive number of action switches for near-optimal nonlinear control.
Eng. Appl. of AI, 2018

Actor-Critic Policy Optimization in Partially Observable Multiagent Environments.
CoRR, 2018

Implicit Quantile Networks for Distributional Reinforcement Learning.
CoRR, 2018

Maximum a Posteriori Policy Optimisation.
CoRR, 2018

Autoregressive Quantile Networks for Generative Modeling.
CoRR, 2018

Observe and Look Further: Achieving Consistent Performance on Atari.
CoRR, 2018

Low-pass Recurrent Neural Networks - A memory architecture for longer-term correlation discovery.
CoRR, 2018

A Study on Overfitting in Deep Reinforcement Learning.
CoRR, 2018

Learning to Search with MCTSnets.
CoRR, 2018

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures.
CoRR, 2018

Continuous-action planning for discounted infinite-horizon nonlinear optimal control with Lipschitz values.
Automatica, 2018

Autoregressive Quantile Networks for Generative Modeling.
Proceedings of the 35th International Conference on Machine Learning, 2018

The Uncertainty Bellman Equation and Exploration.
Proceedings of the 35th International Conference on Machine Learning, 2018

Learning to Search with MCTSnets.
Proceedings of the 35th International Conference on Machine Learning, 2018

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures.
Proceedings of the 35th International Conference on Machine Learning, 2018

Implicit Quantile Networks for Distributional Reinforcement Learning.
Proceedings of the 35th International Conference on Machine Learning, 2018

Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement.
Proceedings of the 35th International Conference on Machine Learning, 2018

An Analysis of Categorical Distributional Reinforcement Learning.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

Distributional Reinforcement Learning With Quantile Regression.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Distributional Reinforcement Learning with Quantile Regression.
CoRR, 2017

The Uncertainty Bellman Equation and Exploration.
CoRR, 2017

Count-Based Exploration with Neural Density Models.
CoRR, 2017

The Reactor: A Sample-Efficient Actor-Critic Architecture.
CoRR, 2017

Automated Curriculum Learning for Neural Networks.
CoRR, 2017

Noisy Networks for Exploration.
CoRR, 2017

Observational Learning by Reinforcement Learning.
CoRR, 2017

A Distributional Perspective on Reinforcement Learning.
CoRR, 2017

The Cramer Distance as a Solution to Biased Wasserstein Gradients.
CoRR, 2017

Minimax Regret Bounds for Reinforcement Learning.
CoRR, 2017

Successor Features for Transfer in Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Count-Based Exploration with Neural Density Models.
Proceedings of the 34th International Conference on Machine Learning, 2017

Automated Curriculum Learning for Neural Networks.
Proceedings of the 34th International Conference on Machine Learning, 2017

A Distributional Perspective on Reinforcement Learning.
Proceedings of the 34th International Conference on Machine Learning, 2017

Minimax Regret Bounds for Reinforcement Learning.
Proceedings of the 34th International Conference on Machine Learning, 2017

Learning to reinforcement learn.
Proceedings of the 39th Annual Meeting of the Cognitive Science Society, 2017

2016
Guest Editors' foreword.
Theor. Comput. Sci., 2016

Analysis of Classification-based Policy Iteration Algorithms.
Journal of Machine Learning Research, 2016

Learning to reinforcement learn.
CoRR, 2016

Sample Efficient Actor-Critic with Experience Replay.
CoRR, 2016

PGQ: Combining policy gradient and Q-learning.
CoRR, 2016

Safe and Efficient Off-Policy Reinforcement Learning.
CoRR, 2016

Q(λ) with Off-Policy Corrections.
CoRR, 2016

Memory-Efficient Backpropagation Through Time.
CoRR, 2016

Unifying Count-Based Exploration and Intrinsic Motivation.
CoRR, 2016

Successor Features for Transfer in Reinforcement Learning.
CoRR, 2016

Safe and Efficient Off-Policy Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Memory-Efficient Backpropagation Through Time.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Unifying Count-Based Exploration and Intrinsic Motivation.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Discounted near-optimal control of general continuous-action nonlinear systems using optimistic planning.
Proceedings of the 2016 American Control Conference, 2016

Q(λ) with Off-Policy Corrections.
Proceedings of the Algorithmic Learning Theory - 27th International Conference, 2016

Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

Increasing the Action Gap: New Operators for Reinforcement Learning.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015
Adaptive strategy for stratified Monte Carlo sampling.
Journal of Machine Learning Research, 2015

Cheap Bandits.
CoRR, 2015

Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis.
CoRR, 2015

Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits.
CoRR, 2015

Increasing the Action Gap: New Operators for Reinforcement Learning.
CoRR, 2015

Black-box optimization of noisy functions with unknown smoothness.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Cheap Bandits.
Proceedings of the 32nd International Conference on Machine Learning, 2015

Toward Minimax Off-policy Value Estimation.
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015

Fast Gradient Descent for Drifting Least Squares Regression, with Application to Bandits.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

2014
Regret bounds for restless Markov bandits.
Theor. Comput. Sci., 2014

Minimax number of strata for online stratified sampling: The case of noisy samples.
Theor. Comput. Sci., 2014

From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning.
Foundations and Trends in Machine Learning, 2014

Best-Arm Identification in Linear Bandits.
CoRR, 2014

Active Regression by Stratification.
CoRR, 2014

On Minimax Optimal Offline Policy Evaluation.
CoRR, 2014

Bounded Regret for Finite-Armed Structured Bandits.
CoRR, 2014

Bandit Algorithms for Tree Search.
CoRR, 2014

Relative confidence sampling for efficient on-line ranker evaluation.
Proceedings of the Seventh ACM International Conference on Web Search and Data Mining, 2014

Fast LSTD Using Stochastic Approximation: Finite Time Analysis and Application to Traffic Control.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2014

Optimistic Planning in Markov Decision Processes Using a Generative Model.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Best-Arm Identification in Linear Bandits.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Active Regression by Stratification.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Bounded Regret for Finite-Armed Structured Bandits.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Efficient learning by implicit exploration in bandit problems with side observations.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem.
Proceedings of the 31st International Conference on Machine Learning, 2014

Spectral Bandits for Smooth Graph Functions.
Proceedings of the 31st International Conference on Machine Learning, 2014

Bandits attack function optimization.
Proceedings of the IEEE Congress on Evolutionary Computation, 2014

Optimistic planning with a limited number of action switches for near-optimal nonlinear control.
Proceedings of the 53rd IEEE Conference on Decision and Control, 2014

An analysis of optimistic, best-first search for minimax sequential decision making.
Proceedings of the 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2014

Spectral Thompson Sampling.
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014

2013
Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model.
Machine Learning, 2013

Selecting the State-Representation in Reinforcement Learning.
CoRR, 2013

Risk-Aversion in Multi-armed Bandits.
CoRR, 2013

Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem.
CoRR, 2013

Finite-Time Analysis of Kernelised Contextual Bandits.
CoRR, 2013

Analysis of stochastic approximation for efficient least squares regression and LSTD.
CoRR, 2013

Online gradient descent for least squares regression: Non-asymptotic bounds and application to bandits.
CoRR, 2013

Finite-Time Analysis of Kernelised Contextual Bandits.
Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, 2013

Thompson Sampling for 1-Dimensional Exponential Family Bandits.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, 2013

Aggregating Optimistic Planning Trees for Solving Markov Decision Processes.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, 2013

Stochastic Simultaneous Optimistic Optimization.
Proceedings of the 30th International Conference on Machine Learning, 2013

Toward Optimal Stratification for Stratified Monte-Carlo Integration.
Proceedings of the 30th International Conference on Machine Learning, 2013

Editors' Introduction.
Proceedings of the Algorithmic Learning Theory - 24th International Conference, 2013

Optimistic planning for belief-augmented Markov Decision Processes.
Proceedings of the 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2013

Optimistic planning for continuous-action deterministic systems.
Proceedings of the 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2013

2012
Linear regression with random projections.
Journal of Machine Learning Research, 2012

Finite-sample analysis of least-squares policy iteration.
Journal of Machine Learning Research, 2012

Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit.
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012

Optimistic planning for Markov decision processes.
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012

Learning with stochastic inputs and adversarial outputs.
J. Comput. Syst. Sci., 2012

Regret Bounds for Restless Markov Bandits.
CoRR, 2012

Thompson Sampling: An Optimal Finite Time Analysis.
CoRR, 2012

Risk-Aversion in Multi-armed Bandits.
Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, 2012

Bandit Algorithms boost Brain Computer Interfaces for motor-task selection of a brain-controlled button.
Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, 2012

Adaptive Stratified Sampling for Monte-Carlo integration of Differentiable functions.
Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, 2012

On the Sample Complexity of Reinforcement Learning with a Generative Model.
Proceedings of the 29th International Conference on Machine Learning, 2012

Regret Bounds for Restless Markov Bandits.
Proceedings of the Algorithmic Learning Theory - 23rd International Conference, 2012

Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis.
Proceedings of the Algorithmic Learning Theory - 23rd International Conference, 2012

Minimax Number of Strata for Online Stratified Sampling Given Noisy Samples.
Proceedings of the Algorithmic Learning Theory - 23rd International Conference, 2012

2011
Pure exploration in finitely-armed and continuous-armed bandits.
Theor. Comput. Sci., 2011

A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences.
Proceedings of the COLT 2011, 2011

Adaptive Bandits: Towards the best history-dependent strategy.
Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011

X-Armed Bandits.
Journal of Machine Learning Research, 2011

Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, 2011

Selecting the State-Representation in Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, 2011

Sparse Recovery with Brownian Sensing.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, 2011

Finite Time Analysis of Stratified Sampling for Monte Carlo.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, 2011

Speedy Q-Learning.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, 2011

Finite-Sample Analysis of Lasso-TD.
Proceedings of the 28th International Conference on Machine Learning, 2011

Regularized Least Squares Temporal Difference Learning with Nested ℓ2 and ℓ1 Penalization.
Proceedings of the Recent Advances in Reinforcement Learning - 9th European Workshop, 2011

Upper-Confidence-Bound Algorithms for Active Learning in Multi-armed Bandits.
Proceedings of the Algorithmic Learning Theory - 22nd International Conference, 2011

Optimistic planning for sparsely stochastic systems.
Proceedings of the 2011 IEEE Symposium on Adaptive Dynamic Programming And Reinforcement Learning, 2011

2010
Finite-sample Analysis of Bellman Residual Minimization.
Proceedings of the 2nd Asian Conference on Machine Learning, 2010

X-Armed Bandits.
CoRR, 2010

Online Learning in Adversarial Lipschitz Environments.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2010

Scrambled Objects for Least-Squares Regression.
Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, 2010

LSTD with Random Projections.
Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, 2010

Error Propagation for Approximate Policy and Value Iteration.
Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, 2010

Finite-Sample Analysis of LSTD.
Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010

Analysis of a Classification-based Policy Iteration Algorithm.
Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010

Open Loop Optimistic Planning.
Proceedings of the COLT 2010, 2010

Best Arm Identification in Multi-Armed Bandits.
Proceedings of the COLT 2010, 2010

2009
Exploration-exploitation tradeoff using variance estimates in multi-armed bandits.
Theor. Comput. Sci., 2009

Compressed Least-Squares Regression.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009, 2009

Sensitivity analysis in HMMs with application to likelihood maximization.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009, 2009

Workshop summary: On-line learning with limited feedback.
Proceedings of the 26th Annual International Conference on Machine Learning, 2009

Hybrid Stochastic-Adversarial On-line Learning.
Proceedings of the COLT 2009, 2009

Pure Exploration in Multi-armed Bandits Problems.
Proceedings of the Algorithmic Learning Theory, 20th International Conference, 2009

2008
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path.
Machine Learning, 2008

Finite-Time Bounds for Fitted Value Iteration.
Journal of Machine Learning Research, 2008

Pure Exploration for Multi-Armed Bandit Problems.
CoRR, 2008

Algorithms for Infinitely Many-Armed Bandits.
Proceedings of the Advances in Neural Information Processing Systems 21, 2008

Particle Filter-based Policy Gradient in POMDPs.
Proceedings of the Advances in Neural Information Processing Systems 21, 2008

Online Optimization in X-Armed Bandits.
Proceedings of the Advances in Neural Information Processing Systems 21, 2008

Optimistic Planning of Deterministic Systems.
Proceedings of the Recent Advances in Reinforcement Learning, 8th European Workshop, 2008

Adaptive play in Texas Hold'em Poker.
Proceedings of the ECAI 2008, 2008

2007
Performance Bounds in Lp-norm for Approximate Value Iteration.
SIAM J. Control and Optimization, 2007

Analyse en norme Lp de l'algorithme d'itérations sur les valeurs avec approximations.
Revue d'Intelligence Artificielle, 2007

Bandit Algorithms for Tree Search.
CoRR, 2007

Bandit Algorithms for Tree Search.
Proceedings of the UAI 2007, 2007

Fitted Q-iteration in continuous action-space MDPs.
Proceedings of the Advances in Neural Information Processing Systems 20, 2007

Tuning Bandit Algorithms in Stochastic Environments.
Proceedings of the Algorithmic Learning Theory, 18th International Conference, 2007

2006
Policy Gradient in Continuous Time.
Journal of Machine Learning Research, 2006

Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation.
Journal of Machine Learning Research, 2006

Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path.
Proceedings of the Learning Theory, 19th Annual Conference on Learning Theory, 2006

2005
Sensitivity Analysis Using Itô-Malliavin Calculus and Martingales, and Application to Stochastic Optimal Control.
SIAM J. Control and Optimization, 2005

Finite time bounds for sampling based fitted value iteration.
Proceedings of the 22nd International Conference on Machine Learning (ICML 2005), 2005

Policy gradient in continuous time.
Proceedings of the Actes de CAP 05, Conférence francophone sur l'apprentissage automatique, 2005

Geometric Variance Reduction in Markov Chains. Application to Value Function and Gradient Estimation.
Proceedings, 2005

Error Bounds for Approximate Value Iteration.
Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI 2005), 2005

2003
Error Bounds for Approximate Policy Iteration.
Proceedings of the 20th International Conference on Machine Learning (ICML 2003), 2003

2002
Variable Resolution Discretization in Optimal Control.
Machine Learning, 2002

2001
Efficient Resources Allocation for Markov Decision Processes.
Proceedings of the Advances in Neural Information Processing Systems 14, 2001

2000
A Study of Reinforcement Learning in the Continuous Case by the Means of Viscosity Solutions.
Machine Learning, 2000

Rates of Convergence for Variable Resolution Schemes in Optimal Control.
Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), 2000

1999
Gradient descent approaches to neural-net-based solutions of the Hamilton-Jacobi-Bellman equation.
Proceedings of the International Joint Conference Neural Networks, 1999

Variable Resolution Discretization for High-Accuracy Solutions of Optimal Control Problems.
Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, 1999

1998
Barycentric Interpolators for Continuous Space and Time Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 11, 1998

A General Convergence Method for Reinforcement Learning in the Continuous Case.
Proceedings of the Machine Learning: ECML-98, 1998

1997
Reinforcement Learning for Continuous Stochastic Control Problems.
Proceedings of the Advances in Neural Information Processing Systems 10, 1997

A Convergent Reinforcement Learning Algorithm in the Continuous Case Based on a Finite Difference Method.
Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence, 1997

Finite-Element Methods with Local Triangulation Refinement for Continuous Reinforcement Learning Problems.
Proceedings of the Machine Learning: ECML-97, 1997

1996
A Convergent Reinforcement Learning Algorithm in the Continuous Case: The Finite-Element Reinforcement Learning.
Proceedings of the 13th International Conference on Machine Learning (ICML 1996), 1996
