Rémi Munos

Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

Stochastic Simultaneous Optimistic Optimization.

Michal Valko

Proceedings of the 30th International Conference on Machine Learning, 2013

Toward Optimal Stratification for Stratified Monte-Carlo Integration.

Proceedings of the 30th International Conference on Machine Learning, 2013

Editors' Introduction.

Proceedings of the Algorithmic Learning Theory - 24th International Conference, 2013

Optimistic planning for belief-augmented Markov Decision Processes.

Raphaël Fonteneau

Lucian Busoniu

Proceedings of the 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2013

Optimistic planning for continuous-action deterministic systems.

Proceedings of the 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2013

2012

Linear regression with random projections.

J. Mach. Learn. Res., 2012

Finite-sample analysis of least-squares policy iteration.

J. Mach. Learn. Res., 2012

Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit.

Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012

Optimistic planning for Markov decision processes.

Lucian Busoniu

Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012

Learning with stochastic inputs and adversarial outputs.

J. Comput. Syst. Sci., 2012

Thompson Sampling: An Optimal Finite Time Analysis

Emilie Kaufmann

Nathaniel Korda

CoRR, 2012

Risk-Aversion in Multi-armed Bandits.

Amir Sani

Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012

Bandit Algorithms boost Brain Computer Interfaces for motor-task selection of a brain-controlled button.

Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012

Adaptive Stratified Sampling for Monte-Carlo integration of Differentiable functions.

Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012

On the Sample Complexity of Reinforcement Learning with a Generative Model.

Mohammad Gheshlaghi Azar

Bert Kappen

Proceedings of the 29th International Conference on Machine Learning, 2012

Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis.

Emilie Kaufmann

Nathaniel Korda

Proceedings of the Algorithmic Learning Theory - 23rd International Conference, 2012

Minimax Number of Strata for Online Stratified Sampling Given Noisy Samples.

Proceedings of the Algorithmic Learning Theory - 23rd International Conference, 2012

Least-Squares Methods for Policy Iteration.

Proceedings of the Reinforcement Learning, 2012

2011

Pure exploration in finitely-armed and continuous-armed bandits.

Theor. Comput. Sci., 2011

A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences.

Proceedings of the COLT 2011, 2011

Adaptive Bandits: Towards the best history-dependent strategy.

Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011

X-Armed Bandits.

J. Mach. Learn. Res., 2011

Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness.

Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Selecting the State-Representation in Reinforcement Learning.

Daniil Ryabko

Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Sparse Recovery with Brownian Sensing.

Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Finite Time Analysis of Stratified Sampling for Monte Carlo.

Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Speedy Q-Learning.

Mohammad Gheshlaghi Azar

Hilbert J. Kappen

Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Finite-Sample Analysis of Lasso-TD.

Proceedings of the 28th International Conference on Machine Learning, 2011

Regularized Least Squares Temporal Difference Learning with Nested ℓ2 and ℓ1 Penalization.

Proceedings of the Recent Advances in Reinforcement Learning - 9th European Workshop, 2011

Upper-Confidence-Bound Algorithms for Active Learning in Multi-armed Bandits.

Proceedings of the Algorithmic Learning Theory - 22nd International Conference, 2011

Optimistic planning for sparsely stochastic systems.

Proceedings of the 2011 IEEE Symposium on Adaptive Dynamic Programming And Reinforcement Learning, 2011

2010

Finite-sample Analysis of Bellman Residual Minimization.

Proceedings of the 2nd Asian Conference on Machine Learning, 2010

X-Armed Bandits

CoRR, 2010

Online Learning in Adversarial Lipschitz Environments.

Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2010

Scrambled Objects for Least-Squares Regression.

Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

LSTD with Random Projections.

Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

Error Propagation for Approximate Policy and Value Iteration.

Amir Massoud Farahmand

Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

Finite-Sample Analysis of LSTD.

Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010

Analysis of a Classification-based Policy Iteration Algorithm.

Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010

Open Loop Optimistic Planning.

Proceedings of the COLT 2010, 2010

Best Arm Identification in Multi-Armed Bandits.

Proceedings of the COLT 2010, 2010

2009

Exploration-exploitation tradeoff using variance estimates in multi-armed bandits.

Theor. Comput. Sci., 2009

Compressed Least-Squares Regression.

Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

Sensitivity analysis in HMMs with application to likelihood maximization.

Pierre-Arnaud Coquelin

Romain Deguest

Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

Workshop summary: On-line learning with limited feedback.

Proceedings of the 26th Annual International Conference on Machine Learning, 2009

Hybrid Stochastic-Adversarial On-line Learning.

Proceedings of the COLT 2009, 2009

Pure Exploration in Multi-armed Bandits Problems.

Proceedings of the Algorithmic Learning Theory, 20th International Conference, 2009

2008

Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path.

András Antos

Mach. Learn., 2008

Finite-Time Bounds for Fitted Value Iteration.

J. Mach. Learn. Res., 2008

Pure Exploration for Multi-Armed Bandit Problems

CoRR, 2008

Algorithms for Infinitely Many-Armed Bandits.

Yizao Wang

Proceedings of the Advances in Neural Information Processing Systems 21, 2008

Particle Filter-based Policy Gradient in POMDPs.

Pierre-Arnaud Coquelin

Romain Deguest

Proceedings of the Advances in Neural Information Processing Systems 21, 2008

Online Optimization in X-Armed Bandits.

Proceedings of the Advances in Neural Information Processing Systems 21, 2008

Optimistic Planning of Deterministic Systems.

Jean-François Hren

Proceedings of the Recent Advances in Reinforcement Learning, 8th European Workshop, 2008

Adaptive play in Texas Hold'em Poker.

Raphaël Maîtrepierre

Jérémie Mary

Proceedings of the ECAI 2008, 2008

2007

Performance Bounds in Lp-norm for Approximate Value Iteration.

SIAM J. Control. Optim., 2007

Analyse en norme Lp de l'algorithme d'itérations sur les valeurs avec approximations.

Rev. d'Intelligence Artif., 2007

Bandit Algorithms for Tree Search.

Pierre-Arnaud Coquelin

Proceedings of the UAI 2007, 2007

Fitted Q-iteration in continuous action-space MDPs.

András Antos

Proceedings of the Advances in Neural Information Processing Systems 20, 2007

Tuning Bandit Algorithms in Stochastic Environments.

Proceedings of the Algorithmic Learning Theory, 18th International Conference, 2007

2006

Policy Gradient in Continuous Time.

J. Mach. Learn. Res., 2006

Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation.

J. Mach. Learn. Res., 2006

2005

Sensitivity Analysis Using Itô-Malliavin Calculus and Martingales, and Application to Stochastic Optimal Control.

Emmanuel Gobet

SIAM J. Control. Optim., 2005

Finite time bounds for sampling based fitted value iteration.

Proceedings of the Machine Learning, 2005

Error Bounds for Approximate Value Iteration.

Proceedings of the Proceedings, 2005

2003

Error Bounds for Approximate Policy Iteration.

Proceedings of the Machine Learning, 2003

2002

Variable Resolution Discretization in Optimal Control.

Mach. Learn., 2002

2001

Efficient Resources Allocation for Markov Decision Processes.

Proceedings of the Advances in Neural Information Processing Systems 14 [Neural Information Processing Systems: Natural and Synthetic], 2001

2000

A Study of Reinforcement Learning in the Continuous Case by the Means of Viscosity Solutions.

Mach. Learn., 2000

Rates of Convergence for Variable Resolution Schemes in Optimal Control.

Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), Stanford University, Stanford, CA, USA, June 29, 2000

1999

Gradient descent approaches to neural-net-based solutions of the Hamilton-Jacobi-Bellman equation.

Leemon C. Baird III

Proceedings of the International Joint Conference Neural Networks, 1999

Variable Resolution Discretization for High-Accuracy Solutions of Optimal Control Problems.

Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, 1999

1998

Barycentric Interpolators for Continuous Space and Time Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 11, [NIPS Conference, Denver, Colorado, USA, November 30, 1998

A General Convergence Method for Reinforcement Learning in the Continuous Case.

Proceedings of the Machine Learning: ECML-98, 1998

1997

Reinforcement Learning for Continuous Stochastic Control Problems.

Paul Bourgine

Proceedings of the Advances in Neural Information Processing Systems 10, 1997

A Convergent Reinforcement Learning Algorithm in the Continuous Case Based on a Finite Difference Method.

Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence, 1997

Finite-Element Methods with Local Triangulation Refinement for Continuous Reinforcement Learning Problems.

Proceedings of the Machine Learning: ECML-97, 1997

1996

A Convergent Reinforcement Learning Algorithm in the Continuous Case: The Finite-Element Reinforcement Learning.
