Tor Lattimore

CoRR, March, 2026

A Lyapunov Analysis of Softmax Policy Gradient for Stochastic Bandits.

CoRR, March, 2026

A Diffusion Analysis of Policy Gradient for Stochastic Bandits.

CoRR, March, 2026

2025

Beyond Statistical Learning: Exact Learning Is Essential for General Intelligence.

CoRR, June, 2025

Thompson Sampling for Bandit Convex Optimisation.

Alireza Bakhtiari

Proceedings of the Thirty-Eighth Annual Conference on Learning Theory, 2025

2024

Sequential Best-Arm Identification with Application to P300 Speller.

Trans. Mach. Learn. Res., 2024

Online Newton Method for Bandit Convex Optimisation.

CoRR, 2024

Bandit Convex Optimisation.

CoRR, 2024

Online Newton Method for Bandit Convex Optimisation Extended Abstract.

Proceedings of the Thirty-Seventh Annual Conference on Learning Theory, June 30, 2024

2023

Linear Partial Monitoring for Sequential Decision Making: Algorithms, Regret Bounds and Applications.

Johannes Kirschner

Andreas Krause

J. Mach. Learn. Res., 2023

Sequential Best-Arm Identification with Application to Brain-Computer Interface.

CoRR, 2023

Probabilistic Inference in Reinforcement Learning Done Right.

Jean Tarbouriech

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Context-lumpable stochastic bandits.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Leveraging Demonstrations to Improve Online Learning: Quality Matters.

Proceedings of the International Conference on Machine Learning, 2023

Distributed Contextual Linear Bandits with Minimax Optimal Communication Cost.

Proceedings of the International Conference on Machine Learning, 2023

A Lower Bound for Linear and Kernel Regression with Adaptive Covariates.

Proceedings of the Thirty-Sixth Annual Conference on Learning Theory, 2023

A Second-Order Method for Stochastic Bandit Convex Optimisation.

Proceedings of the Thirty-Sixth Annual Conference on Learning Theory, 2023

2022

Regret Bounds for Information-Directed Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Contextual Information-Directed Sampling.

Chao Qin

Proceedings of the International Conference on Machine Learning, 2022

Return of the bias: Almost minimax optimal high probability bounds for adversarial linear bandits.

Julian Zimmert

Proceedings of the Conference on Learning Theory, 2-5 July 2022, London, UK, 2022

Minimax Regret for Partial Monitoring: Infinite Outcomes and Rustichini's Regret.

Proceedings of the Conference on Learning Theory, 2-5 July 2022, London, UK, 2022

2021

Minimax Regret for Bandit Convex Optimisation of Ridge Functions.

CoRR, 2021

Geometric Entropic Exploration.

Zhaohan Daniel Guo

Mohammad Gheshlaghi Azar

CoRR, 2021

Matrix games with bandit feedback.

Ian Osband

Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, 2021

Variational Bayesian Optimistic Sampling.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Bandit Phase Retrieval.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Information Directed Sampling for Sparse Linear Bandits.

Wei Deng

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

On the Optimality of Batch Policy Optimization Algorithms.

Proceedings of the 38th International Conference on Machine Learning, 2021

Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient.

Proceedings of the 38th International Conference on Machine Learning, 2021

Mirror Descent and the Information Ratio.

Proceedings of the Conference on Learning Theory, 2021

Improved Regret for Zeroth-Order Stochastic Convex Bandits.

Agnieszka Grabska-Barwinska

Proceedings of the Conference on Learning Theory, 2021

Asymptotically Optimal Information-Directed Sampling.

Proceedings of the Conference on Learning Theory, 2021

Online Sparse Reinforcement Learning.

Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

Gated Linear Networks.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Stochastic matrix games with bandit feedback.

Ian Osband

CoRR, 2020

Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation.

CoRR, 2020

Model Selection in Contextual Stochastic Bandit Problems.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

High-Dimensional Sparse Linear Bandits.

Mengdi Wang

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Gaussian Gated Linear Networks.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Linear bandits with Stochastic Delayed Feedback.

Proceedings of the 37th International Conference on Machine Learning, 2020

Learning with Good Feature Representations in Bandits and in RL with a Generative Model.

Gellért Weisz

Proceedings of the 37th International Conference on Machine Learning, 2020

Behaviour Suite for Reinforcement Learning.

Proceedings of the 8th International Conference on Learning Representations, 2020

Exploration by Optimisation in Partial Monitoring.

Proceedings of the Conference on Learning Theory, 2020

Information Directed Sampling for Linear Partial Monitoring.

Johannes Kirschner

Andreas Krause

Proceedings of the Conference on Learning Theory, 2020

Adaptive Exploration in Linear Contextual Bandit.

Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

2019

Learning with Good Feature Representations in Bandits and in RL with a Generative Model.

Agnieszka Grabska-Barwinska

CoRR, 2019

Gated Linear Networks.

Peter Toth

Simon Schmitt

CoRR, 2019

Zooming Cautiously: Linear-Memory Heuristic Search With Node Expansion Guarantees.

Laurent Orseau

Levi H. S. Lelis

CoRR, 2019

Adaptivity, Variance and Separation for Adversarial Bandits.

Roman Pogodin

CoRR, 2019

On First-Order Bounds, Variance and Gap-Dependent Bounds for Adversarial Bandits.

Roman Pogodin

Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence, 2019

BubbleRank: Safe Online Learning to Re-Rank via Implicit Click Feedback.

Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence, 2019

Connections Between Mirror Descent, Thompson Sampling and the Information Ratio.

Julian Zimmert

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

A Geometric Perspective on Optimal Representations for Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Iterative Budgeted Exponential Search.

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Online Learning to Rank with Features.

Shuai Li

Proceedings of the 36th International Conference on Machine Learning, 2019

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits.

Proceedings of the 36th International Conference on Machine Learning, 2019

An Information-Theoretic Approach to Minimax Regret in Partial Monitoring.

Proceedings of the Conference on Learning Theory, 2019

Cleaning up the neighborhood: A full classification for adversarial partial monitoring.

Proceedings of the Algorithmic Learning Theory, 2019

Degenerate Feedback Loops in Recommender Systems.

Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, 2019

2018

Refining the Confidence Level for Optimistic Bandit Strategies.

J. Mach. Learn. Res., 2018

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits.

CoRR, 2018

BubbleRank: Safe Online Learning to Rerank.

CoRR, 2018

Single-Agent Policy Tree Search With Guarantees.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

TopRank: A practical algorithm for online stochastic ranking.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

2017

Following the Leader and Fast Rates in Online Linear Prediction: Curved Constraint Sets and Other Regularities.

J. Mach. Learn. Res., 2017

Online Learning with Gated Linear Networks.

Joel Veness

Agnieszka Grabska-Barwinska

Avishkar Bhoopchand

Christopher Mattern

Peter Toth

CoRR, 2017

UBEV - A More Practical Algorithm for Episodic RL with Near-Optimal PAC and Regret Guarantees.

Christoph Dann

Emma Brunskill

CoRR, 2017

A Scale Free Algorithm for Stochastic Bandits with Bounded Kurtosis.

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning.

Christoph Dann

Emma Brunskill

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

On Thompson Sampling and Asymptotic Optimality.

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Soft-Bayes: Prod for Mixtures of Experts with Log-Loss.

Laurent Orseau

Shane Legg

Proceedings of the International Conference on Algorithmic Learning Theory, 2017

The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits.

Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017

2016

Regret Analysis of the Anytime Optimally Confident UCB Algorithm.

CoRR, 2016

Thompson Sampling is Asymptotically Optimal in General Environments.

Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, 2016

Causal Bandits: Learning Good Interventions via Causal Inference.

Finnian Lattimore

Mark D. Reid

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities.

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Refined Lower Bounds for Adversarial Bandits.

Sébastien Gerchinovitz

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

On Explore-Then-Commit strategies.

Aurélien Garivier

Emilie Kaufmann

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Conservative Bandits.

Proceedings of the 33rd International Conference on Machine Learning, 2016

Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits.

Proceedings of the 29th Conference on Learning Theory, 2016

2015

On Martin-Löf (non-)convergence of Solomonoff's universal mixture.

Theor. Comput. Sci., 2015

Optimally Confident UCB: Improved Regret for Finite-Armed Bandits.

CoRR, 2015

Linear Multi-Resource Allocation with Semi-Bandit Feedback.

Koby Crammer

Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

The Pareto Regret Frontier for Bandits.

Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

2014

Near-optimal PAC bounds for discounted MDPs.

Theor. Comput. Sci., 2014

General time consistent discounting.

Theor. Comput. Sci., 2014

Asymptotics of Continuous Bayes for Non-i.i.d. Sources.

CoRR, 2014

Optimal Resource Allocation with Semi-Bandit Feedback.

Koby Crammer

Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, 2014

Bounded Regret for Finite-Armed Structured Bandits.

Rémi Munos

Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Free Lunch for optimisation under the universal distribution.

Tom Everitt

Proceedings of the IEEE Congress on Evolutionary Computation, 2014

Bayesian Reinforcement Learning with Exploration.

Proceedings of the Algorithmic Learning Theory - 25th International Conference, 2014

On Learning the Optimal Waiting Time.

Proceedings of the Algorithmic Learning Theory - 25th International Conference, 2014

2013

On Martin-Löf Convergence of Solomonoff's Mixture.

Proceedings of the Theory and Applications of Models of Computation, 2013

The Sample-Complexity of General Reinforcement Learning.

Peter Sunehag

Proceedings of the 30th International Conference on Machine Learning, 2013

Universal Knowledge-Seeking Agents for Stochastic Environments.

Laurent Orseau

Proceedings of the Algorithmic Learning Theory - 24th International Conference, 2013

Concentration and Confidence for Discrete Bayesian Sequence Predictors.

Peter Sunehag

Proceedings of the Algorithmic Learning Theory - 24th International Conference, 2013

2012

PAC Bounds for Discounted MDPs.

Proceedings of the Algorithmic Learning Theory - 23rd International Conference, 2012

2011

No Free Lunch versus Occam's Razor in Supervised Learning.

Proceedings of the Algorithmic Probability and Friends. Bayesian Prediction and Artificial Intelligence, 2011

Universal Prediction of Selected Bits.

Vaibhav Gavane

Proceedings of the Algorithmic Learning Theory - 22nd International Conference, 2011

Time Consistent Discounting.

Proceedings of the Algorithmic Learning Theory - 22nd International Conference, 2011

Asymptotically Optimal Agents.
