Alekh Agarwal

Teodor Vanislavov Marinov

Manfred K. Warmuth

Proceedings of the International Conference on Algorithmic Learning Theory, 2024

2023

Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking.

CoRR, 2023

Peer Reviews of Peer Reviews: A Randomized Controlled Trial and Other Experiments.

CoRR, 2023

An Empirical Evaluation of Federated Contextual Bandit Algorithms.

H. Brendan McMahan

Zheng Xu

CoRR, 2023

Leveraging User-Triggered Supervision in Contextual Bandits.

Claudio Gentile

Teodor V. Marinov

CoRR, 2023

Ordering-based Conditions for Global Convergence of Policy Gradient Methods.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Stochastic Gradient Succeeds for Bandits.

Proceedings of the International Conference on Machine Learning, 2023

Learning in POMDPs is Sample-Efficient with Hindsight Observability.

Proceedings of the International Conference on Machine Learning, 2023

VOQL: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation.

Yujia Jin

Proceedings of the Thirty Sixth Annual Conference on Learning Theory, 2023

Provable Benefits of Representational Transfer in Reinforcement Learning.

Proceedings of the Thirty Sixth Annual Conference on Learning Theory, 2023

2022

On the Statistical Efficiency of Reward-Free Exploration in Non-Linear RL.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Model-based RL with Optimistic Posterior Sampling: Structural Conditions and Sample Complexity.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning approach.

Proceedings of the International Conference on Machine Learning, 2022

Adversarially Trained Actor Critic for Offline Reinforcement Learning.

Proceedings of the International Conference on Machine Learning, 2022

Provably Filtering Exogenous Distractors using Multistep Inverse Dynamics.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Non-Linear Reinforcement Learning in Large Action Spaces: Structural Conditions and Sample-efficiency of Posterior Sampling.

Proceedings of the Conference on Learning Theory, 2-5 July 2022, London, UK, 2022

Minimax Regret Optimization for Robust Machine Learning under Distribution Shift.

Proceedings of the Conference on Learning Theory, 2-5 July 2022, London, UK, 2022

2021

A Contextual Bandit Bake-off.

Alberto Bietti

J. Mach. Learn. Res., 2021

On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift.

J. Mach. Learn. Res., 2021

Provable RL with Exogenous Distractors via Multistep Inverse Dynamics.

CoRR, 2021

Bellman-consistent Pessimism for Offline Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Provably Correct Optimization and Exploration with Non-linear Policies.

Proceedings of the 38th International Conference on Machine Learning, 2021

Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation.

Andrea Zanette

Ching-An Cheng

Proceedings of the Conference on Learning Theory, 2021

Towards a Dimension-Free Understanding of Adaptive Linear Control.

Proceedings of the Conference on Learning Theory, 2021

2020

Provably Good Batch Reinforcement Learning Without Great Exploration.

CoRR, 2020

Policy Improvement from Multiple Experts.

Ching-An Cheng

Andrey Kolobov

CoRR, 2020

Optimizing Interactive Systems via Data-Driven Objectives.

CoRR, 2020

Reparameterized Variational Divergence Minimization for Stable Imitation.

CoRR, 2020

Federated Residual Learning.

Chen-Yu Wei

CoRR, 2020

Safe Reinforcement Learning via Curriculum Induction.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Policy Improvement via Imitation of Multiple Oracles.

Ching-An Cheng

Andrey Kolobov

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds.

Proceedings of the 8th International Conference on Learning Representations, 2020

Taking a hint: How to leverage loss predictors in contextual bandits?

Chen-Yu Wei

Haipeng Luo

Proceedings of the Conference on Learning Theory, 2020

Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal.

Sham M. Kakade

Lin F. Yang

Proceedings of the Conference on Learning Theory, 2020

Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes.

Proceedings of the Conference on Learning Theory, 2020

Metareasoning in Modular Software Systems: On-the-Fly Configuration Using Reinforcement Learning with Rich Contextual Representations.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Active Learning for Cost-Sensitive Classification.

J. Mach. Learn. Res., 2019

On the Optimality of Sparse Model-Based Planning for Markov Decision Processes.

Sham M. Kakade

Lin F. Yang

CoRR, 2019

Off-Policy Policy Gradient with State Distribution Correction.

CoRR, 2019

Off-Policy Policy Gradient with Stationary Distribution Correction.

Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence, 2019

Bias Correction of Learned Generative Models using Likelihood-Free Importance Weighting.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback.

Proceedings of the 36th International Conference on Machine Learning, 2019

Provably efficient RL with Rich Observations via Latent State Decoding.

Proceedings of the 36th International Conference on Machine Learning, 2019

Fair Regression: Quantitative Definitions and Reduction-Based Algorithms.

Zhiwei Steven Wu

Proceedings of the 36th International Conference on Machine Learning, 2019

Bias Correction of Learned Generative Models via Likelihood-free Importance Weighting.

Proceedings of the Deep Generative Models for Highly Structured Data, 2019

Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches.

Proceedings of the Conference on Learning Theory, 2019

2018

Model-Based Reinforcement Learning in Contextual Decision Processes.

CoRR, 2018

On Polynomial Time PAC Reinforcement Learning with Rich Observations.

CoRR, 2018

Practical Evaluation and Optimization of Contextual Bandit Algorithms.

Alberto Bietti

CoRR, 2018

On Oracle-Efficient PAC RL with Rich Observations.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Practical Contextual Bandits with Regression Oracles.

Proceedings of the 35th International Conference on Machine Learning, 2018

A Reductions Approach to Fair Classification.

Proceedings of the 35th International Conference on Machine Learning, 2018

Hierarchical Imitation and Reinforcement Learning.

Proceedings of the 35th International Conference on Machine Learning, 2018

Efficient Contextual Bandits in Non-stationary Worlds.

Proceedings of the Conference On Learning Theory, 2018

Open Problem: The Dependence of Sample Complexity Lower Bounds on Planning Horizon.

Nan Jiang

Proceedings of the Conference On Learning Theory, 2018

2017

A Clustering Approach to Learning Sparsely Used Overcomplete Dictionaries.

IEEE Trans. Inf. Theory, 2017

Efficient Contextual Bandits in Non-stationary Worlds.

Haipeng Luo

CoRR, 2017

Off-policy evaluation for slate recommendation.

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Optimal and Adaptive Off-policy Evaluation in Contextual Bandits.

Yu-Xiang Wang

Proceedings of the 34th International Conference on Machine Learning, 2017

Contextual Decision Processes with low Bellman rank are PAC-Learnable.

Proceedings of the 34th International Conference on Machine Learning, 2017

Corralling a Band of Bandit Algorithms.

Proceedings of the 30th Conference on Learning Theory, 2017

Open Problem: First-Order Regret Bounds for Contextual Bandits.

Proceedings of the 30th Conference on Learning Theory, 2017

2016

Learning Sparsely Used Overcomplete Dictionaries via Alternating Minimization.

Prateek Jain

SIAM J. Optim., 2016

Efficient Second Order Online Learning via Sketching.

CoRR, 2016

Contextual-MDPs for PAC-Reinforcement Learning with Rich Observations.

CoRR, 2016

A Multiworld Testing Decision Service.

CoRR, 2016

Exploratory Gradient Boosting for Reinforcement Learning in Complex Domains.

CoRR, 2016

Efficient Second Order Online Learning by Sketching.

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

PAC Reinforcement Learning with Rich Observations.

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Contextual semibandits via supervised learning oracles.

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

2015

Efficient Contextual Semi-Bandit Learning.

CoRR, 2015

Fast Convergence of Regularized Learning in Games.

Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Efficient and Parsimonious Agnostic Active Learning.

Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Learning to Search Better than Your Teacher.

Proceedings of the 32nd International Conference on Machine Learning, 2015

A Lower Bound for the Optimization of Finite Sums.

Léon Bottou

Proceedings of the 32nd International Conference on Machine Learning, 2015

2014

A reliable effective terascale linear learning system.

J. Mach. Learn. Res., 2014

Scalable Nonlinear Learning with Adaptive Polynomial Expansions.

CoRR, 2014

Scalable Non-linear Learning with Adaptive Polynomial Expansions.

Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Least Squares Revisited: Scalable Approaches for Multi-class Prediction.

Proceedings of the 31st International Conference on Machine Learning, 2014

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits.

Proceedings of the 31st International Conference on Machine Learning, 2014

Robust Multi-objective Learning with Mentor Feedback.

Ashwinkumar Badanidiyuru

Robert E. Schapire

Aleksandrs Slivkins

Proceedings of The 27th Conference on Learning Theory, 2014

Learning Sparsely Used Overcomplete Dictionaries.

Prateek Jain

Rashish Tandon

Proceedings of The 27th Conference on Learning Theory, 2014

Stochastic optimization and sparse statistical recovery: An optimal algorithm for high dimensions.

Proceedings of the 48th Annual Conference on Information Sciences and Systems, 2014

2013

The Generalization Ability of Online Algorithms for Dependent Data.

IEEE Trans. Inf. Theory, 2013

Stochastic Convex Optimization with Bandit Feedback.

SIAM J. Optim., 2013

Para-active learning.

CoRR, 2013

Exact Recovery of Sparsely Used Overcomplete Dictionaries.

CoRR, 2013

Learning Sparsely Used Overcomplete Dictionaries via Alternating Minimization.

Prateek Jain

Rashish Tandon

CoRR, 2013

Selective sampling algorithms for cost-sensitive multiclass prediction.

Proceedings of the 30th International Conference on Machine Learning, 2013

2012

Computational Trade-offs in Statistical Learning.

PhD thesis, 2012

Information-Theoretic Lower Bounds on the Oracle Complexity of Stochastic Convex Optimization.

IEEE Trans. Inf. Theory, 2012

Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling.

IEEE Trans. Autom. Control., 2012

Ergodic Mirror Descent.

SIAM J. Optim., 2012

Contextual Bandit Learning with Predictable Rewards.

Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012

Oracle inequalities for computationally adaptive model selection.

CoRR, 2012

Fast global convergence of gradient methods for solving regularized M-estimation.

Proceedings of the IEEE Statistical Signal Processing Workshop, 2012

Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions.

Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012

Dual averaging for distributed optimization.

Proceedings of the 50th Annual Allerton Conference on Communication, 2012

2011

Oracle inequalities for computationally budgeted model selection.

Proceedings of the COLT 2011, 2011

Fast global convergence of gradient methods for high-dimensional statistical recovery.

CoRR, 2011

Online and Batch Learning Algorithms for Data with Missing Features.

Afshin Rostamizadeh

CoRR, 2011

Learning with Missing Features.

Afshin Rostamizadeh

Proceedings of the UAI 2011, 2011

Distributed Delayed Stochastic Optimization.

Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions.

Proceedings of the 28th International Conference on Machine Learning, 2011

2010

Message-passing for Graph-structured Linear Programs: Proximal Methods and Rounding Schemes.

Pradeep Ravikumar

J. Mach. Learn. Res., 2010

Optimal Allocation Strategies for the Dark Pool Problem.

Max Dama

Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010

Distributed Dual Averaging In Networks.

Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

Fast global convergence rates of gradient methods for high-dimensional statistical recovery.

Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback.
