Aldo Pacchiano

2026

Scaling In-Context Online Learning Capability of LLMs via Cross-Episode Meta-RL.

CoRR, February, 2026

2025

Improved Training Mechanism for Reinforcement Learning via Online Model Selection.

Aida Afshar

CoRR, December, 2025

The Good, the Bad, and the Sampled: a No-Regret Approach to Safe Online Classification.

Tavor Z. Baharav

Spyros Dragazis

CoRR, October, 2025

Enhancing Diversity in Large Language Models via Determinantal Point Processes.

Yilei Chen

Souradip Chakraborty

Lorenz Wolf

Ioannis Ch. Paschalidis

CoRR, September, 2025

Meet Me at the Arm: The Cooperative Multi-Armed Bandits Problem with Shareable Arms.

Xinyi Hu

CoRR, June, 2025

Learning to Explore: An In-Context Learning Approach for Pure Exploration.

Alessio Russo

Ryan Welch

CoRR, June, 2025

Language Model Personalization via Reward Factorization.

CoRR, March, 2025

Contextual Bandits with Stage-wise Constraints.

Mohammad Ghavamzadeh

Peter L. Bartlett

J. Mach. Learn. Res., 2025

Active Preference Optimization for Sample Efficient RLHF.

Proceedings of the Machine Learning and Knowledge Discovery in Databases. Research Track, 2025

Principled Fine-tuning of LLMs from User-Edits: A Medley of Preference, Supervision, and Reward.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Adaptive Exploration for Multi-Reward Multi-Policy Evaluation.

Alessio Russo

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Feasible Action Search for Bandit Linear Programs via Thompson Sampling.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Multiple-policy Evaluation via Density Estimation.

Yilei Chen

Ioannis Paschalidis

Proceedings of the Forty-second International Conference on Machine Learning, 2025

ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Second Order Bounds for Contextual Bandits with Function Approximation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

A Theoretical Framework for Partially-Observed Reward States in RLHF.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

On the Hardness of Bandit Learning.

Proceedings of the Thirty Eighth Annual Conference on Learning Theory, 2025

Pure Exploration with Feedback Graphs.

Alessio Russo

Yichen Song

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2025

2024

Estimating Optimal Policy Value in Linear Contextual Bandits Beyond Gaussianity.

Trans. Mach. Learn. Res., 2024

Learning Rate-Free Reinforcement Learning: A Case for Model Selection with Non-Stationary Objectives.

Aida Afshar

CoRR, 2024

Provably Sample Efficient RLHF via Active Preference Optimization.

CoRR, 2024

A Framework for Partially Observed Reward-States in RLHF.

CoRR, 2024

State-free Reinforcement Learning.

Mingyu Chen

Xuezhou Zhang

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Provable Interactive Learning with Hindsight Instruction Feedback.

Dipendra Misra

Robert E. Schapire

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Improving Offline RL by Blending Heuristics.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Data-Driven Online Model Selection With Regret Guarantees.

Christoph Dann

Claudio Gentile

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2024

2023

Unbiased Decisions Reduce Regret: Adversarial Domain Adaptation for the Bank Loan Problem.

CoRR, 2023

Data-Driven Regret Balancing for Online Model Selection in Bandits.

Christoph Dann

Claudio Gentile

CoRR, 2023

Estimating Optimal Policy Value in General Linear Contextual Bandits.

CoRR, 2023

Experiment Planning with Function Approximation.

Jonathan Lee

Emma Brunskill

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Anytime Model Selection in Linear Bandits.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

A Unified Model and Dimension for Interactive Estimation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Supervised Pretraining Can Learn In-Context Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Leveraging Offline Data in Online Reinforcement Learning.

Andrew Wagenmaker

Proceedings of the International Conference on Machine Learning, 2023

Neural Design for Genetic Perturbation Experiments.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit.

Peter L. Bartlett

Proceedings of the International Conference on Algorithmic Learning Theory, 2023

Dueling RL: Reinforcement Learning with Trajectory Preferences.

Aadirupa Saha

Jonathan Lee

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023

2022

Transfer RL via the Undo Maps Formalism.

CoRR, 2022

Joint Representation Training in Sequential Tasks with Shared Structure.

CoRR, 2022

Learning General World Models in a Handful of Reward-Free Deployments.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Best of Both Worlds Model Selection.

Christoph Dann

Claudio Gentile

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback.

Proceedings of the International Conference on Machine Learning, 2022

Meta Learning MDPs with linear transition models.

Robert Müller

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

Towards an Understanding of Default Policies in Multitask Policy Optimization.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

2021

Model Selection for Contextual Bandits and Reinforcement Learning.

PhD thesis, 2021

Parallelizing Contextual Linear Bandits.

CoRR, 2021

Unlocking Pixels for Reinforcement Learning via Implicit Attention.

Deepali Jain

Valerii Likhosherstov

CoRR, 2021

Deep Reinforcement Learning with Dynamic Optimism.

CoRR, 2021

ES-ENAS: Combining Evolution Strategies with Neural Architecture Search at No Extra Cost for Reinforcement Learning.

CoRR, 2021

Fairness with Continuous Optimal Transport.

Silvia Chiappa

CoRR, 2021

Towards tractable optimism in model-based reinforcement learning.

Philip J. Ball

Stephen Roberts

Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, 2021

Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Neural Pseudo-Label Optimism for the Bank Loan Problem.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Near Optimal Policy Optimization via REPS.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Tactical Optimism and Pessimism for Deep Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

On the Theory of Reinforcement Learning with Once-per-Episode Feedback.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity.

Proceedings of the 38th International Conference on Machine Learning, 2021

Dynamic Balancing for Model Selection in Bandits and RL.

Proceedings of the 38th International Conference on Machine Learning, 2021

Stochastic Bandits with Linear Constraints.

Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

Online Model Selection for Reinforcement Learning with Function Approximation.

Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

Learning the Truth From Only One Side of the Story.

Heinrich Jiang

Qijia Jiang

Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

Robustness Guarantees for Mode Estimation with an Application to Bandits.

Heinrich Jiang

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Regret Bound Balancing and Elimination for Model Selection in Bandits and RL.

CoRR, 2020

On Optimism in Model-Based Reinforcement Learning.

Philip J. Ball

Stephen Roberts

CoRR, 2020

Regret Balancing for Bandit and RL Model Selection.

Yasin Abbasi-Yadkori

My Phan

CoRR, 2020

On Thompson Sampling with Langevin Algorithms.

CoRR, 2020

Effective Diversity in Population Based Reinforcement Learning.

Krzysztof Marcin Choromanski

Stephen J. Roberts

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian.

Alexander Peysakhovich

Jakob N. Foerster

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Model Selection in Contextual Stochastic Bandit Problems.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Learning to Score Behaviors for Guided Policy Optimization.

Anna Choromanska

Proceedings of the 37th International Conference on Machine Learning, 2020

On Approximate Thompson Sampling with Langevin Algorithms.

Proceedings of the 37th International Conference on Machine Learning, 2020

Accelerated Message Passing for Entropy-Regularized MAP Inference.

Proceedings of the 37th International Conference on Machine Learning, 2020

Stochastic Flows and Geometric Optimization on the Orthogonal Group.

David Cheikhi

Jared Davis

Valerii Likhosherstov

Proceedings of the 37th International Conference on Machine Learning, 2020

Ready Policy One: World Building Through Active Learning.

Philip J. Ball

Stephen J. Roberts

Proceedings of the 37th International Conference on Machine Learning, 2020

ES-MAML: Simple Hessian-Free Meta Learning.

Wenbo Gao

Yuxiang Yang

Proceedings of the 8th International Conference on Learning Representations, 2020

Convergence Rates of Smooth Message Passing with Rounding in Entropy-Regularized MAP Inference.

Jonathan N. Lee

Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Practical Nonisotropic Monte Carlo Sampling in High Dimensions via Determinantal Point Processes.

Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

A General Approach to Fairness with Optimal Transport.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Reinforcement Learning with Chromatic Networks.

CoRR, 2019

Approximate Sherali-Adams Relaxations for MAP Inference via Entropy Regularization.

Jonathan N. Lee

CoRR, 2019

Wasserstein Reinforcement Learning.

CoRR, 2019

Structured Monte Carlo Sampling for Nonisotropic Distributions via Determinantal Point Processes.

CoRR, 2019

Adaptive Sample-Efficient Blackbox Optimization via ES-active Subspaces.

CoRR, 2019

When random search is not enough: Sample-Efficient and Noise-Robust Blackbox Optimization of RL Policies.

CoRR, 2019

Wasserstein Fair Classification.

Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence, 2019

From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Online learning with kernel losses.

Niladri S. Chatterji

Peter L. Bartlett

Proceedings of the 36th International Conference on Machine Learning, 2019

Provably Robust Blackbox Optimization for Reinforcement Learning.

Proceedings of the 3rd Annual Conference on Robot Learning, 2019

Computing Stable Solutions in Threshold Network Flow Games With Bounded Treewidth.

Yoram Bachrach

Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 2019

KAMA-NNs: Low-dimensional Rotation Based Neural Networks.

Jeffrey Pennington

Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

2018

Gen-Oja: A Simple and Efficient Algorithm for Streaming Generalized Eigenvector Computation.

CoRR, 2018

A note on reinforcement learning with Wasserstein distance regularisation, with applications to multipolicy learning.

Mohammed Amin Abdullah

Moez Draief

CoRR, 2018

Geometrically Coupled Monte Carlo Sampling.

Mark Rowland

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Gen-Oja: Simple & Efficient Algorithm for Streaming Generalized Eigenvector Computation.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

2017

Conditions beyond treewidth for tightness of higher-order LP relaxations.

Mark Rowland

Adrian Weller

Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017

2015

Real time clustering of time series using triangular potentials.
