Chen-Yu Wei

Braham Snyder

CoRR, February, 2026

Achieving Optimal Static and Dynamic Regret Simultaneously in Bandits with Deterministic Losses.

Jian Qian

CoRR, February, 2026

Save the Good Prefix: Precise Error Penalization via Process-Supervised RL to Enhance LLM Reasoning.

CoRR, January, 2026

Proximal Regret and Proximal Correlated Equilibria: A New Tractable Solution Concept for Online Learning and Games.

Yang Cai

Constantinos Daskalakis

Weiqiang Zheng

Proceedings of the 58th Annual ACM Symposium on Theory of Computing, 2026

2025

An Improved Model-Free Decision-Estimation Coefficient with Applications in Adversarial MDPs.

CoRR, October, 2025

From Average-Iterate to Last-Iterate Convergence in Games: A Reduction and Its Applications.

CoRR, June, 2025

An Improved Algorithm for Adversarial Linear Contextual Bandits via Reduction.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Decision Making in Hybrid Environments: A Model Aggregation Approach.

Proceedings of the Thirty Eighth Annual Conference on Learning Theory, 2025

2024

Tractable Local Equilibria in Non-Concave Games.

Yang Cai

Constantinos Daskalakis

Weiqiang Zheng

CoRR, 2024

Corruption-Robust Linear Bandits: Minimax Optimality and Gap-Dependent Misspecification.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Beating Adversarial Low-Rank MDPs with Unknown Transition and Bandit Feedback.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

How Does Variance Shape the Regret in Contextual Bandits?

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

On Tractable Φ-Equilibria in Non-Concave Games.

Yang Cai

Constantinos Daskalakis

Weiqiang Zheng

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Offline Reinforcement Learning: Role of State Aggregation and Trajectory Data.

Proceedings of the Thirty Seventh Annual Conference on Learning Theory, June 30, 2024

Near-Optimal Policy Optimization for Correlated Equilibrium in General-Sum Markov Games.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2024

2023

Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games.

CoRR, 2023

First- and Second-Order Bounds for Adversarial Linear Contextual Bandits.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Bypassing the Simulator: Near-Optimal Adversarial Linear Contextual Bandits.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games with Bandit Feedback.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Best of Both Worlds Policy Optimization.

Proceedings of the International Conference on Machine Learning, 2023

Refined Regret for Adversarial MDPs with Linear Function Approximation.

Proceedings of the International Conference on Machine Learning, 2023

A Blackbox Approach to Best of Both Worlds in Bandits and Beyond.

Proceedings of the Thirty Sixth Annual Conference on Learning Theory, 2023

A Unified Algorithm for Stochastic Path Problems.

Proceedings of the International Conference on Algorithmic Learning Theory, 2023

2022

Personalization Improves Privacy-Accuracy Tradeoffs in Federated Optimization.

CoRR, 2022

Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence.

Proceedings of the International Conference on Machine Learning, 2022

Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning.

Proceedings of the International Conference on Machine Learning, 2022

A Model Selection Approach for Corruption Robust Reinforcement Learning.

Proceedings of the International Conference on Algorithmic Learning Theory, 29 March, 2022

Decentralized Cooperative Reinforcement Learning with Hierarchical Information Structure.

Hsu Kao

Vijay G. Subramanian

Proceedings of the International Conference on Algorithmic Learning Theory, 29 March, 2022

2021

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses.

Chung-Wei Lee

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously.

Proceedings of the 38th International Conference on Machine Learning, 2021

Linear Last-iterate Convergence in Constrained Saddle-point Optimization.

Proceedings of the 9th International Conference on Learning Representations, 2021

Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games.

Proceedings of the Conference on Learning Theory, 2021

Non-stationary Reinforcement Learning without Prior Knowledge: an Optimal Black-box Approach.

Proceedings of the Conference on Learning Theory, 2021

Impossible Tuning Made Possible: A New Expert Algorithm and Its Applications.

Liyu Chen

Proceedings of the Conference on Learning Theory, 2021

Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition.

Liyu Chen

Proceedings of the Conference on Learning Theory, 2021

Adversarial Online Learning with Changing Action Sets: Efficient Algorithms with Approximate Regret Bounds.

Ehsan Emamjomeh-Zadeh

David Kempe

Proceedings of the International Conference on Algorithmic Learning Theory, 2021

Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation.

Mehdi Jafarnia-Jahromi

Rahul Jain

Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

2020

Linear Last-iterate Convergence for Matrix Games and Stochastic Games.

CoRR, 2020

Federated Residual Learning.

Alekh Agarwal

John Langford

CoRR, 2020

Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes.

Mehdi Jafarnia-Jahromi

Hiteshi Sharma

Rahul Jain

Proceedings of the 37th International Conference on Machine Learning, 2020

Taking a hint: How to leverage loss predictors in contextual bandits?

Alekh Agarwal

Proceedings of the Conference on Learning Theory, 2020

2019

Analyzing the Variance of Policy Gradient Estimators for the Linear-Quadratic Regulator.

James A. Preiss

Sébastien M. R. Arnold

Marius Kloft

CoRR, 2019

Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously.

Devanathan Thiruvenkatachari

Proceedings of the 36th International Conference on Machine Learning, 2019

Bandit Multiclass Linear Classification: Efficient Algorithms for the Separable Case.

Alina Beygelzimer

Dávid Pál

Balázs Szörényi

Chicheng Zhang

Proceedings of the 36th International Conference on Machine Learning, 2019

A New Algorithm for Non-stationary Contextual Bandits: Efficient, Optimal and Parameter-free.

Proceedings of the Conference on Learning Theory, 2019

Improved Path-length Regret Bounds for Bandits.

Proceedings of the Conference on Learning Theory, 2019

Achieving Optimal Dynamic Regret for Non-stationary Bandits without Prior Information.

Proceedings of the Conference on Learning Theory, 2019

2018

Multi-Cell Cooperative Scheduling for Network Utility Maximization With User Equipment Side Interference Cancellation.

Wanjiun Liao

IEEE Trans. Wirel. Commun., 2018

Efficient Online Portfolio with Logarithmic Regret.

Kai Zheng

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

More Adaptive Algorithms for Adversarial Bandits.

Proceedings of the Conference on Learning Theory, 2018

Efficient Contextual Bandits in Non-stationary Worlds.

Proceedings of the Conference on Learning Theory, 2018

2017

Online Reinforcement Learning in Stochastic Games.

Yi-Te Hong

Chi-Jen Lu

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

2016

Tracking the Best Expert in Non-stationary Stochastic Environments.
