Andrea Zanette

According to our database, Andrea Zanette authored at least 20 papers between 2018 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL.
CoRR, 2024

Is Offline Decision Making Possible with Only Few Samples? Reliable Decisions in Data-Starved Bandits via Trust Region Enhancement.
CoRR, 2024

2023
Policy Finetuning in Reinforcement Learning via Design of Experiments using Offline Data.
Advances in Neural Information Processing Systems 36 (NeurIPS), 2023

When is Realizability Sufficient for Off-Policy Reinforcement Learning?
Proceedings of the 40th International Conference on Machine Learning, 2023

2022
Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning.
CoRR, 2022

Bellman Residual Orthogonalization for Offline Reinforcement Learning.
Advances in Neural Information Processing Systems 35 (NeurIPS), 2022

Stabilizing Q-learning with Linear Architectures for Provable Efficient Learning.
Proceedings of the 39th International Conference on Machine Learning, 2022

2021
Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning.
Advances in Neural Information Processing Systems 34 (NeurIPS), 2021

Design of Experiments for Stochastic Contextual Linear Bandits.
Advances in Neural Information Processing Systems 34 (NeurIPS), 2021

Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL.
Proceedings of the 38th International Conference on Machine Learning, 2021

Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation.
Proceedings of the Conference on Learning Theory, 2021

2020
Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration.
Advances in Neural Information Processing Systems 33 (NeurIPS), 2020

Learning Near Optimal Policies with Low Inherent Bellman Error.
Proceedings of the 37th International Conference on Machine Learning, 2020

Frequentist Regret Bounds for Randomized Least-Squares Value Iteration.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

2019
Frequentist Regret Bounds for Randomized Least-Squares Value Iteration.
CoRR, 2019

Limiting Extrapolation in Linear Approximate Value Iteration.
Advances in Neural Information Processing Systems 32 (NeurIPS), 2019

Almost Horizon-Free Structure-Aware Best Policy Identification with a Generative Model.
Advances in Neural Information Processing Systems 32 (NeurIPS), 2019

Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds.
Proceedings of the 36th International Conference on Machine Learning, 2019

2018
Robust Super-Level Set Estimation Using Gaussian Processes.
Machine Learning and Knowledge Discovery in Databases, 2018

Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs.
Proceedings of the 35th International Conference on Machine Learning, 2018
