Nan Jiang

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Reinforcement Learning Under Latent Dynamics: Toward Statistical and Algorithmic Modularity.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint.

[BibT_eX]

[DOI]

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Harnessing Density Ratios for Online Reinforcement Learning.

[BibT_eX]

[DOI]

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Mitigating the Alignment Tax of RLHF.

[BibT_eX]

[DOI]

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Word Embeddings Are Steers for Language Models.

[BibT_eX]

[DOI]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Gibbs Sampling from Human Feedback: A Provable KL- constrained Framework for RLHF.

[BibT_eX]

[DOI]

CoRR, 2023

Mitigating the Alignment Tax of RLHF.

[BibT_eX]

[DOI]

CoRR, 2023

LM-Switch: Lightweight Language Model Conditioning in Word Embedding Space.

[BibT_eX]

[DOI]

CoRR, 2023

Future-Dependent Value-Based Off-Policy Evaluation in POMDPs.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Adversarial Model for Offline Reinforcement Learning.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Reinforcement Learning in Low-rank MDPs with Density Features.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Machine Learning, 2023

The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation.

[BibT_eX]

[DOI]

Philip Amortila

Csaba Szepesvári

Proceedings of the International Conference on Machine Learning, 2023

The Role of Coverage in Online Reinforcement Learning.

[BibT_eX]

[DOI]

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Extended Abstract: Learning in Low-rank MDPs with Density Features.

[BibT_eX]

[DOI]

Proceedings of the 57th Annual Conference on Information Sciences and Systems, 2023

2022

ARMOR: A Model-based Framework for Improving Arbitrary Baseline Policies with Offline Data.

[BibT_eX]

[DOI]

CoRR, 2022

Offline reinforcement learning under value and density-ratio realizability: The power of gaps.

[BibT_eX]

[DOI]

Proceedings of the Uncertainty in Artificial Intelligence, 2022

Interaction-Grounded Learning with Action-Inclusive Feedback.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Beyond the Return: Off-policy Function Estimation under User-specified Error-measuring Distributions.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

On the Statistical Efficiency of Reward-Free Exploration in Non-Linear RL.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Machine Learning, 2022

Adversarially Trained Actor Critic for Offline Reinforcement Learning.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Machine Learning, 2022

Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality.

[BibT_eX]

[DOI]

Proceedings of the Tenth International Conference on Learning Representations, 2022

Offline Reinforcement Learning with Realizability and Single-policy Concentrability.

[BibT_eX]

[DOI]

Proceedings of the Conference on Learning Theory, 2-5 July 2022, London, UK., 2022

Leveraging Synergies Between AI and Networking to Build Next Generation Edge Networks.

[BibT_eX]

[DOI]

Proceedings of the 8th IEEE International Conference on Collaboration and Internet Computing, 2022

On the Convergence Rate of Off-Policy Policy Optimization Methods with Density-Ratio Correction.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

2021

A Minimax Learning Approach to Off-Policy Evaluation in Partially Observable Markov Decision Processes.

[BibT_eX]

[DOI]

Chengchun Shi

Masatoshi Uehara

CoRR, 2021

Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency.

[BibT_eX]

[DOI]

CoRR, 2021

Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Bellman-consistent Pessimism for Offline Reinforcement Learning.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning.

[BibT_eX]

[DOI]

Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Batch Value-function Approximation with Only Realizability.

[BibT_eX]

[DOI]

Proceedings of the 38th International Conference on Machine Learning, 2021

On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function.

[BibT_eX]

[DOI]

Proceedings of the Conference on Learning Theory, 2021

Minimax Model Learning.

[BibT_eX]

[DOI]

Cameron Voloshin

Yisong Yue

Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration.

[BibT_eX]

[DOI]

Priyank Agrawal

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

A Variant of the Wang-Foster-Kakade Lower Bound for the Discounted Setting.

[BibT_eX]

[DOI]

Philip Amortila

CoRR, 2020

Q<sup>*</sup> Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison.

[BibT_eX]

[DOI]

CoRR, 2020

Minimax Confidence Interval for Off-Policy Evaluation and Policy Optimization.

[BibT_eX]

[DOI]

CoRR, 2020

Q* Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Sixth Conference on Uncertainty in Artificial Intelligence, 2020

Minimax Value Interval for Off-Policy Evaluation and Policy Optimization.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Minimax Weight and Q-Function Learning for Off-Policy Evaluation.

[BibT_eX]

[DOI]

Masatoshi Uehara

Proceedings of the 37th International Conference on Machine Learning, 2020

From Importance Sampling to Doubly Robust Policy Gradient.

[BibT_eX]

[DOI]

Proceedings of the 37th International Conference on Machine Learning, 2020

Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles.

[BibT_eX]

[DOI]

Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

2019

Minimax Weight and Q-Function Learning for Off-Policy Evaluation.

[BibT_eX]

[DOI]

Masatoshi Uehara

CoRR, 2019

Provably Efficient Q-Learning with Low Switching Cost.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Provably efficient RL with Rich Observations via Latent State Decoding.

[BibT_eX]

[DOI]

Proceedings of the 36th International Conference on Machine Learning, 2019

Information-Theoretic Considerations in Batch Reinforcement Learning.

[BibT_eX]

[DOI]

Proceedings of the 36th International Conference on Machine Learning, 2019

Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches.

[BibT_eX]

[DOI]

Proceedings of the Conference on Learning Theory, 2019

2018

Model-Based Reinforcement Learning in Contextual Decision Processes.

[BibT_eX]

[DOI]

CoRR, 2018

On Polynomial Time PAC Reinforcement Learning with Rich Observations.

[BibT_eX]

[DOI]

CoRR, 2018

Completing State Representations using Spectral Learning.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

On Oracle-Efficient PAC RL with Rich Observations.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Hierarchical Imitation and Reinforcement Learning.

[BibT_eX]

[DOI]

Proceedings of the 35th International Conference on Machine Learning, 2018

Open Problem: The Dependence of Sample Complexity Lower Bounds on Planning Horizon.

[BibT_eX]

[DOI]

Alekh Agarwal

Proceedings of the Conference On Learning Theory, 2018

Markov Decision Processes with Continuous Side Information.

[BibT_eX]

[DOI]

Proceedings of the Algorithmic Learning Theory, 2018

2017

Repeated Inverse Reinforcement Learning.

[BibT_eX]

[DOI]

Kareem Amin

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Contextual Decision Processes with low Bellman rank are PAC-Learnable.

[BibT_eX]

[DOI]

Proceedings of the 34th International Conference on Machine Learning, 2017

2016

On Structural Properties of MDPs that Bound Loss Due to Shallow Planning.

[BibT_eX]

[DOI]

Ambuj Tewari

Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

The Dependence of Effective Planning Horizon on Model Accuracy.

[BibT_eX]

[DOI]

Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

Improving Predictive State Representations via Gradient Descent.

[BibT_eX]

[DOI]

Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015

Abstraction Selection in Model-based Reinforcement Learning.

[BibT_eX]

[DOI]

Proceedings of the 32nd International Conference on Machine Learning, 2015

Low-Rank Spectral Learning with Weighted Loss Functions.

[BibT_eX]

[DOI]

Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015

Spectral Learning of Predictive State Representations with Insufficient Statistics.

[BibT_eX]

[DOI]

Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

2014

Improving UCT planning via approximate homomorphisms.

[BibT_eX]

[DOI]