Shangtong Zhang

CoRR, May, 2026

Convergence and Emergence of In-Context Reinforcement Learning with Chain of Thought.

CoRR, May, 2026

Almost Sure Convergence Rates of Stochastic Approximation and Reinforcement Learning via a Poisson-Moreau Drift.

CoRR, May, 2026

On the Divergence of Differential Temporal Difference Learning without Local Clocks.

David Antrobius

CoRR, May, 2026

Adaptive Policy Selection and Fine-Tuning under Interaction Budgets for Offline-to-Online Reinforcement Learning.

CoRR, May, 2026

Almost Sure Convergence of Differential Temporal Difference Learning for Average Reward Markov Decision Processes.

Ethan Blaser

CoRR, February, 2026

MathlibLemma: Folklore Lemma Generation and Benchmark for Formal Mathematics.

CoRR, February, 2026

Multi-agent DRL-based Lane Change Decision Model for Cooperative Planning in Mixed Traffic.

Zeyu Mu

B. Brian Park

CoRR, January, 2026

PRISM: A Locality-Aware Near-Memory Processing Framework for Scalable Triangle Counting.

Xueyan Wang

Yier Jin

Proceedings of the Design, Automation & Test in Europe Conference, 2026

Asymptotic and Finite Sample Analysis of Nonexpansive Stochastic Approximations with Markovian Noise.

Ethan Blaser

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Prompt-Driven Domain Adaptation for End-to-End Autonomous Driving via In-Context RL.

CoRR, November, 2025

Towards Formalizing Reinforcement Learning Theory.

CoRR, November, 2025

Extensions of Robbins-Siegmund Theorem with Applications in Reinforcement Learning.

CoRR, September, 2025

Safe In-Context Reinforcement Learning.

CoRR, September, 2025

Towards Provable Emergence of In-Context Reinforcement Learning.

Rohan Chandra

CoRR, September, 2025

Reward Is Enough: LLMs Are In-Context Reinforcement Learners.

CoRR, June, 2025

Experience Replay Addresses Loss of Plasticity in Continual Learning.

Rohan Chandra

CoRR, March, 2025

Group Fairness in Multi-Task Reinforcement Learning.

CoRR, March, 2025

Towards Large Language Models that Benefit for All: Benchmarking Group Fairness in Reward Models.

CoRR, March, 2025

A Survey of In-Context Reinforcement Learning.

CoRR, February, 2025

Linear Q-Learning Does Not Diverge: Convergence Rates to a Bounded Set.

CoRR, January, 2025

The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise.

Shuhang Chen

J. Mach. Learn. Res., 2025

Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

GameChat: Multi-LLM Dialogue for Safe, Agile, and Socially Optimal Multi-Agent Navigation in Constrained Environments.

Vagul Mahadevan

Rohan Chandra

Proceedings of the IEEE International Symposium on Multi-Robot and Multi-Agent Systems, 2025

Counterfactual Explanations for Continuous Action Reinforcement Learning.

Shuyang Dong

Lu Feng

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025

Linear Q-Learning Does Not Diverge in L2: Convergence Rates to a Bounded Set.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Transformers Can Learn Temporal Difference Methods for In-Context Reinforcement Learning.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Revisiting a Design Choice in Gradient Temporal Difference Learning.

Xiaochi Qian

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Doubly Optimal Policy Evaluation for Reinforcement Learning.

Claire Chen

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Efficient Policy Evaluation with Safety Constraint for Reinforcement Learning.

Claire Chen

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Efficient Multi-Policy Evaluation for Reinforcement Learning.

Claire Chen

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

CRASH: Challenging Reinforcement-Learning Based Adversarial Scenarios For Safety Hardening.

Amar Kulkarni

Madhur Behl

CoRR, 2024

Almost Sure Convergence Rates and Concentration of Stochastic Approximation and Reinforcement Learning with Markovian Noise.

CoRR, 2024

Almost Sure Convergence of Average Reward Temporal Difference Learning.

Ethan Blaser

CoRR, 2024

Almost Sure Convergence of Linear Temporal Difference Learning with Arbitrary Features.

CoRR, 2024

Efficient Multi-Policy Evaluation for Reinforcement Learning.

Yuxin Chen

CoRR, 2024

Transformers Learn Temporal Difference Methods for In-Context Reinforcement Learning.

CoRR, 2024

CRISP: Triangle Counting Acceleration via Content Addressable Memory-Integrated 3D-Stacked Memory.

Proceedings of the IEEE International Test Conference in Asia, 2024

Efficient Policy Evaluation with Offline Data Informed Behavior Policy Design.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023

IMGA: Efficient In-Memory Graph Convolution Network Aggregation With Data Flow Optimizations.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., December, 2023

AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning.

CoRR, 2023

Direct Gradient Temporal Difference Learning.

Xiaochi Qian

CoRR, 2023

Improving Monte Carlo Evaluation with Offline Data.

CoRR, 2023

On the Convergence of SARSA with Linear Function Approximation.

Romain Laroche

Proceedings of the International Conference on Machine Learning, 2023

A New Challenge in Policy Evaluation.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Truncated Emphatic Temporal Difference Methods for Prediction and Control.

J. Mach. Learn. Res., 2022

Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch.

Romain Laroche

J. Mach. Learn. Res., 2022

On the Chattering of SARSA with Linear Function Approximation.

Romain Laroche

CoRR, 2022

A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms.

Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, 2022

Learning Expected Emphatic Traces for Deep RL.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

Deep Residual Reinforcement Learning (Extended Abstract).

Wendelin Boehmer

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

Breaking the Deadly Triad with a Target Network.

Proceedings of the 38th International Conference on Machine Learning, 2021

Average-Reward Off-Policy Policy Evaluation with Function Approximation.

Proceedings of the 38th International Conference on Machine Learning, 2021

Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning.

Bo Liu

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Per-Step Reward: A New Perspective for Risk-Averse Reinforcement Learning.

Bo Liu

CoRR, 2020

Learning Retrospective Knowledge with Reverse Reinforcement Learning.

Vivek Veeriah

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation.

Proceedings of the 37th International Conference on Machine Learning, 2020

GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values.

Bo Liu

Proceedings of the 37th International Conference on Machine Learning, 2020

Deep Residual Reinforcement Learning.

Wendelin Boehmer

Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, 2020

Mega-Reward: Achieving Human-Level Play without Extrinsic Rewards.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Provably Convergent Off-Policy Actor-Critic with Function Approximation.

CoRR, 2019

Distributional Reinforcement Learning for Efficient Exploration.

CoRR, 2019

Mega-Reward: Achieving Human-Level Play without Extrinsic Rewards.

CoRR, 2019

DAC: The Double Actor-Critic Architecture for Learning Options.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Generalized Off-Policy Actor-Critic.

Wendelin Boehmer

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Exploration in the Face of Parametric and Intrinsic Uncertainties.

Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 2019

QUOTA: The Quantile Option Architecture for Reinforcement Learning.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

ACE: An Actor Ensemble Algorithm for Continuous Control with Tree Search.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

mlpack 3: a fast, flexible machine learning library.

J. Open Source Softw., 2018

ACE: An Actor Ensemble Algorithm for Continuous Control with Tree Search.

Hao Chen

CoRR, 2018

QUOTA: The Quantile Option Architecture for Reinforcement Learning.

CoRR, 2018

2017

A Deeper Look at Experience Replay.

Richard S. Sutton

CoRR, 2017

Comparing Deep Reinforcement Learning and Evolutionary Methods in Continuous Control.

Osmar R. Zaïane

CoRR, 2017

Crossprop: Learning Representations by Stochastic Meta-Gradient Descent in Neural Networks.

Vivek Veeriah