Kavosh Asadi

According to our database, Kavosh Asadi authored at least 29 papers between 2016 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.


Bibliography

2023
TAIL: Task-specific Adapters for Imitation Learning with Large Pretrained Models.
CoRR, 2023

TD Convergence: An Optimization Perspective.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Resetting the Optimizer in Deep RL: An Empirical Study.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Coarse-Grained Smoothness for Reinforcement Learning in Metric Spaces.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023

2022
Characterizing the Action-Generalization Gap in Deep Q-Learning.
CoRR, 2022

Adaptive Interest for Emphatic Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Faster Deep Reinforcement Learning with Slower Online Network.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021
Smoothness in Reinforcement Learning with Large State and Action Spaces.
PhD thesis, 2021

Deep Q-Network with Proximal Iteration.
CoRR, 2021

Coarse-Grained Smoothness for RL in Metric Spaces.
CoRR, 2021

Convergence of a Human-in-the-Loop Policy-Gradient Algorithm With Eligibility Trace Under Reward, Policy, and Advantage Feedback.
CoRR, 2021

Continuous Doubly Constrained Batch Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Lipschitz Lifelong Reinforcement Learning.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Deep Radial-Basis Value Functions for Continuous Control.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Learning State Abstractions for Transfer in Continuous Control.
CoRR, 2020

Deep RBF Value Functions for Continuous Control.
CoRR, 2020

2019
Combating the Compounding-Error Problem with a Multi-step Model.
CoRR, 2019

DeepMellow: Removing the Need for a Target Network in Deep Q-Learning.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Removing the Target Network from Deep Q-Networks with the Mellowmax Operator.
Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 2019

State Abstraction as Compression in Apprenticeship Learning.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Mitigating Planner Overfitting in Model-Based Reinforcement Learning.
CoRR, 2018

Towards a Simple Approach to Multi-step Model-based Reinforcement Learning.
CoRR, 2018

Equivalence Between Wasserstein and Value-Aware Model-based Reinforcement Learning.
CoRR, 2018

Lipschitz Continuity in Model-based Reinforcement Learning.
Proceedings of the 35th International Conference on Machine Learning, 2018

2017
Mean Actor Critic.
CoRR, 2017

An Alternative Softmax Operator for Reinforcement Learning.
Proceedings of the 34th International Conference on Machine Learning, 2017

Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

2016
Sample-efficient Deep Reinforcement Learning for Dialog Control.
CoRR, 2016

A New Softmax Operator for Reinforcement Learning.
CoRR, 2016
