Mohammad Ghavamzadeh

Orcid: 0000-0003-0930-8688

According to our database, Mohammad Ghavamzadeh authored at least 162 papers between 2001 and 2024.

Bibliography

2024
Contextual Bandits with Stage-wise Constraints.
CoRR, 2024

2023
Maximum Entropy Model Correction in Reinforcement Learning.
CoRR, 2023

Preference Elicitation with Soft Attributes in Interactive Recommendation.
CoRR, 2023

Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage.
CoRR, 2023

Factual and Personalized Recommendations using Language Models and Reinforcement Learning.
CoRR, 2023

A Convex Relaxation Approach to Bayesian Regret Minimization in Offline Bandits.
CoRR, 2023

DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models.
CoRR, 2023

On Dynamic Program Decompositions of Static Risk Measures.
CoRR, 2023

A Review of Deep Learning for Video Captioning.
CoRR, 2023

Aligning Text-to-Image Models using Human Feedback.
CoRR, 2023

Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management.
CoRR, 2023

Ordering-based Conditions for Global Convergence of Policy Gradient Methods.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

On Dynamic Programming Decompositions of Static Risk Measures in Markov Decision Processes.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Multi-Task Off-Policy Learning from Bandit Feedback.
Proceedings of the International Conference on Machine Learning, 2023

A Mixture-of-Expert Approach to RL-based Dialogue Management.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Distributionally Robust Behavioral Cloning for Robust Imitation Learning.
Proceedings of the 62nd IEEE Conference on Decision and Control, 2023

Entropic Risk Optimization in Discounted MDPs.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023

Multiple-policy High-confidence Policy Evaluation.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023

Meta-Learning for Simple Regret Minimization.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
RASR: Risk-Averse Soft-Robust MDPs with EVaR and Entropic Risk.
CoRR, 2022

Reinforcement Learning of Multi-Domain Dialog Policies Via Action Embeddings.
CoRR, 2022

A Mixture-of-Expert Approach to RL-based Dialogue Management.
CoRR, 2022

Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms.
CoRR, 2022

Operator Splitting Value Iteration.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Robust Reinforcement Learning using Offline Data.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Efficient Risk-Averse Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Private and Communication-Efficient Algorithms for Entropy Estimation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Multi-Environment Meta-Learning in Stochastic Linear Bandits.
Proceedings of the IEEE International Symposium on Information Theory, 2022

Fixed-Budget Best-Arm Identification in Structured Bandits.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Feature and Parameter Selection in Stochastic Linear Bandits.
Proceedings of the International Conference on Machine Learning, 2022

Deep Hierarchy in Bandits.
Proceedings of the International Conference on Machine Learning, 2022

Mirror Descent Policy Optimization.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Collaborative Multi-agent Stochastic Linear Bandits.
Proceedings of the American Control Conference, 2022

Thompson Sampling with a Mixture Prior.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

Hierarchical Bayesian Bandits.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

2021
Active Learning for Classification With Abstention.
IEEE J. Sel. Areas Inf. Theory, 2021

A review of uncertainty quantification in deep learning: Techniques, applications and challenges.
Inf. Fusion, 2021

Parameter and Feature Selection in Stochastic Linear Bandits.
CoRR, 2021

Fixed-Budget Best-Arm Identification in Contextual Bandits: A Static-Adaptive Algorithm.
CoRR, 2021

Adaptive Sampling for Minimax Fair Classification.
CoRR, 2021

Adaptive Sampling for Minimax Fair Classification.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Neural Lyapunov Redesign.
Proceedings of the 3rd Annual Conference on Learning for Dynamics and Control, 2021

Variational Model-based Policy Optimization.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

PID Accelerated Value Iteration Algorithm.
Proceedings of the 38th International Conference on Machine Learning, 2021

Control-Aware Representations for Model-based Reinforcement Learning.
Proceedings of the 9th International Conference on Learning Representations, 2021

Stochastic Bandits with Linear Constraints.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

Deep Bayesian Quadrature Policy Optimization.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Non-Stationary Latent Bandits.
CoRR, 2020

Soft-Robust Algorithms for Handling Model Misspecification.
CoRR, 2020

Variance-Reduced Off-Policy Memory-Efficient Policy Search.
CoRR, 2020

Finite-Sample Analysis of GTD Algorithms.
CoRR, 2020

Automatic Policy Synthesis to Improve the Safety of Nonlinear Dynamical Systems.
CoRR, 2020

Policy-Aware Model Learning for Policy Gradient Methods.
CoRR, 2020

Active Model Estimation in Markov Decision Processes.
Proceedings of the Thirty-Sixth Conference on Uncertainty in Artificial Intelligence, 2020

Online Planning with Lookahead Policies.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Multi-step Greedy Reinforcement Learning Algorithms.
Proceedings of the 37th International Conference on Machine Learning, 2020

Predictive Coding for Locally-Linear Control.
Proceedings of the 37th International Conference on Machine Learning, 2020

Adaptive Sampling for Estimating Probability Distributions.
Proceedings of the 37th International Conference on Machine Learning, 2020

Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control.
Proceedings of the 8th International Conference on Learning Representations, 2020

Safe Policy Learning for Continuous Control.
Proceedings of the 4th Conference on Robot Learning, 2020

Randomized Exploration in Generalized Linear Bandits.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Conservative Exploration in Reinforcement Learning.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Improved Algorithms for Conservative Exploration in Bandits.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Adaptive Sampling for Estimating Multiple Probability Distributions.
CoRR, 2019

Multi-step Greedy Policies in Model-Free Deep Reinforcement Learning.
CoRR, 2019

Benchmarking Batch Deep Reinforcement Learning Algorithms.
CoRR, 2019

Multi-Step Greedy and Approximate Real Time Dynamic Programming.
CoRR, 2019

Active Learning for Binary Classification with Abstention.
CoRR, 2019

Binary Classification with Bounded Abstention Rate.
CoRR, 2019

Lyapunov-based Safe Policy Optimization for Continuous Control.
CoRR, 2019

Perturbed-History Exploration in Stochastic Linear Bandits.
Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence, 2019

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Perturbed-History Exploration in Stochastic Multi-Armed Bandits.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits.
Proceedings of the 36th International Conference on Machine Learning, 2019

Risk-Sensitive Generative Adversarial Imitation Learning.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

Optimizing over a Restricted Policy Class in MDPs.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

2018
Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity.
J. Artif. Intell. Res., 2018

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits.
CoRR, 2018

Optimizing over a Restricted Policy Class in Markov Decision Processes.
CoRR, 2018

A Block Coordinate Ascent Algorithm for Mean-Variance Optimization.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

A Lyapunov-based Approach to Safe Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

PAC Bandits with Risk Constraints.
Proceedings of the International Symposium on Artificial Intelligence and Mathematics, 2018

More Robust Doubly Robust Off-policy Evaluation.
Proceedings of the 35th International Conference on Machine Learning, 2018

Path Consistency Learning in Tsallis Entropy Regularized MDPs.
Proceedings of the 35th International Conference on Machine Learning, 2018

Robust Locally-Linear Controllable Embedding.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

2017
Sequential Decision Making With Coherent Risk.
IEEE Trans. Autom. Control., 2017

Risk-Constrained Reinforcement Learning with Percentile Risk Criteria.
J. Mach. Learn. Res., 2017

Disentangling Dynamics and Content for Control and Planning.
CoRR, 2017

Diffusion Independent Semi-Bandit Influence Maximization.
CoRR, 2017

Conservative Contextual Linear Bandits.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Importance of Recommendation Policy Space in Addressing Click Sparsity in Personalized Advertisement Display.
Proceedings of the Machine Learning and Data Mining in Pattern Recognition, 2017

Online Learning to Rank in Stochastic Click Models.
Proceedings of the 34th International Conference on Machine Learning, 2017

Model-Independent Online Learning for Influence Maximization.
Proceedings of the 34th International Conference on Machine Learning, 2017

Bottleneck Conditional Density Estimation.
Proceedings of the 34th International Conference on Machine Learning, 2017

Active Learning for Accurate Estimation of Linear Models.
Proceedings of the 34th International Conference on Machine Learning, 2017

Sequential Multiple Hypothesis Testing with Type I Error Control.
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017

Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

Automated Data Cleansing through Meta-Learning.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016
Variance-constrained actor-critic algorithms for discounted and average reward MDPs.
Mach. Learn., 2016

Analysis of Classification-based Policy Iteration Algorithms.
J. Mach. Learn. Res., 2016

Bayesian Policy Gradient and Actor-Critic Algorithms.
J. Mach. Learn. Res., 2016

Regularized Policy Iteration with Nonparametric Function Spaces.
J. Mach. Learn. Res., 2016

Conservative Contextual Linear Bandits.
CoRR, 2016

Personalized Advertisement Recommendation: A Ranking Approach to Address the Ubiquitous Click Sparsity Problem.
CoRR, 2016

Graphical Model Sketch.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2016

Safe Policy Improvement by Minimizing Robust Baseline Regret.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Proximal Gradient Temporal Difference Learning Algorithms.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

Improved Learning Complexity in Combinatorial Pure Exploration Bandits.
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 2016

2015
Classification-Based Approximate Policy Iteration.
IEEE Trans. Autom. Control., 2015

Approximate modified policy iteration and its application to the game of Tetris.
J. Mach. Learn. Res., 2015

Bayesian Reinforcement Learning: A Survey.
Found. Trends Mach. Learn., 2015

Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits.
CoRR, 2015

Ad Recommendation Systems for Life-Time Value Optimization.
Proceedings of the 24th International Conference on World Wide Web Companion, 2015

Finite-Sample Analysis of Proximal Gradient TD Algorithms.
Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, 2015

Policy Gradient for Coherent Risk Measures.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Personalized Ad Recommendation Systems for Life-Time Value Optimization with Guarantees.
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

Maximum Entropy Semi-Supervised Inverse Reinforcement Learning.
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

High Confidence Policy Improvement.
Proceedings of the 32nd International Conference on Machine Learning, 2015

High-Confidence Off-Policy Evaluation.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

2014
Classification-based Approximate Policy Iteration: Experiments and Extended Discussions.
CoRR, 2014

Actor-Critic Algorithms for Risk-Sensitive Reinforcement Learning.
CoRR, 2014

Algorithms for CVaR Optimization in MDPs.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

2013
Actor-Critic Algorithms for Risk-Sensitive MDPs.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

Approximate Dynamic Programming Finally Performs Well in the Game of Tetris.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

Cost-sensitive Multiclass Classification Risk Bounds.
Proceedings of the 30th International Conference on Machine Learning, 2013

A Generalized Kernel Approach to Structured Output Learning.
Proceedings of the 30th International Conference on Machine Learning, 2013

2012
Finite-sample analysis of least-squares policy iteration.
J. Mach. Learn. Res., 2012

Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence.
Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012

Approximate Modified Policy Iteration.
Proceedings of the 29th International Conference on Machine Learning, 2012

A Dantzig Selector Approach to Temporal Difference Learning.
Proceedings of the 29th International Conference on Machine Learning, 2012

Semi-Supervised Apprenticeship Learning.
Proceedings of the Tenth European Workshop on Reinforcement Learning, 2012

Conservative and Greedy Approaches to Classification-Based Policy Iteration.
Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012

Bayesian Reinforcement Learning.
Proceedings of the Reinforcement Learning, 2012

Least-Squares Methods for Policy Iteration.
Proceedings of the Reinforcement Learning, 2012

2011
Multi-Bandit Best Arm Identification.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Speedy Q-Learning.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Finite-Sample Analysis of Lasso-TD.
Proceedings of the 28th International Conference on Machine Learning, 2011

Classification-based Policy Iteration with a Critic.
Proceedings of the 28th International Conference on Machine Learning, 2011

Regularized Least Squares Temporal Difference Learning with Nested ℓ2 and ℓ1 Penalization.
Proceedings of the Recent Advances in Reinforcement Learning - 9th European Workshop, 2011

Upper-Confidence-Bound Algorithms for Active Learning in Multi-armed Bandits.
Proceedings of the Algorithmic Learning Theory - 22nd International Conference, 2011

2010
Finite-sample Analysis of Bellman Residual Minimization.
Proceedings of the 2nd Asian Conference on Machine Learning, 2010

LSTD with Random Projections.
Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

Finite-Sample Analysis of LSTD.
Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010

Analysis of a Classification-based Policy Iteration Algorithm.
Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010

Bayesian Multi-Task Reinforcement Learning.
Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010

2009
Natural actor-critic algorithms.
Autom., 2009

Regularized Fitted Q-Iteration for planning in continuous-space Markovian decision problems.
Proceedings of the American Control Conference, 2009

2008
Regularized Policy Iteration.
Proceedings of the Advances in Neural Information Processing Systems 21, 2008

Regularized Fitted Q-Iteration: Application to Planning.
Proceedings of the Recent Advances in Reinforcement Learning, 8th European Workshop, 2008

2007
Hierarchical Average Reward Reinforcement Learning.
J. Mach. Learn. Res., 2007

Incremental Natural Actor-Critic Algorithms.
Proceedings of the Advances in Neural Information Processing Systems 20, 2007

Bayesian actor-critic algorithms.
Proceedings of the Machine Learning, 2007

2006
Hierarchical multi-agent reinforcement learning.
Auton. Agents Multi Agent Syst., 2006

Bayesian Policy Gradient Algorithms.
Proceedings of the Advances in Neural Information Processing Systems 19, 2006

2005
The Workshop Program at the Nineteenth National Conference on Artificial Intelligence.
AI Mag., 2005

2004
Learning to Communicate and Act Using Hierarchical Reinforcement Learning.
Proceedings of the 3rd International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2004), 2004

2003
Hierarchical Policy Gradient Algorithms.
Proceedings of the Machine Learning, 2003

2002
Hierarchically Optimal Average Reward Reinforcement Learning.
Proceedings of the Machine Learning, 2002

A multiagent reinforcement learning algorithm by dynamically merging markov decision processes.
Proceedings of the First International Joint Conference on Autonomous Agents & Multiagent Systems, 2002

2001
Continuous-Time Hierarchical Reinforcement Learning.
Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), Williams College, Williamstown, MA, USA, June 28, 2001

