Mohammad Gheshlaghi Azar

According to our database¹, Mohammad Gheshlaghi Azar authored at least 46 papers between 2010 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2025

Self-Improving Robust Preference Optimization.

,

,

,

Olivier Pietquin

,

Mohammad Gheshlaghi Azar

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

An Analysis of Quantile Temporal-Difference Learning.

,

,

Mohammad Gheshlaghi Azar

,

,

Georg Ostrovski

,

Anna Harutyunyan

,

,

Marc G. Bellemare

,

J. Mach. Learn. Res., 2024

Averaging log-likelihoods in direct alignment.

Nathan Grinsztajn

,

Yannis Flet-Berliac

,

Mohammad Gheshlaghi Azar

,

,

,

,

,

,

,

Olivier Pietquin

,

CoRR, 2024

Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion.

Yannis Flet-Berliac

,

Nathan Grinsztajn

,

,

,

,

,

,

Mohammad Gheshlaghi Azar

,

Olivier Pietquin

,

CoRR, 2024

Offline Regularised Reinforcement Learning for Large Language Models Alignment.

Pierre Harvey Richemond

,

,

,

Daniele Calandriello

,

Mohammad Gheshlaghi Azar

,

Rafael Rafailov

,

Bernardo Ávila Pires

,

Eugene Tarassov

,

,

,

Aliaksei Severyn

,

Jonathan Mallinson

,

,

,

,

,

,

CoRR, 2024

Nash Learning from Human Feedback.

,

,

Daniele Calandriello

,

Mohammad Gheshlaghi Azar

,

,

,

,

,

,

,

,

,

,

,

,

Daniel J. Mankowitz

,

,

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion.

Yannis Flet-Berliac

,

Nathan Grinsztajn

,

,

,

,

,

,

,

Mohammad Gheshlaghi Azar

,

Olivier Pietquin

,

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

A General Theoretical Paradigm to Understand Learning from Human Preferences.

Mohammad Gheshlaghi Azar

,

Zhaohan Daniel Guo

,

,

,

,

,

Daniele Calandriello

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2024

2023

Nash Learning from Human Feedback.

,

,

Daniele Calandriello

,

Mohammad Gheshlaghi Azar

,

,

Zhaohan Daniel Guo

,

,

,

,

,

,

,

,

,

Daniel J. Mankowitz

,

,

CoRR, 2023

Understanding Self-Predictive Learning for Reinforcement Learning.

,

Zhaohan Daniel Guo

,

Pierre Harvey Richemond

,

Bernardo Ávila Pires

,

,

,

,

Mohammad Gheshlaghi Azar

,

Charline Le Lan

,

,

András György

,

Shantanu Thakoor

,

,

,

Daniele Calandriello

,

Proceedings of the International Conference on Machine Learning, 2023

Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice.

Toshinori Kitamura

,

,

,

,

,

,

,

,

Mohammad Gheshlaghi Azar

,

,

Olivier Pietquin

,

,

Csaba Szepesvári

,

,

Proceedings of the International Conference on Machine Learning, 2023

2022

KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal.

,

,

,

Toshinori Kitamura

,

,

,

,

Mohammad Gheshlaghi Azar

,

,

,

Olivier Pietquin

,

,

Csaba Szepesvári

CoRR, 2022

BYOL-Explore: Exploration by Bootstrapped Prediction.

,

Shantanu Thakoor

,

,

Bernardo Ávila Pires

,

Florent Altché

,

Corentin Tallec

,

,

Daniele Calandriello

,

Jean-Bastien Grill

,

,

,

,

Mohammad Gheshlaghi Azar

,

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Large-Scale Representation Learning on Graphs via Bootstrapping.

Shantanu Thakoor

,

Corentin Tallec

,

Mohammad Gheshlaghi Azar

,

,

,

,

Petar Velickovic

,

Proceedings of the Tenth International Conference on Learning Representations, 2022

2021

Mine Your Own vieW: Self-Supervised Learning Through Across-Sample Prediction.

,

Mohammad Gheshlaghi Azar

,

,

,

Erik C. Johnson

,

Kiran Bhaskaran-Nair

,

,

Keith B. Hengen

,

William R. Gray Roncal

,

,

CoRR, 2021

Bootstrapped Representation Learning on Graphs.

Shantanu Thakoor

,

Corentin Tallec

,

Mohammad Gheshlaghi Azar

,

,

Petar Velickovic

,

CoRR, 2021

Geometric Entropic Exploration.

Zhaohan Daniel Guo

,

Mohammad Gheshlaghi Azar

,

,

Shantanu Thakoor

,

,

Bernardo Ávila Pires

,

,

,

,

CoRR, 2021

Drop, Swap, and Generate: A Self-Supervised Approach for Generating Neural Activity.

,

,

,

,

Mohammad Gheshlaghi Azar

,

Keith B. Hengen

,

,

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

2020

The Advantage Regret-Matching Actor-Critic.

Audrunas Gruslys

,

,

,

Finbarr Timbers

,

,

Julien Pérolat

,

,

Vinícius Flores Zambaldi

,

Jean-Baptiste Lespiau

,

,

Mohammad Gheshlaghi Azar

,

Michael Bowling

,

CoRR, 2020

Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning.

Jean-Bastien Grill

,

,

Florent Altché

,

Corentin Tallec

,

Pierre H. Richemond

,

Elena Buchatskaya

,

,

Bernardo Ávila Pires

,

,

Mohammad Gheshlaghi Azar

,

,

Koray Kavukcuoglu

,

,

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Fast computation of Nash Equilibria in Imperfect Information Games.

,

Julien Pérolat

,

Jean-Baptiste Lespiau

,

,

,

,

Finbarr Timbers

,

,

Shayegan Omidshafiei

,

Audrunas Gruslys

,

Mohammad Gheshlaghi Azar

,

Edward Lockhart

,

Proceedings of the 37th International Conference on Machine Learning, 2020

Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning.

Zhaohan Daniel Guo

,

Bernardo Ávila Pires

,

,

Jean-Bastien Grill

,

Florent Altché

,

,

Mohammad Gheshlaghi Azar

Proceedings of the 37th International Conference on Machine Learning, 2020

2019

Meta-learning of Sequential Strategies.

CoRR, 2019

World Discovery Models.

Mohammad Gheshlaghi Azar

,

,

Bernardo A. Pires

,

Jean-Bastien Grill

,

Florent Altché

,

CoRR, 2019

Hindsight Credit Assignment.

Anna Harutyunyan

,

,

,

Mohammad Gheshlaghi Azar

,

,

,

Hado van Hasselt

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

2018

Neural Predictive Belief Representations.

Zhaohan Daniel Guo

,

Mohammad Gheshlaghi Azar

,

,

Bernardo A. Pires

,

,

CoRR, 2018

Observe and Look Further: Achieving Consistent Performance on Atari.

,

,

,

Mohammad Gheshlaghi Azar

,

,

,

Gabriel Barth-Maron

,

Hado van Hasselt

,

,

,

,

,

Olivier Pietquin

CoRR, 2018

The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning.

Audrunas Gruslys

,

,

Mohammad Gheshlaghi Azar

,

,

Marc G. Bellemare

,

Proceedings of the 6th International Conference on Learning Representations, 2018

Noisy Networks For Exploration.

Meire Fortunato

,

Mohammad Gheshlaghi Azar

,

,

,

,

,

,

,

,

,

Olivier Pietquin

,

Charles Blundell

,

Proceedings of the 6th International Conference on Learning Representations, 2018

Rainbow: Combining Improvements in Deep Reinforcement Learning.

,

,

Hado van Hasselt

,

,

Georg Ostrovski

,

,

,

,

Mohammad Gheshlaghi Azar

,

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017

The Reactor: A Sample-Efficient Actor-Critic Architecture.

Audrunas Gruslys

,

Mohammad Gheshlaghi Azar

,

Marc G. Bellemare

,

CoRR, 2017

Noisy Networks for Exploration.

Meire Fortunato

,

Mohammad Gheshlaghi Azar

,

,

,

,

,

,

,

,

Olivier Pietquin

,

Charles Blundell

,

CoRR, 2017

Minimax Regret Bounds for Reinforcement Learning.

Mohammad Gheshlaghi Azar

,

,

Proceedings of the 34th International Conference on Machine Learning, 2017

2016

Convex Relaxation Regression: Black-Box Optimization of Smooth Functions by Learning Their Convex Envelopes.

Mohammad Gheshlaghi Azar

,

,

Konrad P. Körding

Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, 2016

Correcting Multivariate Auto-Regressive Models for the Influence of Unobserved Common Input.

,

Mohammad Gheshlaghi Azar

,

Hilbert J. Kappen

Proceedings of the Artificial Intelligence Research and Development, 2016

2014

Stochastic Optimization of a Locally Smooth Function under Correlated Bandit Feedback.

Mohammad Gheshlaghi Azar

,

Alessandro Lazaric

,

CoRR, 2014

Online Stochastic Optimization under Correlated Bandit Feedback.

Mohammad Gheshlaghi Azar

,

Alessandro Lazaric

,

Proceedings of the 31st International Conference on Machine Learning, 2014

2013

Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model.

Mohammad Gheshlaghi Azar

,

,

Hilbert J. Kappen

Mach. Learn., 2013

Regret Bounds for Reinforcement Learning with Policy Advice.

Mohammad Gheshlaghi Azar

,

Alessandro Lazaric

,

Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2013

Sequential Transfer in Multi-armed Bandit with Finite Set of Models.

Mohammad Gheshlaghi Azar

,

Alessandro Lazaric

,

Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

2012

On the theory of reinforcement learning: methods, convergence analysis and sample complexity.

Mohammad Gheshlaghi Azar

PhD thesis, 2012

Dynamic policy programming.

Mohammad Gheshlaghi Azar

,

,

Hilbert J. Kappen

J. Mach. Learn. Res., 2012

On the Sample Complexity of Reinforcement Learning with a Generative Model.

Mohammad Gheshlaghi Azar

,

,

Proceedings of the 29th International Conference on Machine Learning, 2012

2011

Dynamic Policy Programming with Function Approximation.

Mohammad Gheshlaghi Azar

,

,

Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011

Speedy Q-Learning.

Mohammad Gheshlaghi Azar

,

,

Mohammad Ghavamzadeh

,

Hilbert J. Kappen

Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

2010

Dynamic Policy Programming.

Mohammad Gheshlaghi Azar

,

Hilbert J. Kappen

CoRR, 2010
