Mohammad Gheshlaghi Azar

According to our database, Mohammad Gheshlaghi Azar authored at least 45 papers between 2010 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Averaging log-likelihoods in direct alignment.
CoRR, 2024

Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion.
CoRR, 2024

Self-Improving Robust Preference Optimization.
CoRR, 2024

Offline Regularised Reinforcement Learning for Large Language Models Alignment.
CoRR, 2024


A General Theoretical Paradigm to Understand Learning from Human Preferences.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2024

2023
Nash Learning from Human Feedback.
CoRR, 2023

A General Theoretical Paradigm to Understand Learning from Human Preferences.
CoRR, 2023

An Analysis of Quantile Temporal-Difference Learning.
CoRR, 2023

Understanding Self-Predictive Learning for Reinforcement Learning.
Proceedings of the International Conference on Machine Learning, 2023

Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice.
Proceedings of the International Conference on Machine Learning, 2023

2022
KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal.
CoRR, 2022

BYOL-Explore: Exploration by Bootstrapped Prediction.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Large-Scale Representation Learning on Graphs via Bootstrapping.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
Mine Your Own vieW: Self-Supervised Learning Through Across-Sample Prediction.
CoRR, 2021

Bootstrapped Representation Learning on Graphs.
CoRR, 2021

Geometric Entropic Exploration.
CoRR, 2021

Drop, Swap, and Generate: A Self-Supervised Approach for Generating Neural Activity.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

2020
The Advantage Regret-Matching Actor-Critic.
CoRR, 2020

Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Fast computation of Nash Equilibria in Imperfect Information Games.
Proceedings of the 37th International Conference on Machine Learning, 2020

Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning.
Proceedings of the 37th International Conference on Machine Learning, 2020

2019
Meta-learning of Sequential Strategies.
CoRR, 2019

World Discovery Models.
CoRR, 2019

Hindsight Credit Assignment.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

2018
Neural Predictive Belief Representations.
CoRR, 2018

Observe and Look Further: Achieving Consistent Performance on Atari.
CoRR, 2018

The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning.
Proceedings of the 6th International Conference on Learning Representations, 2018

Noisy Networks For Exploration.
Proceedings of the 6th International Conference on Learning Representations, 2018

Rainbow: Combining Improvements in Deep Reinforcement Learning.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
The Reactor: A Sample-Efficient Actor-Critic Architecture.
CoRR, 2017

Noisy Networks for Exploration.
CoRR, 2017

Minimax Regret Bounds for Reinforcement Learning.
Proceedings of the 34th International Conference on Machine Learning, 2017

2016
Convex Relaxation Regression: Black-Box Optimization of Smooth Functions by Learning Their Convex Envelopes.
Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, 2016

Correcting Multivariate Auto-Regressive Models for the Influence of Unobserved Common Input.
Proceedings of the Artificial Intelligence Research and Development, 2016

2014
Stochastic Optimization of a Locally Smooth Function under Correlated Bandit Feedback.
CoRR, 2014

Online Stochastic Optimization under Correlated Bandit Feedback.
Proceedings of the 31st International Conference on Machine Learning, 2014

2013
Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model.
Mach. Learn., 2013

Regret Bounds for Reinforcement Learning with Policy Advice.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2013

Sequential Transfer in Multi-armed Bandit with Finite Set of Models.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

2012
Dynamic policy programming.
J. Mach. Learn. Res., 2012

On the Sample Complexity of Reinforcement Learning with a Generative Model.
Proceedings of the 29th International Conference on Machine Learning, 2012

2011
Dynamic Policy Programming with Function Approximation.
Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011

Speedy Q-Learning.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

2010
Dynamic Policy Programming.
CoRR, 2010

