Bilal Piot

ORCID: 0000-0002-6456-7183

Affiliations:
  • Lille University of Science and Technology, Research center in Computer Science, Signal and Automatic Control (CRIStAL)


According to our database, Bilal Piot authored at least 56 papers between 2012 and 2024.


Bibliography

2024
Human Alignment of Large Language Models through Online Preference Optimisation.
CoRR, 2024

Generalized Preference Optimization: A Unified Approach to Offline Alignment.
CoRR, 2024

Direct Language Model Alignment from Online AI Feedback.
CoRR, 2024

2023
Nash Learning from Human Feedback.
CoRR, 2023

A General Theoretical Paradigm to Understand Learning from Human Preferences.
CoRR, 2023

Unlocking the Power of Representations in Long-term Novelty-based Exploration.
CoRR, 2023

Understanding Self-Predictive Learning for Reinforcement Learning.
Proceedings of the International Conference on Machine Learning, 2023

The Edge of Orthogonality: A Simple View of What Makes BYOL Tick.
Proceedings of the International Conference on Machine Learning, 2023

2022
Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning.
CoRR, 2022

BYOL-Explore: Exploration by Bootstrapped Prediction.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Emergent Communication at Scale.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
Shaking the foundations: delusions in sequence models for interaction and control.
CoRR, 2021

Geometric Entropic Exploration.
CoRR, 2021

2020
BYOL works even without batch statistics.
CoRR, 2020

Acme: A Research Framework for Distributed Reinforcement Learning.
CoRR, 2020

Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning.
Proceedings of the 37th International Conference on Machine Learning, 2020

Agent57: Outperforming the Atari Human Benchmark.
Proceedings of the 37th International Conference on Machine Learning, 2020

Never Give Up: Learning Directed Exploration Strategies.
Proceedings of the 8th International Conference on Learning Representations, 2020

2019
World Discovery Models.
CoRR, 2019

Hindsight Credit Assignment.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Observational Learning by Reinforcement Learning.
Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 2019

2018
Neural Predictive Belief Representations.
CoRR, 2018

Playing the Game of Universal Adversarial Perturbations.
CoRR, 2018

Observe and Look Further: Achieving Consistent Performance on Atari.
CoRR, 2018

The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning.
Proceedings of the 6th International Conference on Learning Representations, 2018

Noisy Networks For Exploration.
Proceedings of the 6th International Conference on Learning Representations, 2018

Actor-Critic Fictitious Play in Simultaneous Move Multistage Games.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

Deep Q-learning From Demonstrations.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

Rainbow: Combining Improvements in Deep Reinforcement Learning.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Bridging the Gap Between Imitation Learning and Inverse Reinforcement Learning.
IEEE Trans. Neural Networks Learn. Syst., 2017

Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards.
CoRR, 2017

Learning from Demonstrations for Real World Reinforcement Learning.
CoRR, 2017

Noisy Networks for Exploration.
CoRR, 2017

Observational Learning by Reinforcement Learning.
CoRR, 2017

Is the Bellman residual a bad proxy?
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

End-to-end optimization of goal-driven and visually grounded dialogue systems.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Learning Nash Equilibrium for General-Sum Markov Games from Batch Data.
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017

2016
Difference of Convex Functions Programming Applied to Control with Expert Data.
CoRR, 2016

Should one minimize the expected Bellman residual or maximize the mean value?
CoRR, 2016

Softened Approximate Policy Iteration for Markov Games.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Score-based Inverse Reinforcement Learning.
Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, 2016

On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games.
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 2016

2015
Inverse Reinforcement Learning in Relational Domains.
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

Imitation Learning Applied to Embodied Conversational Agents.
Proceedings of the 4th Workshop on Machine Learning for Interactive Systems, 2015

Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games.
Proceedings of the 32nd International Conference on Machine Learning, 2015

2014
Boosted Bellman Residual Minimization Handling Expert Demonstrations.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2014

Difference of Convex Functions Programming for Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Predicting when to laugh with structured classification.
Proceedings of the INTERSPEECH 2014, 2014

Boosted and reward-regularized classification for apprenticeship learning.
Proceedings of the International conference on Autonomous Agents and Multi-Agent Systems, 2014

2013
Classification structurée pour l'apprentissage par renforcement inverse [Structured classification for inverse reinforcement learning].
Rev. d'Intelligence Artif., 2013

Learning from Demonstrations: Is It Worth Estimating a Reward Function?
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2013

A Cascaded Supervised Learning Approach to Inverse Reinforcement Learning.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2013

Laugh-aware virtual agent and its impact on user amusement.
Proceedings of the International conference on Autonomous Agents and Multi-Agent Systems, 2013

2012
Inverse Reinforcement Learning through Structured Classification.
Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, 2012
