Bilal Piot

ORCID: 0000-0002-6456-7183

Affiliations:
  • Lille University of Science and Technology, Research center in Computer Science, Signal and Automatic Control (CRIStAL)


According to our database, Bilal Piot authored at least 56 papers between 2012 and 2024.


Bibliography

2024
Human Alignment of Large Language Models through Online Preference Optimisation.
CoRR, 2024

Generalized Preference Optimization: A Unified Approach to Offline Alignment.
CoRR, 2024

Direct Language Model Alignment from Online AI Feedback.
CoRR, 2024

2023
Nash Learning from Human Feedback.
CoRR, 2023

A General Theoretical Paradigm to Understand Learning from Human Preferences.
CoRR, 2023

Unlocking the Power of Representations in Long-term Novelty-based Exploration.
CoRR, 2023

Understanding Self-Predictive Learning for Reinforcement Learning.
Proceedings of the International Conference on Machine Learning, 2023

The Edge of Orthogonality: A Simple View of What Makes BYOL Tick.
Proceedings of the International Conference on Machine Learning, 2023

2022
Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning.
CoRR, 2022

BYOL-Explore: Exploration by Bootstrapped Prediction.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Emergent Communication at Scale.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
Shaking the foundations: delusions in sequence models for interaction and control.
CoRR, 2021

Geometric Entropic Exploration.
CoRR, 2021

2020
BYOL works even without batch statistics.
CoRR, 2020

Acme: A Research Framework for Distributed Reinforcement Learning.
CoRR, 2020

Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning.
Proceedings of the 37th International Conference on Machine Learning, 2020

Agent57: Outperforming the Atari Human Benchmark.
Proceedings of the 37th International Conference on Machine Learning, 2020

Never Give Up: Learning Directed Exploration Strategies.
Proceedings of the 8th International Conference on Learning Representations, 2020

2019
World Discovery Models.
CoRR, 2019

Hindsight Credit Assignment.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Observational Learning by Reinforcement Learning.
Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 2019

2018
Neural Predictive Belief Representations.
CoRR, 2018

Playing the Game of Universal Adversarial Perturbations.
CoRR, 2018

Observe and Look Further: Achieving Consistent Performance on Atari.
CoRR, 2018

The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning.
Proceedings of the 6th International Conference on Learning Representations, 2018

Noisy Networks For Exploration.
Proceedings of the 6th International Conference on Learning Representations, 2018

Actor-Critic Fictitious Play in Simultaneous Move Multistage Games.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

Deep Q-learning From Demonstrations.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

Rainbow: Combining Improvements in Deep Reinforcement Learning.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Bridging the Gap Between Imitation Learning and Inverse Reinforcement Learning.
IEEE Trans. Neural Networks Learn. Syst., 2017

Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards.
CoRR, 2017

Learning from Demonstrations for Real World Reinforcement Learning.
CoRR, 2017

Noisy Networks for Exploration.
CoRR, 2017

Observational Learning by Reinforcement Learning.
CoRR, 2017

Is the Bellman residual a bad proxy?
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

End-to-end optimization of goal-driven and visually grounded dialogue systems.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Learning Nash Equilibrium for General-Sum Markov Games from Batch Data.
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017

2016
Difference of Convex Functions Programming Applied to Control with Expert Data.
CoRR, 2016

Should one minimize the expected Bellman residual or maximize the mean value?
CoRR, 2016

Softened Approximate Policy Iteration for Markov Games.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Score-based Inverse Reinforcement Learning.
Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, 2016

On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games.
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 2016

2015
Inverse Reinforcement Learning in Relational Domains.
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

Imitation Learning Applied to Embodied Conversational Agents.
Proceedings of the 4th Workshop on Machine Learning for Interactive Systems, 2015

Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games.
Proceedings of the 32nd International Conference on Machine Learning, 2015

2014
Boosted Bellman Residual Minimization Handling Expert Demonstrations.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2014

Difference of Convex Functions Programming for Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Predicting when to laugh with structured classification.
Proceedings of the INTERSPEECH 2014, 2014

Boosted and reward-regularized classification for apprenticeship learning.
Proceedings of the International conference on Autonomous Agents and Multi-Agent Systems, 2014

2013
Classification structurée pour l'apprentissage par renforcement inverse [Structured classification for inverse reinforcement learning].
Rev. d'Intelligence Artif., 2013

Learning from Demonstrations: Is It Worth Estimating a Reward Function?
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2013

A Cascaded Supervised Learning Approach to Inverse Reinforcement Learning.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2013

Laugh-aware virtual agent and its impact on user amusement.
Proceedings of the International conference on Autonomous Agents and Multi-Agent Systems, 2013

2012
Inverse Reinforcement Learning through Structured Classification.
Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, 2012
