Olivier Delalleau

According to our database, Olivier Delalleau authored at least 36 papers between 2003 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Timeline

Legend: Book, In proceedings, Article, PhD thesis, Dataset, Other

Bibliography

2025

Think Twice: Branch-and-Rethink Reasoning Reward Model.

[…], […], Julien Veron Vialard, Oleksii Kuchaiev, […], Olivier Delalleau

CoRR, October, 2025

RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards.

[…], […], Olivier Delalleau, […], […], […], […], […], Oleksii Kuchaiev

CoRR, September, 2025

HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages.

[…], […], Olivier Delalleau, […], […], Alexander Bukharin, […], […], Oleksii Kuchaiev

CoRR, May, 2025

Adversarial Training of Reward Models.

Alexander Bukharin, […], […], Adithya Renduchintala, […], […], Oleksii Kuchaiev, Olivier Delalleau, […]

CoRR, April, 2025

Dedicated Feedback and Edit Models Empower Inference-Time Scaling for Open-Ended General-Domain Tasks.

[…], […], Olivier Delalleau, […], […], […], […], […], Oleksii Kuchaiev

CoRR, March, 2025

Reward-aware Preference Optimization: A Unified Mathematical Framework for Model Alignment.

[…], […], Alexander Bukharin, David Mosallanezhad, […], […], […], Adithya Renduchintala, […], […], […], Dmitry Chichkov, Olivier Delalleau, Oleksii Kuchaiev

CoRR, February, 2025

Diverging Preferences: When do Annotators Disagree and do Models Know?

Michael J. Q. Zhang, […], […], […], Olivier Delalleau, […], […], […], Valentina Pyatkin

Proceedings of the Forty-second International Conference on Machine Learning, 2025

HelpSteer2-Preference: Complementing Ratings with Preferences.

[…], Alexander Bukharin, Olivier Delalleau, […], […], […], Oleksii Kuchaiev, […]

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

HelpSteer3: Human-Annotated Feedback and Edit Data to Empower Inference-Time Scaling in Open-Ended General-Domain Tasks.

[…], […], Olivier Delalleau, […], […], […], […], […], Oleksii Kuchaiev

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

Nemotron-4 340B Technical Report.

[…], […], […], […], Pallab Bhattacharya, […], […], Bryan Catanzaro, […], Jonathan M. Cohen, […], Ayush Dattagupta, Olivier Delalleau, Leon Derczynski, […], […], […], Aleksander Ficek, […], […], […], […], Tomasz Grzegorzek, […], […], […], Joseph Jennings, Aastha Jhunjhunwala, […], […], Oleksii Kuchaiev, Patrick LeGresley, […], […], […], […], Ameya Sunil Mahabaleshwarkar, Somshubra Majumdar, […], Miguel Martinez, Maer Rodrigues de Melo, […], Deepak Narayanan, Sean Narenthiran, […], […], […], […], Guruprasad Nutheti, Christopher Parisien, Jupinder Parmar, Mostofa Patwary, Krzysztof Pawelec, […], Shrimai Prabhumoye, […], […], Vasanth Rao Naik Sabavat, Sanjeev Satheesh, Jane Polak Scowcroft, […], […], […], Mohammad Shoeybi, […], Misha Smelyanskiy, […], Makesh Narsimhan Sreedhar, […], Sandeep Subramanian, […], Shubham Toshniwal, […], […], […], […], […], […], […], […], […]

CoRR, 2024

HelpSteer2: Open-source dataset for training top-performing reward models.

[…], […], Olivier Delalleau, […], […], […], […], Makesh Narsimhan Sreedhar, Oleksii Kuchaiev

CoRR, 2024

NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment.

[…], […], Olivier Delalleau, […], […], […], […], […], […], Ali Taghibakhshi, Markel Sanz Ausin, […], Oleksii Kuchaiev

CoRR, 2024

HelpSteer 2: Open-source dataset for training top-performing reward models.

[…], […], Olivier Delalleau, […], […], […], […], Makesh Narsimhan Sreedhar, Oleksii Kuchaiev

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM.

[…], […], […], […], Makesh Narsimhan Sreedhar, […], Olivier Delalleau, Jane Polak Scowcroft, […], […], Oleksii Kuchaiev

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

IQL-TD-MPC: Implicit Q-Learning for Hierarchical Model Predictive Control.

[…], […], […], […], […], […], Olivier Delalleau

Proceedings of the IEEE International Conference on Robotics and Automation, 2024

2020

A Closer Look at Codistillation for Distributed Training.

[…], Olivier Delalleau, […], […], […], Michael G. Rabbat

CoRR, 2020

2019

Discrete and Continuous Action Representation for Practical RL in Video Games.

Olivier Delalleau, […], […], […]

CoRR, 2019

2016

Theano: A Python framework for fast computation of mathematical expressions.

[…], Guillaume Alain, Amjad Almahairi, Christof Angermüller, Dzmitry Bahdanau, […], Frédéric Bastien, […], Anatoly Belikov, Alexander Belopolsky, […], Arnaud Bergeron, […], Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre Luc Carrier, […], […], Paul F. Christiano, […], Marc-Alexandre Côté, […], Aaron C. Courville, Yann N. Dauphin, Olivier Delalleau, […], Guillaume Desjardins, Sander Dieleman, […], Melanie Ducoffe, Vincent Dumoulin, Samira Ebrahimi Kahou, […], […], […], Mathieu Germain, […], Ian J. Goodfellow, […], Çaglar Gülçehre, […], Iban Harlouchet, Jean-Philippe Heng, […], […], […], Sébastien Jean, […], Mikhail Korobov, […], […], […], […], […], […], Simon Lefrançois, […], Nicholas Léonard, […], Jesse A. Livezey, […], […], […], Pierre-Antoine Manzagol, Olivier Mastropietro, Robert McGibbon, Roland Memisevic, Bart van Merriënboer, Vincent Michalski, […], Alberto Orlandi, Christopher Joseph Pal, […], Mohammad Pezeshki, […], […], Matthew Rocklin, […], […], […], […], François Savard, […], […], Gabriel Schwartz, Iulian Vlad Serban, Dmitriy Serdyuk, Samira Shabanian, […], Sigurd Spieckermann, S. Ramana Subramanyam, Jakub Sygnowski, Jérémie Tanguay, Gijs van Tulder, Joseph P. Turian, Sebastian Urban, […], Francesco Visin, […], David Warde-Farley, […], Matthew Willson, […], […], […], […], […]

CoRR, 2016

2013

Stacked calibration of off-policy policy evaluation for video game matchmaking.

Eric Thibodeau-Laufer, Raul Chandias Ferrari, […], Olivier Delalleau, […]

Proceedings of the 2013 IEEE Conference on Computational Intelligence in Games (CIG), 2013

2012

Beyond Skill Rating: Advanced Matchmaking in Ghost Recon Online.

Olivier Delalleau, […], Eric Thibodeau-Laufer, Raul Chandias Ferrari, […], […]

IEEE Trans. Comput. Intell. AI Games, 2012

Efficient EM Training of Gaussian Mixtures with Missing Data.

Olivier Delalleau, Aaron C. Courville, […]

CoRR, 2012

Detonation Classification from acoustic Signature with the Restricted Boltzmann Machine.

[…], Nicolas Chapados, Olivier Delalleau, Hugo Larochelle, Xavier Saint-Mleux, Christian Hudon, Jérôme Louradour

Comput. Intell., 2012

2011

Shallow vs. Deep Sum-Product Networks.

Olivier Delalleau, […]

Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

On the Expressive Power of Deep Architectures.

[…], Olivier Delalleau

Proceedings of the Algorithmic Learning Theory - 22nd International Conference, 2011

2010

Tempered Markov Chain Monte Carlo for training of Restricted Boltzmann Machines.

Guillaume Desjardins, Aaron C. Courville, […], […], Olivier Delalleau

Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010

Decision trees do not generalize to new variations.

[…], Olivier Delalleau, Clarence Simard

Comput. Intell., 2010

2009

Justifying and Generalizing Contrastive Divergence.

[…], Olivier Delalleau

Neural Comput., 2009

2006

Spectral Dimensionality Reduction.

[…], Olivier Delalleau, Nicolas Le Roux, Jean-François Paiement, […], […]

Proceedings of the Feature Extraction - Foundations and Applications, 2006

Large-Scale Algorithms.

Olivier Delalleau, […], Nicolas Le Roux

Proceedings of the Semi-Supervised Learning, 2006

Label Propagation and Quadratic Criterion.

[…], Olivier Delalleau, Nicolas Le Roux

Proceedings of the Semi-Supervised Learning, 2006

2005

Convex Neural Networks.

[…], Nicolas Le Roux, […], Olivier Delalleau, Patrice Marcotte

Proceedings of the Advances in Neural Information Processing Systems 18, 2005

The Curse of Highly Variable Functions for Local Kernel Machines.

[…], Olivier Delalleau, Nicolas Le Roux

Proceedings of the Advances in Neural Information Processing Systems 18, 2005

Efficient Non-Parametric Function Induction in Semi-Supervised Learning.

Olivier Delalleau, […], Nicolas Le Roux

Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, 2005

2004

Learning Eigenfunctions Links Spectral Embedding and Kernel PCA.

[…], Olivier Delalleau, Nicolas Le Roux, Jean-François Paiement, […], […]

Neural Comput., 2004

Locally Linear Embedding for dimensionality reduction in QSAR.

Pierre-Jean L'Heureux, […], […], Olivier Delalleau, […]

J. Comput. Aided Mol. Des., 2004

2003

Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering.

[…], Jean-François Paiement, […], Olivier Delalleau, Nicolas Le Roux, […]

Proceedings of the Advances in Neural Information Processing Systems 16, 2003
