Rafael Rafailov

According to our database¹, Rafael Rafailov authored at least 42 papers between 2021 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

Reliable and Responsible Foundation Models: A Comprehensive Survey.

CoRR, February, 2026

LitBench: A Benchmark and Dataset for Reliable Evaluation of Creative Writing.

Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics, 2026

2025

Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning.

CoRR, June, 2025

Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models.

CoRR, February, 2025

MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation.

CoRR, February, 2025

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought.

CoRR, January, 2025

Reliable and Responsible Foundation Models.

Trans. Mach. Learn. Res., 2025

MJ-Video: Benchmarking and Rewarding Video Generation with Fine-Grained Video Preference.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Collapse or Thrive: Perils and Promises of Synthetic Data in a Self-Generating World.

Joshua Kazdan, Rylan Schaeffer, Apratim Dey, Matthias Gerstgrasser, Rafael Rafailov, David L. Donoho, Sanmi Koyejo

Proceedings of the Forty-second International Conference on Machine Learning, 2025

PERSONA: A Reproducible Testbed for Pluralistic Alignment.

Proceedings of the 31st International Conference on Computational Linguistics, 2025

2024

Generative Reward Models.

CoRR, 2024

Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents.

Pranav Putta, Edmund Mills, Naman Garg, Sumeet Ramesh Motwani, Chelsea Finn, Divyansh Garg, Rafael Rafailov

CoRR, 2024

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

CoRR, 2024

OpenVLA: An Open-Source Vision-Language-Action Model.

CoRR, 2024

Scalable Ensembling For Mitigating Reward Overoptimisation.

CoRR, 2024

Offline Regularised Reinforcement Learning for Large Language Models Alignment.

Pierre Harvey Richemond, Yunhao Tang, Daniel Guo, Daniele Calandriello, Mohammad Gheshlaghi Azar

CoRR, 2024

From r to Q*: Your Language Model is Secretly a Q-Function.

CoRR, 2024

Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data.

Matthias Gerstgrasser

CoRR, 2024

Aligning Modalities in Vision Large Language Models via Preference Fine-tuning.

CoRR, 2024

D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning.

Philippe Hansen-Estruch

RLJ, 2024

Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Efficient imitation learning with conservative world models.

Proceedings of the 6th Annual Learning for Dynamics & Control Conference, 2024

Open X-Embodiment: Robotic Learning Datasets and RT-X Models: Open X-Embodiment Collaboration.

Henrik I. Christensen, Keerthana Gopalakrishnan, Lawrence Yunliang Chen, Nur Muhammad (Mahi) Shafiullah, Roberto Martín-Martín, Samuel Bustamante-Gomez, Subramanian Ramamoorthy

Proceedings of the IEEE International Conference on Robotics and Automation, 2024

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Language Model Detectors Are Easily Optimized Against.

Christopher D. Manning, Chelsea Finn, Stefano Ermon

Proceedings of the Twelfth International Conference on Learning Representations, 2024

An Emulator for Fine-tuning Large Language Models using Small Language Models.

Christopher D. Manning

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Contrastive Preference Learning: Learning from Human Feedback without Reinforcement Learning.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Diffusion Model Alignment Using Direct Preference Optimization.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OpenVLA: An Open-Source Vision-Language-Action Model.

Proceedings of the Conference on Robot Learning, 6-9 November 2024, Munich, Germany, 2024

Disentangling Length from Quality in Direct Preference Optimization.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

Contrastive Preference Learning: Learning from Human Feedback without RL.

CoRR, 2023

Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias.

CoRR, 2023

Direct Preference Optimization: Your Language Model is Secretly a Reward Model.

Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D. Manning, Stefano Ermon, Chelsea Finn

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Contrastive Example-Based Control.

Proceedings of the Learning for Dynamics and Control Conference, 2023

Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback.

Christopher D. Manning

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning.

Proceedings of the Conference on Robot Learning, 2023

2022

Vision-Based Manipulators Need to Also See from Their Hands.

Proceedings of the Tenth International Conference on Learning Representations, 2022

2021

COMBO: Conservative Offline Model-Based Policy Optimization.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Visual Adversarial Imitation Learning using Variational Models.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Offline Reinforcement Learning from Images with Latent Space Models.

Proceedings of the 3rd Annual Conference on Learning for Dynamics and Control, 2021

Offline Meta-Reinforcement Learning with Advantage Weighting.

Proceedings of the 38th International Conference on Machine Learning, 2021
