Yi Dong

Affiliations:

NVIDIA

According to our database, Yi Dong authored at least 23 papers between 2023 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

Polar: Agentic RL on Any Harness at Scale.

[DOI]

,

,

,

,

,

,

,

,

,

Michael Demoret

,

,

CoRR, May, 2026

ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents.

[DOI]

,

,

,

,

,

,

,

,

,

,

,

,

CoRR, March, 2026

PhyCritic: Multimodal Critic Models for Physical AI.

[DOI]

,

,

,

,

,

,

,

CoRR, February, 2026

Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text.

[DOI]

,

,

,

,

,

,

,

,

,

,

,

Prithviraj Ammanabrolu

,

,

,

CoRR, January, 2026

2025

ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration.

[DOI]

,

,

,

,

,

,

,

,

,

,

,

Evelina Bakhturina

,

,

,

,

Pavlo Molchanov

CoRR, November, 2025

ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge.

[DOI]

,

,

,

,

,

,

Pavlo Molchanov

,

,

,

CoRR, October, 2025

BroRL: Scaling Reinforcement Learning via Broadened Exploration.

[DOI]

,

,

,

,

Zaïd Harchaoui

,

,

,

Pavlo Molchanov

,

,

,

CoRR, October, 2025

RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards.

[DOI]

,

,

Olivier Delalleau

,

,

,

,

,

,

Oleksii Kuchaiev

CoRR, September, 2025

Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training.

[DOI]

CoRR, July, 2025

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models.

[DOI]

,

,

,

,

,

,

,

CoRR, May, 2025

HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages.

[DOI]

,

,

Olivier Delalleau

,

,

,

Alexander Bukharin

,

,

,

Oleksii Kuchaiev

CoRR, May, 2025

Nemotron-Research-Tool-N1: Exploring Tool-Using Language Models with Reinforced Reasoning.

[DOI]

,

,

,

,

Bryan Catanzaro

,

,

,

,

CoRR, May, 2025

Dedicated Feedback and Edit Models Empower Inference-Time Scaling for Open-Ended General-Domain Tasks.

[DOI]

,

,

Olivier Delalleau

,

,

,

,

,

,

Oleksii Kuchaiev

CoRR, March, 2025

Reward-aware Preference Optimization: A Unified Mathematical Framework for Model Alignment.

[DOI]

,

,

Alexander Bukharin

,

David Mosallanezhad

,

,

,

,

Adithya Renduchintala

,

,

,

,

Dmitry Chichkov

,

Olivier Delalleau

,

Oleksii Kuchaiev

CoRR, February, 2025

Diverging Preferences: When do Annotators Disagree and do Models Know?

[DOI]

Michael J. Q. Zhang

,

,

,

,

Olivier Delalleau

,

,

,

,

Valentina Pyatkin

Proceedings of the Forty-second International Conference on Machine Learning, 2025

HelpSteer2-Preference: Complementing Ratings with Preferences.

[DOI]

,

Alexander Bukharin

,

Olivier Delalleau

,

,

,

,

Oleksii Kuchaiev

,

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

HelpSteer3: Human-Annotated Feedback and Edit Data to Empower Inference-Time Scaling in Open-Ended General-Domain Tasks.

[DOI]

,

,

Olivier Delalleau

,

,

,

,

,

,

Oleksii Kuchaiev

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

HelpSteer2: Open-source dataset for training top-performing reward models.

[DOI]

,

,

Olivier Delalleau

,

,

,

,

,

Makesh Narsimhan Sreedhar

,

Oleksii Kuchaiev

CoRR, 2024

NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment.

[DOI]

,

,

Olivier Delalleau

,

,

,

,

,

,

,

Ali Taghibakhshi

,

Markel Sanz Ausin

,

,

Oleksii Kuchaiev

CoRR, 2024

HelpSteer 2: Open-source dataset for training top-performing reward models.

[DOI]

,

,

Olivier Delalleau

,

,

,

,

,

Makesh Narsimhan Sreedhar

,

Oleksii Kuchaiev

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM.

[DOI]

,

,

,

,

Makesh Narsimhan Sreedhar

,

,

Olivier Delalleau

,

Jane Polak Scowcroft

,

,

,

Oleksii Kuchaiev

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

2023

Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study.

[DOI]

,

,

,

Lawrence McAfee

,

,

Mohammad Shoeybi

,

,

Oleksii Kuchaiev

,

,

,

Anima Anandkumar

,

Bryan Catanzaro

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF.

[DOI]

,

,

Makesh Narsimhan Sreedhar

,

,

Oleksii Kuchaiev

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023
