Yi Dong

Affiliations:
  • NVIDIA


According to our database, Yi Dong authored at least 22 papers between 2023 and 2026.

Bibliography

2026
ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents.
CoRR, March, 2026

PhyCritic: Multimodal Critic Models for Physical AI.
CoRR, February, 2026

Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text.
CoRR, January, 2026

2025
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration.
CoRR, November, 2025

ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge.
CoRR, October, 2025

BroRL: Scaling Reinforcement Learning via Broadened Exploration.
CoRR, October, 2025

RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards.
CoRR, September, 2025

Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training.
CoRR, July, 2025

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models.
CoRR, May, 2025

HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages.
CoRR, May, 2025

Nemotron-Research-Tool-N1: Exploring Tool-Using Language Models with Reinforced Reasoning.
CoRR, May, 2025

Dedicated Feedback and Edit Models Empower Inference-Time Scaling for Open-Ended General-Domain Tasks.
CoRR, March, 2025

Reward-aware Preference Optimization: A Unified Mathematical Framework for Model Alignment.
CoRR, February, 2025

Diverging Preferences: When do Annotators Disagree and do Models Know?
Proceedings of the Forty-second International Conference on Machine Learning, 2025

HelpSteer2-Preference: Complementing Ratings with Preferences.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

HelpSteer3: Human-Annotated Feedback and Edit Data to Empower Inference-Time Scaling in Open-Ended General-Domain Tasks.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
HelpSteer2: Open-source dataset for training top-performing reward models.
CoRR, 2024

NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment.
CoRR, 2024

HelpSteer 2: Open-source dataset for training top-performing reward models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

2023
Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

