Binghai Wang

According to our database¹, Binghai Wang authored at least 12 papers between 2023 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

EVPO: Explained Variance Policy Optimization for Adaptive Critic Utilization in LLM Post-Training.

CoRR, April, 2026

MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning.

CoRR, April, 2026

HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning.

CoRR, March, 2026

Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models.

CoRR, February, 2026

AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress.

Proceedings of the ACM Web Conference 2026, 2026

2025

AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress.

CoRR, November, 2025

WorldPM: Scaling Human Preference Modeling.

CoRR, May, 2025

RMB: Comprehensively benchmarking reward models in LLM alignment.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

Secrets of RLHF in Large Language Models Part II: Reward Modeling.

CoRR, 2024

Reward Modeling Requires Automatic Adjustment Based on Data Quality.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Improving Discriminative Capability of Reward Models in RLHF Using Contrastive Learning.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2023

Secrets of RLHF in Large Language Models Part I: PPO.

CoRR, 2023
