Binghai Wang

According to our database, Binghai Wang authored at least 12 papers between 2023 and 2026.

Bibliography

2026
EVPO: Explained Variance Policy Optimization for Adaptive Critic Utilization in LLM Post-Training.
CoRR, April, 2026

MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning.
CoRR, April, 2026

HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning.
CoRR, March, 2026

Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models.
CoRR, February, 2026

AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress.
Proceedings of the ACM Web Conference 2026, 2026

2025
AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress.
CoRR, November, 2025

WorldPM: Scaling Human Preference Modeling.
CoRR, May, 2025

RMB: Comprehensively Benchmarking Reward Models in LLM Alignment.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Secrets of RLHF in Large Language Models Part II: Reward Modeling.
CoRR, 2024

Reward Modeling Requires Automatic Adjustment Based on Data Quality.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Improving Discriminative Capability of Reward Models in RLHF Using Contrastive Learning.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2023
Secrets of RLHF in Large Language Models Part I: PPO.
CoRR, 2023

