Yuhao Zhou

ORCID: 0009-0008-8665-3999

Affiliations:
  • Fudan University, Shanghai, China


According to our database, Yuhao Zhou authored at least 39 papers between 2022 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
ReCrit: Transition-Aware Reinforcement Learning for Scientific Critic Reasoning.
CoRR, May, 2026

MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning.
CoRR, April, 2026

AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress.
Proceedings of the ACM Web Conference 2026, 2026

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

What Makes a Good Speech Tokenizer for LLM-Centric Speech Generation? A Systematic Study.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress.
CoRR, November, 2025

FlowSearch: Advancing deep research with dynamic structured knowledge flow.
CoRR, October, 2025

SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines.
CoRR, September, 2025

VeriGUI: Verifiable Long-Chain GUI Dataset.
CoRR, August, 2025

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination.
CoRR, July, 2025

Reinforcement Fine-Tuning Enables MLLMs Learning Novel Tasks Stably.
CoRR, June, 2025

Speech-Language Models with Decoupled Tokenizers and Multi-Token Prediction.
CoRR, June, 2025

Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning.
CoRR, June, 2025

MSEarth: A Benchmark for Multimodal Scientific Comprehension of Earth Science.
CoRR, May, 2025

EarthSE: A Benchmark for Evaluating Earth Scientific Exploration Capability of LLMs.
CoRR, May, 2025

EliteKV: Scalable KV Cache Compression via RoPE Frequency Selection and Joint Low-Rank Projection.
CoRR, March, 2025

The rise and potential of large language model based agents: a survey.
Sci. China Inf. Sci., 2025

Parrot: A Training Pipeline Enhances Both Program CoT and Natural Language CoT for Reasoning.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

2024
CC2Vec: Combining Typed Tokens with Contrastive Learning for Effective Code Clone Detection.
Proc. ACM Softw. Eng., 2024

What's Wrong with Your Code Generated by Large Language Models? An Extensive Study.
CoRR, 2024

StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback.
CoRR, 2024

MouSi: Poly-Visual-Expert Vision-Language Models.
CoRR, 2024

Secrets of RLHF in Large Language Models Part II: Reward Modeling.
CoRR, 2024

Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Improving Generalization of Alignment with Human Preferences through Group Invariant Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Reward Modeling Requires Automatic Adjustment Based on Data Quality.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Improving Discriminative Capability of Reward Models in RLHF Using Contrastive Learning.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

ORTicket: Let One Robust BERT Ticket Transfer across Different Tasks.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Subspace Defense: Discarding Adversarial Perturbations by Learning a Subspace for Clean Signals.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

LoRAMoE: Alleviating World Knowledge Forgetting in Large Language Models via MoE-Style Plugin.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

StepCoder: Improving Code Generation with Reinforcement Learning from Compiler Feedback.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment.
CoRR, 2023

The Rise and Potential of Large Language Model Based Agents: A Survey.
CoRR, 2023

Secrets of RLHF in Large Language Models Part I: PPO.
CoRR, 2023

Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement.
CoRR, 2023

Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Detecting Adversarial Samples through Sharpness of Loss Landscape.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Robust Lottery Tickets for Pre-trained Language Models.
CoRR, 2022

Robust Lottery Tickets for Pre-trained Language Models.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022
