Yuhao Zhou

ORCID: 0009-0008-8665-3999

Affiliations:

Fudan University, Shanghai, China

According to our database, Yuhao Zhou authored at least 39 papers between 2022 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

ReCrit: Transition-Aware Reinforcement Learning for Scientific Critic Reasoning.

CoRR, May, 2026

MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning.

CoRR, April, 2026

AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress.

Proceedings of the ACM Web Conference 2026, 2026

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

What Makes a Good Speech Tokenizer for LLM-Centric Speech Generation? A Systematic Study.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress.

CoRR, November, 2025

FlowSearch: Advancing deep research with dynamic structured knowledge flow.

CoRR, October, 2025

SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines.

CoRR, September, 2025

VeriGUI: Verifiable Long-Chain GUI Dataset.

CoRR, August, 2025

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination.

CoRR, July, 2025

Reinforcement Fine-Tuning Enables MLLMs Learning Novel Tasks Stably.

CoRR, June, 2025

Speech-Language Models with Decoupled Tokenizers and Multi-Token Prediction.

CoRR, June, 2025

Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning.

CoRR, June, 2025

MSEarth: A Benchmark for Multimodal Scientific Comprehension of Earth Science.

CoRR, May, 2025

EarthSE: A Benchmark for Evaluating Earth Scientific Exploration Capability of LLMs.

CoRR, May, 2025

EliteKV: Scalable KV Cache Compression via RoPE Frequency Selection and Joint Low-Rank Projection.

CoRR, March, 2025

The rise and potential of large language model based agents: a survey.

Sci. China Inf. Sci., 2025

Parrot: A Training Pipeline Enhances Both Program CoT and Natural Language CoT for Reasoning.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

2024

CC2Vec: Combining Typed Tokens with Contrastive Learning for Effective Code Clone Detection.

Proc. ACM Softw. Eng., 2024

What's Wrong with Your Code Generated by Large Language Models? An Extensive Study.

CoRR, 2024

StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback.

CoRR, 2024

MouSi: Poly-Visual-Expert Vision-Language Models.

CoRR, 2024

Secrets of RLHF in Large Language Models Part II: Reward Modeling.

CoRR, 2024

Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Improving Generalization of Alignment with Human Preferences through Group Invariant Learning.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Reward Modeling Requires Automatic Adjustment Based on Data Quality.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Improving Discriminative Capability of Reward Models in RLHF Using Contrastive Learning.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

ORTicket: Let One Robust BERT Ticket Transfer across Different Tasks.

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Subspace Defense: Discarding Adversarial Perturbations by Learning a Subspace for Clean Signals.

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

LoRAMoE: Alleviating World Knowledge Forgetting in Large Language Models via MoE-Style Plugin.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

StepCoder: Improving Code Generation with Reinforcement Learning from Compiler Feedback.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment.

CoRR, 2023

The Rise and Potential of Large Language Model Based Agents: A Survey.

CoRR, 2023

Secrets of RLHF in Large Language Models Part I: PPO.

CoRR, 2023

Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement.

CoRR, 2023

Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Detecting Adversarial Samples through Sharpness of Loss Landscape.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

Robust Lottery Tickets for Pre-trained Language Models.

CoRR, 2022

Robust Lottery Tickets for Pre-trained Language Models.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022
