Zhoufutu Wen

ORCID: 0009-0000-0894-5824

According to our database¹, Zhoufutu Wen authored at least 26 papers between 2023 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

When Agents Look the Same: Quantifying Distillation-Induced Similarity in Tool-Use Behaviors.

CoRR, April, 2026

Xpertbench: Expert Level Tasks with Rubrics-Based Evaluation.

CoRR, April, 2026

CoTJudger: A Graph-Driven Framework for Automatic Evaluation of Chain-of-Thought Efficiency and Redundancy in LRMs.

CoRR, March, 2026

WorldTravel: A Realistic Multimodal Travel-Planning Benchmark with Tightly Coupled Constraints.

CoRR, February, 2026

2025

DiscoX: Benchmarking Discourse-Level Translation task in Expert Domains.

CoRR, November, 2025

MME-CC: A Challenging Multi-Modal Evaluation Benchmark of Cognitive Capacity.

CoRR, November, 2025

COIG-Writer: A High-Quality Dataset for Chinese Creative Writing with Thought Processes.

CoRR, October, 2025

Beyond Correctness: Evaluating Subjective Writing Preferences Across Cultures.

CoRR, October, 2025

FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning.

CoRR, September, 2025

TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling.

CoRR, August, 2025

FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction.

CoRR, August, 2025

First Return, Entropy-Eliciting Explore.

CoRR, July, 2025

SciDA: Scientific Dynamic Assessor of LLMs.

CoRR, June, 2025

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation.

CoRR, May, 2025

IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs.

CoRR, April, 2025

SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models.

CoRR, February, 2025

CryptoX: Compositional Reasoning Evaluation of Large Language Models.

CoRR, February, 2025

Distillation Quantification for Large Language Models.

CoRR, January, 2025

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

MARS-Bench: A Multi-turn Athletic Real-world Scenario Benchmark for Dialogue Evaluation.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

SUMMA: A Multimodal Large Language Model for Advertisement Summarization.

Proceedings of the 34th ACM International Conference on Information and Knowledge Management, 2025

Quantification of Large Language Model Distillation.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks.

CoRR, 2024

2023

Enhancing Dynamic Image Advertising with Vision-Language Pre-training.

Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023
