Zaiyuan Wang

According to our database, Zaiyuan Wang authored at least 10 papers between 2025 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

Xpertbench: Expert Level Tasks with Rubrics-Based Evaluation.

CoRR, April, 2026

$OneMillion-Bench: How Far are Language Agents from Human Experts?

CoRR, March, 2026

WorldTravel: A Realistic Multimodal Travel-Planning Benchmark with Tightly Coupled Constraints.

CoRR, February, 2026

2025

LLM Swiss Round: Aggregating Multi-Benchmark Performance via Competitive Swiss-System Dynamics.

CoRR, December, 2025

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents.

CoRR, December, 2025

LPFQA: A Long-Tail Professional Forum-based Benchmark for LLM Evaluation.

CoRR, November, 2025

FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning.

CoRR, September, 2025

Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?

CoRR, September, 2025

FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction.

CoRR, August, 2025

ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
