Yilun Zhao
ORCID: 0000-0002-7470-6124
Affiliations:
- Yale University, New Haven, CT, USA
- Zhejiang University, Hangzhou, China (former)
According to our database, Yilun Zhao authored at least 83 papers between 2020 and 2025.
Bibliography
2025
SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks.
CoRR, July, 2025
MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation.
CoRR, June, 2025
Can LLMs Generate High-Quality Test Cases for Algorithm Problems? TestCase-Eval: A Systematic Evaluation of Fault Coverage and Exposure.
CoRR, June, 2025
SUCEA: Reasoning-Intensive Retrieval for Adversarial Fact-checking through Claim Decomposition and Editing.
CoRR, June, 2025
CoRR, May, 2025
CoRR, March, 2025
Experience Retrieval-Augmentation with Electronic Health Records Enables Accurate Discharge QA.
CoRR, March, 2025
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning.
CoRR, March, 2025
CoRR, January, 2025
ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning.
CoRR, January, 2025
Are Multimodal LLMs Robust Against Adversarial Perturbations? RoMMath: A Systematic Evaluation on Multimodal Math Reasoning.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025
IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025
ChemAgent: Self-updating Memories in Large Language Models Improves Chemical Reasoning.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation Task.
Proceedings of the Findings of the Association for Computational Linguistics, 2025
Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
Proceedings of the Findings of the Association for Computational Linguistics, 2025
Can Multimodal Foundation Models Understand Schematic Diagrams? An Empirical Study on Information-Seeking QA over Scientific Papers.
Proceedings of the Findings of the Association for Computational Linguistics, 2025
AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
2024
L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models.
Trans. Assoc. Comput. Linguistics, 2024
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation.
CoRR, 2024
FinDVer: Explainable Claim Verification over Long and Hybrid-Content Financial Documents.
CoRR, 2024
CoRR, 2024
Unveiling the Spectrum of Data Contamination in Language Models: A Survey from Detection to Remediation.
CoRR, 2024
CoRR, 2024
CoRR, 2024
Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models.
CoRR, 2024
Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in LLMs.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Struc-Bench: Are Large Language Models Good at Generating Complex Structured Tabular Data?
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Short Papers, 2024
On Evaluating the Integration of Reasoning and Action in LLM Agents with Database Question Answering.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: EMNLP 2024, 2024
M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
FinDVer: Explainable Claim Verification over Long and Hybrid-content Financial Documents.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
Proceedings of the Findings of the Association for Computational Linguistics, 2024
Unveiling the Spectrum of Data Contamination in Language Model: A Survey from Detection to Remediation.
Proceedings of the Findings of the Association for Computational Linguistics, 2024
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Financial Documents.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
TaPERA: Enhancing Faithfulness and Interpretability in Long-Form Table QA by Content Planning and Execution-based Reasoning.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
2023
CoRR, 2023
ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks.
CoRR, 2023
DocMath-Eval: Evaluating Numerical Reasoning Capabilities of LLMs in Understanding Long Documents with Tabular Data.
CoRR, 2023
CoRR, 2023
L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models.
CoRR, 2023
Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?
CoRR, 2023
Large Language Models are Effective Table-to-Text Generators, Evaluators, and Feedback Providers.
CoRR, 2023
Enhancing Few-shot Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies.
CoRR, 2023
Enhancing Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023
Towards Interpretable and Efficient Automatic Reference-Based Summarization Evaluation.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
Investigating Table-to-Text Generation Capabilities of Large Language Models in Real-World Information Seeking Scenarios.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: EMNLP 2023, 2023
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
LoFT: Enhancing Faithfulness and Diversity for Table-to-Text Generation via Logic Form Control.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023
RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023
Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
2022
IEEE Trans. Multim., 2022
FinMath: Injecting a Tree-structured Solver for Question Answering over Financial Reports.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022
ReasTAP: Injecting Table Reasoning Skills During Pre-training via Synthetic Reasoning Examples.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022
2021
Proceedings of the MultiMedia Modeling - 27th International Conference, 2021
2020
CoRR, 2020