Yifan Mai

ORCID: 0009-0004-7270-2607

Affiliations:
  • Stanford University, CA, USA


According to our database, Yifan Mai authored at least 17 papers between 2023 and 2025.

Bibliography

2025
The Singapore Consensus on Global AI Safety Research Priorities.
CoRR, June, 2025

MedHELM: Holistic Evaluation of Large Language Models for Medical Tasks.
CoRR, May, 2025

Judging LLMs on a Simplex.
CoRR, May, 2025

AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons.
CoRR, March, 2025

The Mighty ToRR: A Benchmark for Table Reasoning and Robustness.
CoRR, February, 2025

Evaluating Large Language Models with Enterprise Benchmarks.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

AutoBencher: Towards Declarative Benchmark Construction.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

AIR-BENCH 2024: A Safety Benchmark based on Regulation and Policies Specified Risk Categories.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

SEA-HELM: Southeast Asian Holistic Evaluation of Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
Language model developers should report train-test overlap.
CoRR, 2024

AIR-Bench 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies.
CoRR, 2024

Introducing v0.5 of the AI Safety Benchmark from MLCommons.
CoRR, 2024

Image2Struct: Benchmarking Structure Extraction for Vision-Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

VHELM: A Holistic Evaluation of Vision Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Using Benchmarking Infrastructure to Evaluate LLM Performance on CS Concept Inventories: Challenges, Opportunities, and Critiques.
Proceedings of the 2024 ACM Conference on International Computing Education Research, 2024

2023
Holistic Evaluation of Language Models.
Trans. Mach. Learn. Res., 2023

Holistic Evaluation of Text-to-Image Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

