Yifan Mai

Orcid: 0009-0004-7270-2607

Affiliations:

Stanford University, CA, USA

According to our database¹, Yifan Mai authored at least 19 papers between 2023 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2025

Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation.

…, Benedikt Stroebl, …, Zachary S. Siegel, …, Dheeraj Oruganty, …, Rishi Bommasani, …, Peter Henderson, …, Arvind Narayanan

CoRR, October, 2025

AHELM: A Holistic Evaluation of Audio-Language Models.

CoRR, August, 2025

The Singapore Consensus on Global AI Safety Research Priorities.

…, Sören Mindermann, Vanessa Wilfred, Vidhisha Balachandran, …, Michael Belinsky, …, Duncan Cass-Beggs, …, Rumman Chowdhury, …, Agnès Delaborde, …, Francisco Eiras, …, Johannes Heidecke, …, Bryan Low Kian Hsiang, …, Adam Tauman Kalai, Meindert Kamphuis, Mohan S. Kankanhalli, Subhash Kantamneni, Mathias Bonde Kirk, …, Seán Ó hÉigeartaigh, Alejandro Ortega, …, Benjamin Prud'homme, Reihaneh Rabbany, Nayat Sanchez-Pi, Sarah Schwettmann, …, William-Chandra Tjhi, …, Anthony Tung K. H., …, HongJiang Zhang, Djordje Zikelic

CoRR, June, 2025

MedHELM: Holistic Evaluation of Large Language Models for Medical Tasks.

…, Leonardo Schettini, …, Jason Alan Fries, Akshay Swaminathan, …, Oluseyi Fayanju, …, Eduardo Pontes Reis, Sergios Gatidis, …, Rachna Saralkar, Chia-Chun Chiang, Jenelle A. Jindal, …, Albert S. Chiou, …, Michael F. Gensheimer, …, François Grolleau, Kameron C. Black, …, Aydin Zahedivash, …, Harshita Sharma, …, Roxana Daneshjou, Jonathan H. Chen, Emily Alsentzer, …, Nima Aghaeepour, Vanessa Kennedy, Akshay Chaudhari, …, Matthew P. Lungren, …

CoRR, May, 2025

Judging LLMs on a Simplex.

Patrick Vossler, …

CoRR, May, 2025

AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons.

…, Kenneth Fricklas, …, Quentin Feuillade-Montixi, …, Felix Friedrich, …, Eleonora Presani, Jonathan Bennion, Marisa Ferrara Boston, …, Malek Ben Salem, …, Sujata S. Goswami, …, Supheakmungkol Sarin, …, Kashyap Ramanandula Manjusha, …, Benjamin Rukundo, Abolfazl Shahbazi, …, Vithursan Thangarasa, …, Satyapriya Krishna, Mubashara Akhtar, …, Joseph Marvin Imperial, …, Sasidhar Kunapuli, Nicolas Miailhe, Julien Delaunay, Bhaktipriya Radharapu, …, Debojyoti Dutta, …, Ananya Gangavarapu, …, Agasthya Gangavarapu, Patrick Schramowski, …, Priyanka Mary Mammen, Tarunima Prabhakar, Venelin Kovatchev, …, Kelvin N. Manyeki, Sandeep Madireddy, …, Joachim Baumann, …, Jibin Rajan Varghese, …, Seshakrishna Jitendar, …, Claire V. Hardgrove, …, Shachi H. Kumar, …, Sree Bhargavi Balija, …, Robert Sullivan, …, Joaquin Vanschoren

CoRR, March, 2025

The Mighty ToRR: A Benchmark for Table Reasoning and Robustness.

Shir Ashury-Tahan, …, Michal Shmueli-Scheuer

CoRR, February, 2025

Evaluating Large Language Models with Enterprise Benchmarks.

…, Md. Maruf Hossain, …

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

AutoBencher: Towards Declarative Benchmark Construction.

…, Evan Zheran Liu, …, Tatsunori Hashimoto

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

AIR-BENCH 2024: A Safety Benchmark based on Regulation and Policies Specified Risk Categories.

…, Jeffrey Ziwei Tan, …

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

SEA-HELM: Southeast Asian Holistic Evaluation of Language Models.

Yosephine Susanto, Adithya Venkatadri Hulagadri, Jann Railey Montalan, …, Hamsawardhini Rengarajan, Peerat Limkonchotiwat, …, William-Chandra Tjhi

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

Language model developers should report train-test overlap.

…, Rishi Bommasani, …

CoRR, 2024

AIR-Bench 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies.

…, Jeffrey Ziwei Tan, …

CoRR, 2024

Introducing v0.5 of the AI Safety Benchmark from MLCommons.

…, Victor Akinwande, Namir Al-Nuaimi, …, Trupti Bavalatti, Borhane Blili-Hamelin, Kurt D. Bollacker, Rishi Bommasani, Marisa Ferrara Boston, …, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, …, Agasthya Gangavarapu, Ananya Gangavarapu, …, Subhra S. Goswami, …, Joseph Marvin Imperial, …, Felix Juefei-Xu, …, Bhavya Kailkhura, Hannah Rose Kirk, …, Michael Kuchnik, Shachi H. Kumar, Chris Lengerich, …, Eileen Peters Long, …, Priyanka Mary Mammen, Kelvin N. Manyeki, …, Shafee Mohammed, …, Dinesh Jinenhally Naganna, …, Forough Poursabzi-Sangdeh, Eleonora Presani, Fabrizio Puletti, …, Alice Schoenauer Sebag, Patrick Schramowski, Abolfazl Shahbazi, …, Davide Testuggine, Vithursan Thangarasa, Elizabeth Anne Watkins, …, Joaquin Vanschoren

CoRR, 2024

Image2Struct: Benchmarking Structure Extraction for Vision-Language Models.

Josselin Somerville Roberts, …, Michihiro Yasunaga, …

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

VHELM: A Holistic Evaluation of Vision Language Models.

…, Josselin Somerville Roberts, Michihiro Yasunaga, …

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Using Benchmarking Infrastructure to Evaluate LLM Performance on CS Concept Inventories: Challenges, Opportunities, and Critiques.

Proceedings of the 2024 ACM Conference on International Computing Education Research, 2024

2023

Holistic Evaluation of Language Models.

Trans. Mach. Learn. Res., 2023

Holistic Evaluation of Text-to-Image Models.

…, Michihiro Yasunaga, …, Deepak Narayanan, …, Marco Bellagente, …

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
