Ming Zhang

Affiliations:

Fudan University, Shanghai, China

According to our database, Ming Zhang authored at least 42 papers between 2023 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening.

CoRR, May, 2026

CL-bench Life: Can Language Models Learn from Real-Life Context?

CoRR, April, 2026

JFTA-Bench: Evaluate LLM's Ability of Tracking and Analyzing Malfunctions Using Fault Trees.

CoRR, March, 2026

AI Can Learn Scientific Taste.

CoRR, March, 2026

SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents.

CoRR, February, 2026

DFPO: Scaling Value Modeling via Distributional Flow towards Robust and Generalizable LLM Post-Training.

CoRR, February, 2026

CL-bench: A Benchmark for Context Learning.

CoRR, February, 2026

Can Deep Research Agents Retrieve and Organize? Evaluating the Synthesis Gap with Expert Taxonomies.

CoRR, January, 2026

Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control.

CoRR, January, 2026

OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment.

CoRR, January, 2026

What is wrong with your code generated by large language models? An extensive study.

Sci. China Inf. Sci., 2026

VRPO: Rethinking Value Modeling for Robust RL under Noisy Supervision in LLM Post-Training.

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

LLMEval-Fair: A Large-Scale Longitudinal Study on Robust and Fair Evaluation of Large Language Models.

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

Beyond Scaling: Measuring and Predicting the Upper Bound of Knowledge Retention in Language Model Pre-Training.

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

MetaAct-RL: Training Language Models for Reasoning Through Meta-Action-Based Reinforcement Learning.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

What Makes a Good Speech Tokenizer for LLM-Centric Speech Generation? A Systematic Study.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

WisPaper: Your AI Scholar Search Engine.

CoRR, December, 2025

DVPO: Distributional Value Modeling-based Policy Optimization for LLM Post-Training.

CoRR, December, 2025

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm.

CoRR, November, 2025

From Scores to Preferences: Redefining MOS Benchmarking for Speech Quality Reward Modeling.

CoRR, October, 2025

MDAR: A Multi-scene Dynamic Audio Reasoning Benchmark.

CoRR, September, 2025

LLMEval-3: A Large-Scale Longitudinal Study on Robust and Fair Evaluation of Large Language Models.

CoRR, August, 2025

VRPO: Rethinking Value Modeling for Robust RL Training under Noisy Supervision.

CoRR, August, 2025

SpeechRole: A Large-Scale Dataset and Benchmark for Evaluating Speech Role-Playing Agents.

CoRR, August, 2025

Speech-Language Models with Decoupled Tokenizers and Multi-Token Prediction.

CoRR, June, 2025

EvaLearn: Quantifying the Learning Capability and Efficiency of LLMs via Sequential Problem Solving.

CoRR, June, 2025

Code2Logic: Game-Code-Driven Data Synthesis for Enhancing VLMs General Reasoning.

CoRR, May, 2025

Predicting Large Language Model Capabilities on Closed-Book QA Tasks Using Only Information Available Prior to Training.

CoRR, February, 2025

The rise and potential of large language model based agents: a survey.

Sci. China Inf. Sci., 2025

EvaLearn: Quantifying the Learning Capability and Efficiency of LLMs via Sequential Problem Solving.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Governance in Motion: Co-evolution of Constitutions and AI models for Scalable Safety.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

What's Wrong with Your Code Generated by Large Language Models? An Extensive Study.

CoRR, 2024

Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning.

CoRR, 2024

MouSi: Poly-Visual-Expert Vision-Language Models.

CoRR, 2024

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities.

CoRR, 2024

Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning Through Trap Problems.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

LLMEval: A Preliminary Study on How to Evaluate Large Language Models.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

The Rise and Potential of Large Language Model Based Agents: A Survey.

CoRR, 2023
