Ming Zhang

Affiliations:
  • Fudan University, Shanghai, China


According to our database, Ming Zhang authored at least 37 papers between 2023 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
CL-bench Life: Can Language Models Learn from Real-Life Context?
CoRR, April, 2026

JFTA-Bench: Evaluate LLM's Ability of Tracking and Analyzing Malfunctions Using Fault Trees.
CoRR, March, 2026

AI Can Learn Scientific Taste.
CoRR, March, 2026

SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents.
CoRR, February, 2026

DFPO: Scaling Value Modeling via Distributional Flow towards Robust and Generalizable LLM Post-Training.
CoRR, February, 2026

CL-bench: A Benchmark for Context Learning.
CoRR, February, 2026

Can Deep Research Agents Retrieve and Organize? Evaluating the Synthesis Gap with Expert Taxonomies.
CoRR, January, 2026

Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control.
CoRR, January, 2026

OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment.
CoRR, January, 2026

What is wrong with your code generated by large language models? An extensive study.
Sci. China Inf. Sci., 2026

MetaAct-RL: Training Language Models for Reasoning Through Meta-Action-Based Reinforcement Learning.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

What Makes a Good Speech Tokenizer for LLM-Centric Speech Generation? A Systematic Study.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
WisPaper: Your AI Scholar Search Engine.
CoRR, December, 2025

DVPO: Distributional Value Modeling-based Policy Optimization for LLM Post-Training.
CoRR, December, 2025

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm.
CoRR, November, 2025

From Scores to Preferences: Redefining MOS Benchmarking for Speech Quality Reward Modeling.
CoRR, October, 2025

MDAR: A Multi-scene Dynamic Audio Reasoning Benchmark.
CoRR, September, 2025

LLMEval-3: A Large-Scale Longitudinal Study on Robust and Fair Evaluation of Large Language Models.
CoRR, August, 2025

VRPO: Rethinking Value Modeling for Robust RL Training under Noisy Supervision.
CoRR, August, 2025

SpeechRole: A Large-Scale Dataset and Benchmark for Evaluating Speech Role-Playing Agents.
CoRR, August, 2025

Speech-Language Models with Decoupled Tokenizers and Multi-Token Prediction.
CoRR, June, 2025

EvaLearn: Quantifying the Learning Capability and Efficiency of LLMs via Sequential Problem Solving.
CoRR, June, 2025

Code2Logic: Game-Code-Driven Data Synthesis for Enhancing VLMs General Reasoning.
CoRR, May, 2025

Predicting Large Language Model Capabilities on Closed-Book QA Tasks Using Only Information Available Prior to Training.
CoRR, February, 2025

The rise and potential of large language model based agents: a survey.
Sci. China Inf. Sci., 2025

LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Governance in Motion: Co-evolution of Constitutions and AI models for Scalable Safety.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
What's Wrong with Your Code Generated by Large Language Models? An Extensive Study.
CoRR, 2024

Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning.
CoRR, 2024

MouSi: Poly-Visual-Expert Vision-Language Models.
CoRR, 2024

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities.
CoRR, 2024

Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning Through Trap Problems.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

LLMEval: A Preliminary Study on How to Evaluate Large Language Models.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
The Rise and Potential of Large Language Model Based Agents: A Survey.
CoRR, 2023

