Ge Zhang

Affiliations:
  • ByteDance Inc.
  • 01.AI (former)
  • University of Waterloo, Canada (PhD)
  • Beijing Academy of Artificial Intelligence, China (former)
  • University of Michigan, Ann Arbor, MI, USA (former)


According to our database, Ge Zhang authored at least 128 papers between 2020 and 2025.

Bibliography

2025
MSCFF-Net: multi-scale context feature fusion network for polyp segmentation.
Multim. Syst., June, 2025

OAgents: An Empirical Study of Building Effective Agents.
CoRR, June, 2025

Scaling Test-time Compute for LLM Agents.
CoRR, June, 2025

SciDA: Scientific Dynamic Assessor of LLMs.
CoRR, June, 2025

TaskCraft: Automated Generation of Agentic Tasks.
CoRR, June, 2025

ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding.
CoRR, May, 2025

MARS-Bench: A Multi-turn Athletic Real-world Scenario Benchmark for Dialogue Evaluation.
CoRR, May, 2025

P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark.
CoRR, May, 2025

General-Reasoner: Advancing LLM Reasoning Across All Domains.
CoRR, May, 2025

VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation.
CoRR, May, 2025

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation.
CoRR, May, 2025

FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models.
CoRR, May, 2025

IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs.
CoRR, April, 2025

ReTool: Reinforcement Learning for Strategic Tool Use in LLMs.
CoRR, April, 2025

COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values.
CoRR, April, 2025

Video SimpleQA: Towards Factuality Evaluation in Large Video Language Models.
CoRR, March, 2025

A Comprehensive Survey on Long Context Language Modeling.
CoRR, March, 2025

FlexWorld: Progressively Expanding 3D Scenes for Flexiable-View Synthesis.
CoRR, March, 2025

Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers.
CoRR, March, 2025

YuE: Scaling Open Foundation Models for Long-Form Music Generation.
CoRR, March, 2025

Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
CoRR, February, 2025

CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models.
CoRR, February, 2025

Audio-FLAN: A Preliminary Release.
CoRR, February, 2025

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines.
CoRR, February, 2025

SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models.
CoRR, February, 2025

CryptoX : Compositional Reasoning Evaluation of Large Language Models.
CoRR, February, 2025

Aligning Instruction Tuning with Pre-training.
CoRR, January, 2025

Generating Symbolic World Models via Test-time Scaling of Large Language Models.
Trans. Mach. Learn. Res., 2025

Long-context LLMs Struggle with Long In-context Learning.
Trans. Mach. Learn. Res., 2025

CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Omni-MATH: A Universal Olympiad Level Mathematic Benchmark for Large Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

McEval: Massively Multilingual Code Evaluation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

TableBench: A Comprehensive and Complex Benchmark for Table Question Answering.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation.
Trans. Mach. Learn. Res., 2024

TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks.
Trans. Mach. Learn. Res., 2024

KARPA: A Training-free Method of Adapting Knowledge Graph as References for Large Language Model's Reasoning Path Aggregation.
CoRR, 2024

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey.
CoRR, 2024

PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos.
CoRR, 2024

OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision.
CoRR, 2024

OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models.
CoRR, 2024

MdEval: Massively Multilingual Code Debugging.
CoRR, 2024

M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation.
CoRR, 2024

AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions.
CoRR, 2024

Can MLLMs Understand the Deep Implication Behind Chinese Images?
CoRR, 2024

A Comparative Study on Reasoning Patterns of OpenAI's o1 Model.
CoRR, 2024

MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models.
CoRR, 2024

ING-VP: MLLMs cannot Play Easy Vision-based Games Yet.
CoRR, 2024

KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks.
CoRR, 2024

MIO: A Foundation Model on Multimodal Tokens.
CoRR, 2024

HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models.
CoRR, 2024

OmniBench: Towards The Future of Universal Omni-Language Models.
CoRR, 2024

LIME: Less Is More for MLLM Evaluation.
CoRR, 2024

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark.
CoRR, 2024

Towards a Unified View of Preference Learning for Large Language Models: A Survey.
CoRR, 2024

FuzzCoder: Byte-level Fuzzing Test via Large Language Model.
CoRR, 2024

Foundation Models for Music: A Survey.
CoRR, 2024

TableBench: A Comprehensive and Complex Benchmark for Table Question Answering.
CoRR, 2024

I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm.
CoRR, 2024

MMRA: A Benchmark for Multi-granularity Multi-image Relational Association.
CoRR, 2024

LongIns: A Challenging Long-context Instruction-based Exam for LLMs.
CoRR, 2024

GIEBench: Towards Holistic Evaluation of Group Identity-based Empathy for Large Language Models.
CoRR, 2024

PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents.
CoRR, 2024

McEval: Massively Multilingual Code Evaluation.
CoRR, 2024

VCR: Visual Caption Restoration.
CoRR, 2024

II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models.
CoRR, 2024

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series.
CoRR, 2024

MAmmoTH2: Scaling Instructions from the Web.
CoRR, 2024

MuPT: A Generative Symbolic Music Pretrained Transformer.
CoRR, 2024

Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model.
CoRR, 2024

CodeEditorBench: Evaluating Code Editing Capability of Large Language Models.
CoRR, 2024

The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis.
CoRR, 2024

COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning.
CoRR, 2024

Yi: Open Foundation Models by 01.AI.
CoRR, 2024

DEEP-ICL: Definition-Enriched Experts for Language Model In-Context Learning.
CoRR, 2024

StructLM: Towards Building Generalist Models for Structured Knowledge Grounding.
CoRR, 2024

ChatMusician: Understanding and Generating Music Intrinsically with LLM.
CoRR, 2024

CMDAG: A Chinese Metaphor Dataset with Annotated Grounds as CoT for Boosting Metaphor Generation.
CoRR, 2024

CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models.
CoRR, 2024

MORE-3S: Multimodal-based Offline Reinforcement Learning with Shared Semantic Spaces.
CoRR, 2024

ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation.
CoRR, 2024

Read to Play (R2-Play): Decision Transformer with Multimodal Game Instruction.
CoRR, 2024

CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark.
CoRR, 2024

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models.
CoRR, 2024

Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation.
CoRR, 2024

Overview of the NLPCC 2024 Shared Task on Chinese Metaphor Generation.
Proceedings of the Natural Language Processing and Chinese Computing, 2024

MAmmoTH2: Scaling Instructions from the Web.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

DDK: Distilling Domain Knowledge for Efficient Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

RoleAgent: Building, Interacting, and Benchmarking High-quality Role-Playing Agents from Scripts.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

ComposerX: Multi-Agent Symbolic Music Composition With LLMs.
Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024

AutoAgents: A Framework for Automatic Agent Generation.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Massive Editing for Large Language Models via Meta Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

MMTE: Corpus and Metrics for Evaluating Machine Translation Quality of Metaphorical Language.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

UniIR: Training and Benchmarking Universal Multimodal Information Retrievers.
Proceedings of the Computer Vision - ECCV 2024, 2024

MMMU: A Massive Multi-Discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MORE-3S: Multimodal-based Offline Reinforcement Learning with Shared Semantic Spaces.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

CMDAG: A Chinese Metaphor Dataset with Annotated Grounds as CoT for Boosting Metaphor Generation.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
Align on the Fly: Adapting Chatbot Behavior to Established Norms.
CoRR, 2023

MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training.
CoRR, 2023

TPDM: Selectively Removing Positional Information for Zero-shot Translation via Token-Level Position Disentangle Module.
CoRR, 2023

Interactive Natural Language Processing.
CoRR, 2023

Chinese Open Instruction Generalist: A Preliminary Release.
CoRR, 2023

CORGI-PM: A Chinese Corpus For Gender Bias Probing and Mitigation.
CoRR, 2023

MARBLE: Music Audio Representation Benchmark for Universal Evaluation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

LyricWhiz: Robust Multilingual Zero-Shot Lyrics Transcription by Whispering to ChatGPT.
Proceedings of the 24th International Society for Music Information Retrieval Conference, 2023

On the Effectiveness of Speech Self-Supervised Learning for Music.
Proceedings of the 24th International Society for Music Information Retrieval Conference, 2023

2022
MAP-Music2Vec: A Simple and Effective Baseline for Self-Supervised Music Audio Representation Learning.
CoRR, 2022

HERB: Measuring Hierarchical Regional Bias in Pre-trained Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022, 2022

1Cademy @ Causal News Corpus 2022: Leveraging Self-Training in Causality Classification of Socio-Political Event Data.
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text, 2022

1Cademy @ Causal News Corpus 2022: Enhance Causal Span Detection via Beam-Search-based Position Selector.
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text, 2022

2020
Diverse Melody Generation from Chinese Lyrics via Mutual Information Maximization.
CoRR, 2020
