Ruibin Yuan

Orcid: 0009-0002-0539-6916

According to our database, Ruibin Yuan authored at least 60 papers between 2020 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression.

CoRR, April, 2026

Audio-Omni: Extending Multi-modal Understanding to Versatile Audio Generation and Editing.

CoRR, April, 2026

CMI-RewardBench: Evaluating Music Reward Models with Compositional Multimodal Instruction.

CoRR, March, 2026

Voices of Civilizations: A Multilingual QA Benchmark for Global Music Understanding.

CoRR, March, 2026

SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing.

IEEE J. Sel. Top. Signal Process., January, 2026

WenetSpeech-Yue: A Large-Scale Cantonese Speech Corpus with Multi-dimensional Annotation.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

AutoMV: An Automatic Multi-Agent System for Music Video Generation.

CoRR, December, 2025

Multi-Accent Mandarin Dry-Vocal Singing Dataset: Benchmark for Singing Accent Recognition.

CoRR, December, 2025

Hollywood Town: Long-Video Generation via Cross-Modal Multi-Agent Orchestration.

CoRR, October, 2025

UniSS: Unified Expressive Speech-to-Speech Translation with Your Voice.

CoRR, September, 2025

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation.

CoRR, May, 2025

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix.

CoRR, May, 2025

Kimi-Audio Technical Report.

CoRR, April, 2025

AudioX: Diffusion Transformer for Anything-to-Audio Generation.

CoRR, March, 2025

YuE: Scaling Open Foundation Models for Long-Form Music Generation.

CoRR, March, 2025

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens.

CoRR, March, 2025

Audio-FLAN: A Preliminary Release.

CoRR, February, 2025

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models.

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning.

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

Singing Timbre Popularity Assessment Based on Multimodal Large Foundation Model.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

MuPT: A Generative Symbolic Music Pretrained Transformer.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Editing Music with Melody and Text: Using ControlNet for Diffusion Transformer.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

You Know What I'm Saying: Jailbreak Attack via Implicit Reference.

CoRR, 2024

HiddenGuard: Fine-Grained Safe Generation with Specialized Representation Router.

CoRR, 2024

OmniBench: Towards The Future of Universal Omni-Language Models.

CoRR, 2024

SongTrans: An unified song transcription and alignment method for lyrics and notes.

CoRR, 2024

Foundation Models for Music: A Survey.

CoRR, 2024

MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions.

CoRR, 2024

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling.

CoRR, 2024

LLMs Meet Multimodal Generation and Editing: A Survey.

CoRR, 2024

MuPT: A Generative Symbolic Music Pretrained Transformer.

CoRR, 2024

Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model.

CoRR, 2024

The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis.

CoRR, 2024

RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation.

CoRR, 2024

COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning.

CoRR, 2024

Modeling Analog Dynamic Range Compressors using Deep Learning and State-space Models.

Hanzhi Yin, Gang Cheng, Christian J. Steinmetz, Ruibin Yuan, Richard M. Stern, Roger B. Dannenberg

CoRR, 2024

ChatMusician: Understanding and Generating Music Intrinsically with LLM.

CoRR, 2024

CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models.

CoRR, 2024

CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark.

CoRR, 2024

Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation.

Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024

ComposerX: Multi-Agent Symbolic Music Composition With LLMs.

Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024

MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

MMMU: A Massive Multi-Discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-Speech Gesture Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

ChatMusician: Understanding and Generating Music Intrinsically with LLM.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training.

CoRR, 2023

Chinese Open Instruction Generalist: A Preliminary Release.

CoRR, 2023

MARBLE: Music Audio Representation Benchmark for Universal Evaluation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

LyricWhiz: Robust Multilingual Zero-Shot Lyrics Transcription by Whispering to ChatGPT.

Proceedings of the 24th International Society for Music Information Retrieval Conference, 2023

On the Effectiveness of Speech Self-Supervised Learning for Music.

Proceedings of the 24th International Society for Music Information Retrieval Conference, 2023

2022

MAP-Music2Vec: A Simple and Effective Baseline for Self-Supervised Music Audio Representation Learning.

CoRR, 2022

Noisy Label Detection for Speaker Recognition.

CoRR, 2022

DeID-VC: Speaker De-identification via Zero-shot Pseudo Voice Conversion.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Parallel Adaptive Subspace Pursuit Algorithm for Multiuser Detection of Uplink Grant-Free NOMA.

Ruibin Yuan, Jianping Zheng

Proceedings of the IEEE Wireless Communications and Networking Conference, 2021

2020

Diverse Melody Generation from Chinese Lyrics via Mutual Information Maximization.

CoRR, 2020
