Dongchao Yang

Orcid: 0000-0002-8905-224X

According to our database, Dongchao Yang authored at least 75 papers between 2020 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

UniSRM: A Unified Speech Reward Model for Reasoning-Based Fine-grained Assessment.

CoRR, May, 2026

UniAudio 2.0: A Unified Audio Language Model with Text-Aligned Factorized Audio Tokenization.

CoRR, February, 2026

HeartMuLa: A Family of Open Sourced Music Foundation Models.

CoRR, January, 2026

UniSRM: A Unified Speech Reward Model for Reasoning-Based Fine-grained Assessment.

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Omni-AutoThink: Adaptive Multimodal Reasoning via Reinforcement Learning.

CoRR, December, 2025

SupCLAP: Controlling Optimization Trajectory Drift in Audio-Text Contrastive Learning with Support Vector Regularization.

CoRR, September, 2025

MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark.

CoRR, June, 2025

CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech.

Laureano Moro-Velázquez

CoRR, June, 2025

SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline.

Laureano Moro-Velázquez

Jesús Villalba

Najim Dehak

CoRR, May, 2025

Kimi-Audio Technical Report.

CoRR, April, 2025

VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning.

CoRR, April, 2025

MoonCast: High-Quality Zero-Shot Podcast Generation.

CoRR, March, 2025

Audio-FLAN: A Preliminary Release.

CoRR, February, 2025

SpeechSEC: A Unified Multi-Task Framework for Speech Synthesis, Editing, and Continuation.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

DiffDSR: Dysarthric Speech Reconstruction Using Latent Diffusion Model.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

UniSep: Universal Target Audio Separation with Language Models at Scale.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

InstructTTS: Modelling Expressive TTS in Discrete Latent Space With Natural Language Style Prompt.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

A Comparative Study of Discrete Speech Tokens for Semantic-Related Tasks with Large Language Models.

CoRR, 2024

AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions.

CoRR, 2024

SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models.

CoRR, 2024

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis.

CoRR, 2024

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.

CoRR, 2024

Codec-Superb @ SLT 2024: A Lightweight Benchmark For Neural Audio Codec Models.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Addressing Index Collapse of Large-Codebook Speech Tokenizer With Dual-Decoding Product-Quantized Variational Auto-Encoder.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

SoCodec: A Semantic-Ordered Multi-Stream Speech Codec For Efficient Language Model Based Text-to-Speech Synthesis.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

UniAudio 1.5: Large Language Model-Driven Audio Codec is A Few-Shot Audio Task Learner.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

VoiceTuner: Self-Supervised Pre-training and Efficient Fine-tuning For Voice Generation.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

UniAudio: Towards Universal Audio Generation with Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

InstructSpeech: Following Speech Editing Instructions via Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

PromptTTS 2: Describing and Generating Voices with Text Prompt.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Consistent and Relevant: Rethink the Query Embedding in General Sound Separation.

Proceedings of the IEEE International Conference on Acoustics, 2024

DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction.

Proceedings of the IEEE International Conference on Acoustics, 2024

Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Diffsound: Discrete Diffusion Model for Text-to-Sound Generation.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

UniAudio: An Audio Foundation Model Toward Universal Audio Generation.

CoRR, 2023

PromptTTS 2: Describing and Generating Voices with Text Prompt.

CoRR, 2023

Make-A-Voice: Unified Voice Synthesis With Discrete Representation.

CoRR, 2023

Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation.

CoRR, 2023

HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec.

CoRR, 2023

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.

CoRR, 2023

InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt.

CoRR, 2023

NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Background-aware Modeling for Weakly Supervised Sound Event Detection.

Yifei Xin

Dongchao Yang

Yuexian Zou

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models.

Proceedings of the International Conference on Machine Learning, 2023

Improving Text-Audio Retrieval by Text-Aware Attention Pooling and Prior Matrix Revised Loss.

Yifei Xin

Dongchao Yang

Yuexian Zou

Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Weakly Supervised Sound Event Detection with Causal Intervention.

Proceedings of the IEEE International Conference on Acoustics, 2023

NADiffuSE: Noise-aware Diffusion-based Model for Speech Enhancement.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022

A Two-student Learning Framework for Mixed Supervised Target Sound Detection.

CoRR, 2022

A Mobile Robot Design for Efficient and Large-Scale Solar Panel Cleaning.

Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2022

Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

RaDur: A Reference-aware and Duration-robust Network for Target Sound Detection.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Audio Pyramid Transformer with Domain Adaption for Weakly Supervised Sound Event Detection and Audio Classification.

Yifei Xin

Dongchao Yang

Yuexian Zou

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving Target Sound Extraction with Timestamp Information.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Mutual Learning Framework for Few-Shot Sound Event Detection.

Proceedings of the IEEE International Conference on Acoustics, 2022

A Mixed Supervised Learning Framework For Target Sound Detection.

Proceedings of the 7th Workshop on Detection and Classification of Acoustic Scenes and Events 2022, 2022

Detect What You Want: Target Sound Detection.

Proceedings of the 7th Workshop on Detection and Classification of Acoustic Scenes and Events 2022, 2022

Omnidirectional Motion Control Method of Quadruped Robot Based on 3D-CPG Oscillator Group.

Proceedings of the Robotics in Natural Settings, 2022

2021

Detect what you want: Target Sound Detection.

CoRR, 2021

Unsupervised Multi-Target Domain Adaptation for Acoustic Scene Classification.

Dongchao Yang

Helin Wang

Yuexian Zou

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

YOLOv3 with Asymmetric Intersection over Union Based Loss Function for Human Detection.

Proceedings of the ICMLSC '21: 2021 The 5th International Conference on Machine Learning and Soft Computing, 2021

Improving the Performance of Automated Audio Captioning via Integrating the Acoustic and Semantic Information.

Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events 2021 (DCASE 2021), 2021

2020

Towards Data Distillation for End-to-end Spoken Conversational Question Answering.

CoRR, 2020

A petal-array capacitive tactile sensor with micro-pin for robotic fingertip sensing.

Proceedings of the 3rd IEEE International Conference on Soft Robotics, 2020
