Guanrou Yang

Orcid: 0009-0008-3614-1346

According to our database¹, Guanrou Yang authored at least 22 papers between 2023 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Timeline

Legend:

Book In proceedings Article PhD thesis Dataset Other

Links

On csauthors.net:

Bibliography

2026

WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling.

[BibT_eX]

[DOI]

CoRR, May, 2026

SemanticVocoder: Bridging Audio Generation and Audio Understanding via Semantic Latents.

[BibT_eX]

[DOI]

CoRR, February, 2026

SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing.

[BibT_eX]

[DOI]

IEEE J. Sel. Top. Signal Process., January, 2026

Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training.

[BibT_eX]

[DOI]

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

Evaluating the Expressive Appropriateness of Speech in Rich Contexts.

[BibT_eX]

[DOI]

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

2025

UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models.

[BibT_eX]

[DOI]

CoRR, October, 2025

DiSTAR: Diffusion over a Scalable Token Autoregressive Representation for Speech Generation.

[BibT_eX]

[DOI]

CoRR, October, 2025

CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training.

[BibT_eX]

[DOI]

CoRR, May, 2025

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix.

[BibT_eX]

[DOI]

CoRR, May, 2025

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting.

[BibT_eX]

[DOI]

CoRR, April, 2025

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction.

[BibT_eX]

[DOI]

CoRR, January, 2025

Word-Level Emotional Expression Control in Zero-Shot Text-to-Speech Synthesis.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting.

[BibT_eX]

[DOI]

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Speech Token Prediction via Compressed-to-fine Language Modeling for Speech Generation.

[BibT_eX]

[DOI]

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap.

[BibT_eX]

[DOI]

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity.

[BibT_eX]

[DOI]

CoRR, 2024

CTC-Assisted LLM-Based Contextual ASR.

[BibT_eX]

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

MaLa-ASR: Multimedia-Assisted LLM-Based ASR.

[BibT_eX]

[DOI]

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers.

[BibT_eX]

[DOI]

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

2023

Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation.

[BibT_eX]

[DOI]

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Fast-Hubert: an Efficient Training Framework for Self-Supervised Speech Representation Learning.

[BibT_eX]

[DOI]

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Guanrou Yang

Timeline

Legend:

Links

On csauthors.net:

Bibliography

Loading...