Chunyu Qiang

Orcid: 0009-0007-2290-3074

According to our database, Chunyu Qiang authored at least 30 papers between 2020 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
AudioGen-Omni: A Unified Multimodal Diffusion Transformer for Video-Synchronized Audio, Speech, and Song Generation.
CoRR, August, 2025

Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation.
CoRR, June, 2025

RPRA-ADD: Forgery Trace Enhancement-Driven Audio Deepfake Detection.
CoRR, June, 2025

Characteristic-Specific Partial Fine-Tuning for Efficient Emotion and Speaker Adaptation in Codec Language Text-to-Speech Models.
CoRR, January, 2025

Emotional Style Transfer With Intensity Control in Zero-Shot TTS.
IEEE Signal Process. Lett., 2025

DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Discrete Unit-based Low-latency Multi-lingual Speech Synthesis for LIMMITS'25 Challenge.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024
Mel-Refine: A Plug-and-Play Approach to Refine Mel-Spectrogram in Audio Generation.
CoRR, 2024

EmoPro: A Prompt Selection Strategy for Emotional Expression in LM-based Speech Synthesis.
CoRR, 2024

Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation.
CoRR, 2024

VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing.
CoRR, 2024

ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation.
CoRR, 2024

MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation.
CoRR, 2024

ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Enhancing Realism in 3D Facial Animation Using Conformer-Based Generation and Automated Post-Processing.
Proceedings of the IEEE International Conference on Acoustics, 2024

High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models.
Proceedings of the IEEE International Conference on Acoustics, 2024

Learning Speech Representation from Contrastive Token-Acoustic Pretraining.
Proceedings of the IEEE International Conference on Acoustics, 2024

Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding.
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
Improving Prosody for Cross-Speaker Style Transfer by Semi-Supervised Style Extractor and Hierarchical Modeling in Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2023

2022
Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone Disambiguation.
CoRR, 2022

Style-Label-Free: Cross-Speaker Style Transfer by Quantized VAE and Speaker-wise Normalization in Speech Synthesis.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

2021
Text Enhancement for Paragraph Processing in End-to-End Code-switching TTS.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Prosody and Voice Factorization for Few-Shot Speaker Adaptation in the Challenge M2voc 2021.
Proceedings of the IEEE International Conference on Acoustics, 2021

Bi-Level Style and Prosody Decoupling Modeling for Personalized End-to-End Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
Bi-Level Speaker Supervision for One-Shot Speech Synthesis.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Dynamic Speaker Representations Adjustment and Decoder Factorization for Speaker Adaptation in End-to-End Speech Synthesis.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

The NLPR Speech Synthesis entry for Blizzard Challenge 2020.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

