Xinfa Zhu
ORCID: 0000-0001-9275-523X
According to our database, Xinfa Zhu authored at least 39 papers between 2022 and 2025.
Bibliography
2025
CoRR, August, 2025
DualDub: Video-to-Soundtrack Generation via Joint Speech and Background Audio Synthesis.
CoRR, July, 2025
Weakly Supervised Data Refinement and Flexible Sequence Compression for Efficient Thai LLM-based ASR.
CoRR, May, 2025
CoRR, May, 2025
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens.
CoRR, March, 2025
LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement.
CoRR, March, 2025
Steering Language Model to Stable Speech Emotion Recognition via Contextual Perception and Chain of Thought.
CoRR, February, 2025
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis.
CoRR, February, 2025
CoRR, January, 2025
CoRR, January, 2025
ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025
LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
2024
METTS: Multilingual Emotional Text-to-Speech by Cross-Speaker and Cross-Lingual Emotion Transfer.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
U-Style: Cascading U-Nets With Multi-Level Speaker and Style Modeling for Zero-Shot Voice Cloning.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls.
CoRR, 2024
CoRR, 2024
The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge.
CoRR, 2024
Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy.
CoRR, 2024
UniStyle: Unified Style Modeling for Speaking Style Captioning and Stylistic Speech Synthesis.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024
Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
Boosting Multi-Speaker Expressive Speech Synthesis with Semi-Supervised Contrastive Learning.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
2023
DiCLET-TTS: Diffusion Model Based Cross-Lingual Emotion Transfer for Text-to-Speech - A Study Between English and Mandarin.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
CoRR, 2023
CoRR, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
HIGNN-TTS: Hierarchical Prosody Modeling With Graph Neural Networks for Expressive Long-Form TTS.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
2022
Cross-Speaker Emotion Transfer Through Information Perturbation in Emotional Speech Synthesis.
IEEE Signal Process. Lett., 2022
CoRR, 2022
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022