Fenglong Xie

Orcid: 0000-0002-1206-3696

According to our database, Fenglong Xie authored at least 27 papers between 2012 and 2025.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2025
FireRedTTS-1S: An Upgraded Streamable Foundation Text-to-Speech System.
CoRR, March, 2025

FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration.
CoRR, January, 2025

Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

PodAgent: A Comprehensive Framework for Podcast Generation.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications.
CoRR, 2024

Addressing Index Collapse of Large-Codebook Speech Tokenizer With Dual-Decoding Product-Quantized Variational Auto-Encoder.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

SoCodec: A Semantic-Ordered Multi-Stream Speech Codec For Efficient Language Model Based Text-to-Speech Synthesis.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

2023
MSMC-TTS: Multi-Stage Multi-Codebook VQ-VAE Based Neural TTS.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning.
CoRR, 2023

FireRedTTS: The Xiaohongshu Speech Synthesis System for Blizzard Challenge 2023.
Proceedings of the 18th Blizzard Challenge Workshop, Grenoble, France, August 29, 2023

2022
Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations.
CoRR, 2022

A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021
Triple M: A Practical Neural Text-to-speech System With Multi-guidance Attention And Multi-band Multi-time Lpcnet.
CoRR, 2021

Triple M: A Practical Text-to-Speech Synthesis System with Multi-Guidance Attention and Multi-Band Multi-Time LPCNet.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A New High Quality Trajectory Tiling Based Hybrid TTS In Real Time.
Proceedings of the IEEE International Conference on Acoustics, 2021

Nana-HDR: A Non-attentive Non-autoregressive Hybrid Model for TTS.
Proceedings of the Blizzard Challenge 2021, virtual, October 23, 2021

2020
Improving End-to-End Speech Synthesis with Local Recurrent Neural Network Enhanced Transformer.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

An Improved Frame-Unit-Selection Based Voice Conversion System Without Parallel Training Data.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Voice conversion with SI-DNN and KL divergence based mapping without parallel training data.
Speech Commun., 2019

2018
LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis.
CoRR, 2018

Frame Selection in SI-DNN Phonetic Space with WaveNet Vocoder for Voice Conversion without Parallel Training Data.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

2016
A KL Divergence and DNN-Based Approach to Voice Conversion without Parallel Training Sentences.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

A KL divergence and DNN approach to cross-lingual TTS.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2014
Pitch transformation in neural network based voice conversion.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Sequence error (SE) minimization training of neural network for voice conversion.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

TTS synthesis with bidirectional LSTM based recurrent neural networks.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

2012
Cross validation and Minimum Generation Error for improved model clustering in HMM-based TTS.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

