Haohan Guo
Orcid: 0000-0002-3393-9984
According to our database, Haohan Guo authored at least 28 papers between 2019 and 2025.
Bibliography
2025
ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling.
CoRR, April, 2025
CoRR, March, 2025
Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025
2024
FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications.
CoRR, 2024
SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models.
CoRR, 2024
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data.
CoRR, 2024
Addressing Index Collapse of Large-Codebook Speech Tokenizer With Dual-Decoding Product-Quantized Variational Auto-Encoder.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024
SoCodec: A Semantic-Ordered Multi-Stream Speech Codec For Efficient Language Model Based Text-to-Speech Synthesis.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024
UniAudio 1.5: Large Language Model-Driven Audio Codec is A Few-Shot Audio Task Learner.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Unifying One-Shot Voice Conversion and Cloning with Disentangled Speech Representations.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
2023
IEEE ACM Trans. Audio Speech Lang. Process., 2023
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning.
CoRR, 2023
2022
Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations.
CoRR, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Improving Adversarial Waveform Generation Based Singing Voice Conversion with Harmonic Signals.
Proceedings of the IEEE International Conference on Acoustics, 2022
2021
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
2020
Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training.
CoRR, 2020
2019
CoRR, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019