Haohan Guo

Orcid: 0000-0002-3393-9984

According to our database, Haohan Guo authored at least 29 papers between 2019 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

HeartMuLa: A Family of Open Sourced Music Foundation Models.

CoRR, January, 2026

2025

FireRedTTS-1S: An Upgraded Streamable Foundation Text-to-Speech System.

CoRR, March, 2025

Audio-FLAN: A Preliminary Release.

CoRR, February, 2025

ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

PodAgent: A Comprehensive Framework for Podcast Generation.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications.

CoRR, 2024

SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models.

CoRR, 2024

BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data.

Álvaro Martín-Cortinas, Soledad López Gambino, Kayeon Yoo, Elena Sokolova, Thomas Drugman

CoRR, 2024

Addressing Index Collapse of Large-Codebook Speech Tokenizer With Dual-Decoding Product-Quantized Variational Auto-Encoder.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

SoCodec: A Semantic-Ordered Multi-Stream Speech Codec For Efficient Language Model Based Text-to-Speech Synthesis.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

UniAudio 1.5: Large Language Model-Driven Audio Codec is A Few-Shot Audio Task Learner.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

UniAudio: Towards Universal Audio Generation with Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Unifying One-Shot Voice Conversion and Cloning with Disentangled Speech Representations.

Proceedings of the IEEE International Conference on Acoustics, 2024

Cross-Speaker Encoding Network for Multi-Talker Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

MSMC-TTS: Multi-Stage Multi-Codebook VQ-VAE Based Neural TTS.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning.

CoRR, 2023

2022

Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations.

CoRR, 2022

A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving Adversarial Waveform Generation Based Singing Voice Conversion with Harmonic Signals.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Conversational End-to-End TTS for Voice Agents.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

2020

Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training.

CoRR, 2020

Conversational End-to-End TTS for Voice Agent.

CoRR, 2020

2019

Feature reinforcement with word embedding and parsing information in neural TTS.

CoRR, 2019

Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A New GAN-Based End-to-End TTS Training Algorithm.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
