HierSpeech++: Bridging the Gap Between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-Shot Speech Synthesis.

Sang-Hoon Lee

Ha-Yeong Choi

Seung-Bin Kim

Seong-Whan Lee

IEEE Trans. Neural Networks Learn. Syst., October, 2025

EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech Via Emotion-Adaptive Spherical Vector.

IEEE Trans. Affect. Comput., 2025

Spotlight-TTS: Spotlighting the Style via Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

EmoSphere-SER: Enhancing Speech Emotion Recognition Through Spherical Representation with Auxiliary Classification.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

FLowHigh: Towards Efficient and High-Quality Audio Super-Resolution with Single-Step Flow Matching.

Jun-Hak Yun

Seung-Bin Kim

Seong-Whan Lee

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

JELLY: Joint Emotion Recognition and Context Reasoning with LLMs for Conversational Speech Synthesis.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

FillerSpeech: Towards Human-Like Text-to-Speech Synthesis with Filler Insertion and Filler Style Control.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

2024

Audio Super-Resolution With Robust Speech Representation Learning of Masked Autoencoder.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

PromotiCon: Prompt-based Emotion Controllable Text-to-Speech via Prompt Generation and Matching.

Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2024

EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

TranSentence: Speech-to-Speech Translation via Language-Agnostic Sentence-Level Speech Encoding without Language-Parallel Data.

Seung-Bin Kim

Sang-Hoon Lee

Seong-Whan Lee

Proceedings of the IEEE International Conference on Acoustics, 2024

2022

HierSpeech: Bridging the Gap between Text and Speech by Hierarchical Variational Inference using Self-supervised Representations for Speech Synthesis.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

EMOQ-TTS: Emotion Intensity Quantization for Fine-Grained Controllable Emotional Text-to-Speech.
