Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis.

Yinghao Aaron Li

Rithesh Kumar

Zeyu Jin

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis.

Xilin Jiang

Yinghao Aaron Li

Adrian Nicolas Florea

Cong Han

Nima Mesgarani

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

2024

Contextual feature extraction hierarchies converge in large language models and the brain.

Nat. Mac. Intell., 2024

Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation.

CoRR, 2024

Listen, Chat, and Edit: Text-Guided Soundscape Modification for Enhanced Auditory Experience.

CoRR, 2024

Exploring Self-supervised Contrastive Learning of Spatial Sound Event Representation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023

HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform.

CoRR, 2023

SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs.

Yinghao Aaron Li

Cong Han

Nima Mesgarani

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes.

Xilin Jiang

Yinghao Aaron Li

Nima Mesgarani

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Improved Decoding of Attentional Selection in Multi-Talker Environments with Self-Supervised Learned Speech Representation.

Proceedings of the 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2023

2022

StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer From Style-Based TTS Models.

Yinghao Aaron Li

Cong Han

Nima Mesgarani

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

2021

StarGANv2-VC: A Diverse, Unsupervised, Non-Parallel Framework for Natural-Sounding Voice Conversion.

Yinghao Aaron Li

Ali Zare

Nima Mesgarani

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Yinghao Aaron Li
