Kaizhi Qian

James R. Glass

Chang D. Yoo

Speech Commun., 2026

2025

ProsodyLM: Uncovering the Emerging Prosody Processing Capabilities in Speech Language Models.

Chuang Gan

CoRR, July, 2025

A Hierarchical Probabilistic Framework for Incremental Knowledge Tracing in Classroom Settings.

CoRR, June, 2025

RapVerse: Coherent Vocals and Whole-Body Motion Generation from Text.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

UniMuMo: Unified Text, Music, and Motion Generation.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

Towards Unsupervised Speech Recognition Without Pronunciation Models.

Chang D. Yoo

CoRR, 2024

RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text.

CoRR, 2024

Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Speech Self-Supervised Learning Using Diffusion Model Synthetic Data.

Mark A. Hasegawa-Johnson

Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023

Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular Learning.

CoRR, 2023

Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular Learning.

Proceedings of the International Conference on Machine Learning, 2023

Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Domain Generalization for Language-Independent Automatic Speech Recognition.

Frontiers Artif. Intell., 2022

Improving Self-Supervised Speech Representations by Disentangling Speakers.

CoRR, 2022

SpeechSplit 2.0: Unsupervised Speech Disentanglement for Voice Conversion Without Tuning Autoencoder Bottlenecks.

Chak Ho Chan

CoRR, 2022

Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

WavPrompt: Towards Few-Shot Spoken Language Understanding with Frozen Language Models.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers.

Proceedings of the International Conference on Machine Learning, 2022

On the Interplay between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

SpeechSplit2.0: Unsupervised Speech Disentanglement for Voice Conversion without Tuning Autoencoder Bottlenecks.

Chak Ho Chan

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

Global Rhythm Style Transfer Without Text Transcriptions.

CoRR, 2021

PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Speech Denoising with Auditory Models.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Zero-Shot Cross-Lingual Phonetic Recognition with External Language Embedding.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Global Prosody Style Transfer Without Text Transcriptions.

Proceedings of the 38th International Conference on Machine Learning, 2021

Continuous CNN for Nonuniform Time Series.

Jishen Zhao

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020

Deep generative models for speech editing.

PhD thesis, 2020

Deep Network Perceptual Losses for Speech Denoising.

CoRR, 2020

Unsupervised Speech Decomposition via Triple Information Bottleneck.

David D. Cox

Proceedings of the 37th International Conference on Machine Learning, 2020

F0-Consistent Many-To-Many Non-Parallel Voice Conversion Via Conditional Autoencoder.

Zeyu Jin

Gautham J. Mysore

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

An Efficient and Margin-Approaching Zero-Confidence Adversarial Attack.

CoRR, 2019

Zero-Shot Voice Style Transfer with Only Autoencoder Loss.

CoRR, 2019

AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss.

Proceedings of the 36th International Conference on Machine Learning, 2019

Monaural Singing Voice Separation Using Fusion-Net with Time-Frequency Masking.

Feng Li

Masato Akagi

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018

Deep Learning Based Speech Beamforming.

Dinei A. F. Florêncio

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

2017

Speech Enhancement Using Bayesian Wavenet.
