Erica Cooper

Nicholas Sanders

Cassia Valentini-Botinhao

CoRR, March, 2026

2025

Speech Generation for Indigenous Language Education.

Dan Wells

Comput. Speech Lang., 2025

Phoneme-Level Duration Controllable Neural Text-to-Speech With Phoneme Embedding Skip Connection and Modified Gaussian Duration Modeling.

IEEE Access, 2025

GST-BERT-TTS: Prosody Prediction Without Accentual Labels For Multi-Speaker TTS Using BERT With Global Style Tokens.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

SHEET: A Multi-purpose Open-source Speech Human Evaluation Estimation Toolkit.

Tomoki Toda

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Towards An Integrated Approach for Expressive Piano Performance Synthesis from Music Scores.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Mora-Level Prosody Prediction for Text-to-Speech Using Japanese BERT Without Accentual Labels.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

HighRateMOS: Sampling-Rate Aware Modeling for Speech Quality Assessment.

Wenze Ren

Yi-Cheng Lin

Ryandhimas E. Zezario

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

The AudioMOS Challenge 2025.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

Layer-wise Analysis for Quality of Multilingual Synthesized Speech.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

Progress and Challenges in DNN-Based Objective Quality Assessment of Synthesized Speech.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2025

2024

ZMM-TTS: Zero-Shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-Supervised Discrete Speech Representations.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Joint speaker encoder and neural back-end model for fully end-to-end automatic speaker verification with multiple enrollment utterances.

Comput. Speech Lang., 2024

MOS-Bench: Benchmarking Generalization Abilities of Subjective Speech Quality Assessment Models.

Tomoki Toda

CoRR, 2024

Spoofing-Aware Speaker Verification Robust Against Domain and Channel Mismatches.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction.

Szu-Wei Fu

Andrea Lorena Aldana Blanco

Ryandhimas E. Zezario

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Spoof Diarization: "What Spoofed When" in Partially Spoofed Audio.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Experimental evaluation of MOS, AB and BWS listening test designs.

Dan Wells

Cassia Valentini-Botinhao

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Uncertainty as a Predictor: Leveraging Self-Supervised Learning for Zero-Shot MOS Prediction.

Aditya Ravuri

Proceedings of the IEEE International Conference on Acoustics, 2024

SynVox2: Towards A Privacy-Friendly VoxCeleb2 Dataset.

Jean-François Bonastre

Mickael Rouvier

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Speaker Anonymization Using Orthogonal Householder Neural Network.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Speaker-Text Retrieval via Contrastive Learning.

CoRR, 2023

DDSP-based Neural Waveform Synthesis of Polyphonic Guitar Performance from String-wise MIDI Input.

CoRR, 2023

Language-independent speaker anonymization using orthogonal Householder neural network.

CoRR, 2023

Range-Based Equal Error Rate for Spoof Localization.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Improving Generalization Ability of Countermeasures for New Mismatch Scenario by Combining Multiple Advanced Regularization Terms.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

SASPEECH: A Hebrew Single Speaker Dataset for Text To Speech and Voice Conversion.

Orian Sharoni

Roee Shenberg

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Investigating Range-Equalizing Bias in Mean Opinion Score Ratings of Synthesized Speech.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems?

Proceedings of the IEEE International Conference on Acoustics, 2023

Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-Supervised Setting.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

The VoiceMOS Challenge 2023: Zero-Shot Subjective Speech Quality Prediction for Multiple Domains.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Exploring Isolated Musical Notes as Pre-training Data for Predominant Instrument Recognition in Polyphonic Music.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022

Use of Speaker Recognition Approaches for Learning and Evaluating Embedding Representations of Musical Instrument Sounds.

Xuan Shi

IEEE ACM Trans. Audio Speech Lang. Process., 2022

The PartialSpoof Database and Countermeasures for the Detection of Short Generated Audio Segments Embedded in a Speech Utterance.

CoRR, 2022

Language-Independent Speaker Anonymization Approach Using Self-Supervised Pre-Trained Models.

Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

The VoiceMOS Challenge 2022.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Attention Back-End for Automatic Speaker Verification with Multiple Enrollment Utterances.

Proceedings of the IEEE International Conference on Acoustics, 2022

On the Interplay between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, 2022

LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech.

Proceedings of the IEEE International Conference on Acoustics, 2022

Generalization Ability of MOS Prediction Networks.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Multi-Task Learning in Utterance-Level and Segmental-Level Spoof Detection.

CoRR, 2021

Use of speaker recognition approaches for learning timbre representations of musical instrument sounds from raw waveforms.

Xuan Shi

CoRR, 2021

Attention Back-end for Automatic Speaker Verification with Multiple Enrollment Utterances.

CoRR, 2021

How do Voices from Past Speech Synthesis Challenges Compare Today?

Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis.

Xin Wang

Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Exploring Disentanglement with Multilingual and Monolingual VQ-VAE.

Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

An Initial Investigation for Detecting Partially Spoofed Audio.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Learning Disentangled Phone and Speaker Representations in a Semi-Supervised VQ-VAE Paradigm.

Proceedings of the IEEE International Conference on Acoustics, 2021

How Similar or Different is Rakugo Speech Synthesizer to Professional Performers?

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Pretraining Strategies, Waveform Model Choice, and Acoustic Configurations for Multi-Speaker End-to-End Speech Synthesis.

CoRR, 2020

Grapheme or phoneme? An Analysis of Tacotron's Embedded Representations.

Antoine Perquin

CoRR, 2020

Modeling of Rakugo Speech and Its Limitations: Toward Speech Synthesis That Entertains Audiences.

IEEE Access, 2020

Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS?

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Improved Prosody from Learned F0 Codebook Representations for VQ-VAE Speech Waveform Reconstruction.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Zero-Shot Multi-Speaker Text-To-Speech with State-Of-The-Art Neural Speaker Embeddings.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Text-to-Speech Synthesis Using Found Data for Low-Resource Languages.

PhD thesis, 2019

Rakugo speech synthesis using segment-to-segment neural transduction and style tokens - toward speech synthesis for entertaining audiences.

Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Subset Selection, Adaptation, Gemination and Prosody Prediction for Amharic Text-to-Speech Synthesis.

Elshadai Tesfaye Biru

Yishak Tofik Mohammed

David Tofu

Julia Hirschberg

Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

2018

A Comparison of Speaker-based and Utterance-based Data Selection for Text-to-Speech Synthesis.

Kai-Zhan Lee

Julia Hirschberg

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017

Utterance Selection for Optimizing Intelligibility of TTS Voices Trained on ASR Data.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

2016

Data Selection and Adaptation for Naturalness in HMM-Based Speech Synthesis.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Babler - Data Collection from the Web to Support Speech Recognition and Keyword Search.

Gideon Mendels