Manuel Sam Ribeiro

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2023

Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech.

Roberto Barra-Chicote

Daniel Korzekwa

Jaime Lorenzo-Trueba

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings.

Cassia Valentini-Botinhao

Giulia Comini

Jaime Lorenzo-Trueba

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Multilingual context-based pronunciation learning for Text-to-Speech.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2022

Predicting pairwise preferences between TTS audio stimuli using parallel ratings data and anti-symmetric twin neural networks.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Cross-Speaker Style Transfer for Text-to-Speech Using Data Augmentation.

Proceedings of the IEEE International Conference on Acoustics, 2022

Voice Filter: Few-Shot Text-to-Speech Speaker Adaptation Using Voice Conversion as a Post-Processing Module.

Roberto Barra-Chicote

Bartek Perz

Jaime Lorenzo-Trueba

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Exploiting ultrasound tongue imaging for the automatic detection of speech articulation errors.

Speech Commun., 2021

Automatic audiovisual synchronisation for ultrasound tongue imaging.

Speech Commun., 2021

TaL: A Synchronised Multi-Speaker Corpus of Ultrasound Tongue Imaging, Audio, and Lip Videos.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Silent versus Modal Multi-Speaker Speech Recognition from Ultrasound and Video.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

2019

Ultrasound Tongue Imaging for Diarization and Alignment of Child Speech Therapy Sessions.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Synchronising Audio and Ultrasound by Learning Cross-Modal Embeddings.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Speaker-independent Classification of Phonetic Segments from Raw Ultrasound in Child Speech.

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

UltraSuite: A Repository of Ultrasound and Acoustic Data from Child Speech Therapy Sessions.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

The CSTR entry to the 2018 Blizzard Challenge.

Felipe Espic

Avashna Govender

Cassia Valentini-Botinhao

Proceedings of the Blizzard Challenge 2018, Hyderabad, India, September 8, 2018, 2018

2017

Learning Word Vector Representations Based on Acoustic Counts.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

The CSTR entry to the Blizzard Challenge 2017.

Proceedings of the Blizzard Challenge 2017, Stockholm, Sweden, August 25, 2017, 2017

2016

Parallel and cascaded deep neural networks for text-to-speech synthesis.

Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Syllable-Level Representations of Suprasegmental Features for DNN-Based Text-to-Speech Synthesis.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

The SIWIS Database: A Multilingual Speech Database with Acted Emphasis.

Jean-Philippe Goldman

Pierre-Edouard Honnet

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Wavelet-based decomposition of F0 as a secondary task for DNN-based speech synthesis with multi-task learning.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

A perceptual investigation of wavelet-based decomposition of f0 for text-to-speech synthesis.

Robert A. J. Clark

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A multi-level representation of f0 using the continuous wavelet transform and the Discrete Cosine Transform.
