Roberto Barra-Chicote

Ariya Rastrow

Constantinos Papayiannis

Volker Leutnant

Trevor Wood

CoRR, May, 2025

Universal Semantic Disentangled Privacy-preserving Speech Representation Learning.

Constantinos Papayiannis

Leif Rädel

Grant P. Strimel

Oluwaseyi Feyisetan

Ariya Rastrow

Volker Leutnant

Trevor Wood

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

2024

Enhancing the Stability of LLM-based Speech Generation Systems through Self-Supervised Representations.

Álvaro Martín-Cortinas

Daniel Sáez-Trigueros

CoRR, 2024

Investigating Self-Supervised Features for Expressive, Multilingual Voice Conversion.

Álvaro Martín-Cortinas

Daniel Sáez-Trigueros

Grzegorz Beringer

Iván Vallés-Pérez

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023

Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces.

Proceedings of the ECAI 2023 - 26th European Conference on Artificial Intelligence, September 30 - October 4, 2023, Kraków, Poland, 2023

2022

Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech.

Venkatesh Ravichandran

CoRR, 2022

Text-free non-parallel many-to-many voice conversion using normalising flows.

CoRR, 2022

Remap, Warp and Attend: Non-Parallel Many-to-Many Accent Conversion with Normalizing Flows.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Prosodic alignment for off-screen automatic dubbing.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

GlowVC: Mel-spectrogram space disentangling model for language-independent text-free voice conversion.

Magdalena Proszewska

Grzegorz Beringer

Daniel Sáez-Trigueros

Thomas Merritt

Abdelhamid Ezzerg

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Creating New Voices using Normalizing Flows.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Text-Free Non-Parallel Many-To-Many Voice Conversion Using Normalising Flow.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Voice Filter: Few-Shot Text-to-Speech Speaker Adaptation Using Voice Conversion as a Post-Processing Module.

Bartek Perz

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Duration Modeling of Neural TTS for Automatic Dubbing.

Johanes Effendi

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

Improving Multi-Speaker TTS Prosody Variance with a Residual Encoder and Normalizing Flows.

Iván Vallés-Pérez

Julian Roth

Grzegorz Beringer

Jasha Droppo

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30 - September 3, 2021

Intra-Sentential Speaking Rate Control in Neural Text-To-Speech for Automatic Dubbing.

Mayank Sharma

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30 - September 3, 2021

Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30 - September 3, 2021

Improving the Expressiveness of Neural Vocoding with Non-Affine Normalizing Flows.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30 - September 3, 2021

SynthASR: Unlocking Synthetic Data for Speech Recognition.

Amin Fazel

Wei Yang

Yulan Liu

Yixiong Meng

Roland Maas

Jasha Droppo

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30 - September 3, 2021

Exploring the application of synthetic audio in training keyword spotters.

Andrew Werchniak

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Improvements to Prosodic Alignment for Automatic Dubbing.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Machine Translation Verbosity Control for Automatic Dubbing.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020

Parallel WaveNet conditioned on VAE latent vectors.

CoRR, 2020

From Speech-to-Speech Translation to Automatic Dubbing.

Ritwik Giri

Umut Isik

Arvindh Krishnaswamy

CoRR, 2020

From Speech-to-Speech Translation to Automatic Dubbing.

Proceedings of the 17th International Conference on Spoken Language Translation, 2020

Evaluating and Optimizing Prosodic Alignment for Automatic Dubbing.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

BOFFIN TTS: Few-Shot Speaker Adaptation by Bayesian Optimization.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Using VAEs and Normalizing Flows for One-Shot Text-To-Speech Synthesis of Expressive Speech.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

In Other News: a Bi-style Text-to-speech Model for Synthesizing Newscaster Voice with Limited Data.

Nishant Prateek

Mateusz Lajszczak

Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Towards Achieving Robust Universal Neural Vocoding.

Alexis Moinet

Vatsal Aggarwal

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech.

Bozena Kostek

Thomas Drugman

Mateusz Lajszczak

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2018

Robust universal neural vocoding.

CoRR, 2018

Comprehensive Evaluation of Statistical Speech Waveform Synthesis.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

2017

Phrase Break Prediction for Long-Form Reading TTS: Exploiting Text Structure Information.

Rubén San-Segundo-Hernández

Thomas Merritt

Thomas Drugman

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

2016

Feature extraction from smartphone inertial signals for human activity segmentation.

Signal Process., 2016

Continuous Expressive Speaking Styles Synthesis based on CVSM and MR-HMM.

Proceedings of the COLING 2016, 2016

2015

Emotion transplantation through adaptation in HMM-based speech synthesis.

Comput. Speech Lang., 2015

Knowledge versus data in TTS: evaluation of a continuum of synthesis systems.

Rosie Kay

Oliver Watts

Cassie Mayo

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

2014

Translating bus information into sign language for deaf people.

Carlos González-Morcillo

Juan Carlos López

Eng. Appl. Artif. Intell., 2014

Development of a genre-dependent TTS system with cross-speaker speaking-style transplantation.

Julián D. Echeverry-Correa

Rubén San-Segundo-Hernández

Juan Manuel Montero-Martínez

Simon King

Proceedings of the 2nd International Workshop on Speech, Language and Audio in Multimedia, 2014

Generating segmental foreign accent.

María Luisa García Lecumberri

Rubén Pérez Ramón

Martin Cooke

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Towards Cross-Lingual Emotion Transplantation.

Proceedings of the Advances in Speech and Language Technologies for Iberian Languages, 2014

2013

I Feel You: The Design and Evaluation of a Domotic Affect-Sensitive Spoken Conversational Agent.

Syaheerah Lebai Lutfi

Sensors, 2013

LSESpeak: A spoken language generator for Deaf people.

Syaheerah L. Lutfi

Expert Syst. Appl., 2013

Towards speaking style transplantation in speech synthesis.

Oliver Watts

Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

NEMOHIFI: an affective HiFi agent.

Syaheerah Lebai Lutfi

Proceedings of the 2013 International Conference on Multimodal Interaction, 2013

2012

Speaker Diarization Features: The UPM Contribution to the RT09 Evaluation.

Beatriz Martínez-González

IEEE Trans. Speech Audio Process., 2012

Selection of TDOA Parameters for MDM Speaker Diarization.

Beatriz Martínez-González

Julián D. Echeverry-Correa

José A. Vallejo-Pinto

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Towards an Unsupervised Speaking Style Voice Building Framework: Multi-Style Speaker Diarization.

Beatriz Martínez-González

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Towards Glottal Source Controllability in Expressive Speech Synthesis.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

2011

Speaker Diarization Based on Intensity Channel Contribution.

IEEE Trans. Speech Audio Process., 2011

2010

Analysis of statistical parametric and unit selection speech synthesis systems applied to emotional speech.

Simon King

Speech Commun., 2010

Estudio del tipo de alineamiento en un sistema de traducción estadística de castellano a Lengua de Signos Española (LSE).

Raquel Martín

Proces. del Leng. Natural, 2010

Spoken Spanish generation from sign language.

D. Sánchez

Antonio García

Interact. Comput., 2010

HIFI-AV: An Audio-visual Corpus for Spoken Language Human-Machine Dialogue Research in Spanish.

Proceedings of the International Conference on Language Resources and Evaluation, 2010

2009

Speech Technology at Home: Enhanced Interfaces for People with Disabilities.

Intell. Autom. Soft Comput., 2009

Novel Applications of Neural Networks in Speech Technology Systems: Search Space Reduction and Prosodic Modeling.

Juana M. Gutiérrez-Arriola

José M. Pardo

Intell. Autom. Soft Comput., 2009

Speeding Up the Design of Dialogue Applications by Using Database Contents and Structure Information.

Proceedings of the SIGDIAL 2009 Conference, 2009

Acoustic emotion recognition using dynamic Bayesian networks and multi-space distributions.

Syaheerah L. Lutfi

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Expressive Speech Identifications based on Hidden Markov Model.

Syaheerah L. Lutfi

Proceedings of the Second International Conference on Health Informatics, 2009

2008

Speech to sign language translation system for Spanish.

Speech Commun., 2008

Desarrollo de un Robot-Guía con Integración de un Sistema de Diálogo y Expresión de Emociones: Proyecto ROBINT.

Proces. del Leng. Natural, 2008

Aplicación de métodos estadísticos para la traducción de voz a Lengua de Signos.

Beatriz Gallo

Proces. del Leng. Natural, 2008

Evaluation of a spoken dialogue system for controlling a Hifi audio system.

Juan Blázquez

Proceedings of the 2008 IEEE Spoken Language Technology Workshop, 2008

2007

Automatic phonetic segmentation of Spanish emotional speech.

Marc Schröder

Sacha Krstulovic

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Language identification using several sources of information with a multiple-Gaussian classifier.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

On the limitations of voice conversion techniques in emotion identification tasks.

Juana M. Gutiérrez-Arriola

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

2006

A Spanish speech to sign language translation system for assisting deaf-mute people.

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Prosodic and Segmental Rubrics in Emotion Identification.

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005

New Advances in Cross-Task and Speaker Adaptation for Air Traffic Control Tasks.

Valentín Sama Rojo

Proces. del Leng. Natural, 2005

New word-level and sentence-level confidence scoring using graph theory calculus and its evaluation on speech understanding.

Valentín Sama