Éva Székely

ORCID: 0000-0003-1175-840X

According to our database, Éva Székely authored at least 38 papers between 2011 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
Unified speech and gesture synthesis using flow matching.
CoRR, 2023

Matcha-TTS: A fast TTS architecture with conditional flow matching.
CoRR, 2023

On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis.
CoRR, 2023

Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis.
CoRR, 2023

Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis.
CoRR, 2023

Can a gender-ambiguous voice reduce gender stereotypes in human-robot interactions?
Proceedings of the 32nd IEEE International Conference on Robot and Human Interactive Communication, 2023

Hi robot, it's not what you say, it's how you say it.
Proceedings of the 32nd IEEE International Conference on Robot and Human Interactive Communication, 2023

Generation of speech and facial animation with controllable articulatory effort for amusing conversational characters.
Proceedings of the 23rd ACM International Conference on Intelligent Virtual Agents, 2023

A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Prosody-Controllable Spontaneous TTS with Neural HMMs.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Why is my Agent so Slow? Deploying Human-Like Conversational Turn-Taking.
Proceedings of the International Conference on Human-Agent Interaction, 2023

Casual chatter or speaking up? Adjusting articulatory effort in generation of speech and animation for conversational characters.
Proceedings of the 17th IEEE International Conference on Automatic Face and Gesture Recognition, 2023

2022
OverFlow: Putting flows on top of neural transducers for better TTS.
CoRR, 2022

Evaluating Sampling-based Filler Insertion with Spontaneous TTS.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Where's the uh, hesitation? The interplay between filled pause location, speech rate and fundamental frequency in perception of confidence.
Proceedings of the Interspeech 2022, 2022

Neural HMMs Are All You Need (For High-Quality Attention-Free TTS).
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022

2021
Integrated Speech and Gesture Synthesis.
Proceedings of the ICMI '21: International Conference on Multimodal Interaction, 2021

2020
Augmented Prompt Selection for Evaluation of Spontaneous Speech Synthesis.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Generating coherent spontaneous speech and gesture from text.
Proceedings of the IVA '20: ACM International Conference on Intelligent Virtual Agents, 2020

Breathing and Speech Planning in Spontaneous Speech Synthesis.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020

2019
Spontaneous Conversational Speech Synthesis from Found Data.
Proceedings of the Interspeech 2019, 2019

Off the Cuff: Exploring Extemporaneous Speech Delivery with TTS.
Proceedings of the Interspeech 2019, 2019

The Greennn Tree - Lengthening Position Influences Uncertainty Perception.
Proceedings of the Interspeech 2019, 2019

Casting to Corpus: Segmenting and Selecting Spontaneous Dialogue for TTS with a CNN-LSTM Speaker-Dependent Breath Detector.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019

Mapping Theoretical and Methodological Perspectives for Understanding Speech Interface Interactions.
Proceedings of the Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, 2019

2017
Synthesising Uncertainty: The Interplay of Vocal Effort and Hesitation Disfluencies.
Proceedings of the Interspeech 2017, 2017

Using crowd-sourcing for the design of listening agents: challenges and opportunities.
Proceedings of the 1st ACM SIGCHI International Workshop on Investigating Social Interactions with Artificial Agents, 2017

They Know as Much as We Do: Knowledge Estimation and Partner Modelling of Artificial Partners.
Proceedings of the 39th Annual Meeting of the Cognitive Science Society, 2017

2015
The effect of soft, modal and loud voice levels on entrainment in noisy conditions.
Proceedings of the INTERSPEECH 2015, 2015

2014
Predicting synthetic voice style from facial expressions. An application for augmented conversations.
Speech Communication, 2014

Facial expression-based affective speech translation.
Journal on Multimodal User Interfaces, 2014

2013
A system for facial expression-based affective speech translation.
Proceedings of the 18th International Conference on Intelligent User Interfaces, 2013

2012
Synthesizing expressive speech from amateur audiobook recordings.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

WinkTalk: a demonstration of a multimodal speech synthesis platform linking facial expressions to expressive synthetic voices.
Proceedings of the Third Workshop on Speech and Language Processing for Assistive Technologies, 2012

Evaluating expressive speech synthesis from audiobook corpora for conversational phrases.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Rapidly Testing the Interaction Model of a Pronunciation Training System via Wizard-of-Oz.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Detecting a targeted voice style in an audiobook using voice quality features.
Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012

2011
Clustering Expressive Speech Styles in Audiobooks Using Glottal Source Parameters.
Proceedings of the INTERSPEECH 2011, 2011

