Berrak Sisman

CoRR, January, 2026

2025

NaturalVoices: A Large-Scale, Spontaneous and Emotional Podcast Dataset for Voice Conversion.

Zongyang Du

CoRR, November, 2025

HuLA: Prosody-Aware Anti-Spoofing with Multi-Task Learning for Expressive and Emotional Synthetic Speech.

Aurosweta Mahapatra

Ismail Rasim Ulgen

CoRR, September, 2025

Objective Evaluation of Prosody and Intelligibility in Speech Synthesis via Conditional Prediction of Discrete Tokens.

CoRR, September, 2025

Versatile Audio-Visual Learning for Emotion Recognition.

IEEE Trans. Affect. Comput., 2025

PRESENT: Zero-Shot Text-to-Prosody Control.

IEEE Signal Process. Lett., 2025

Advancing Pediatric ASR: The Role of Voice Generation in Disordered Speech.

Karen Rosero

Ali N. Salman

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

The Interspeech 2025 Challenge on Speech Emotion Recognition in Naturalistic Conditions.

Laureano Moro-Velázquez

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Can Emotion Fool Anti-spoofing?

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

EmotionRankCLAP: Bridging Natural Language Speaking Styles and Ordinal Speech Emotion via Rank-N-Contrast.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Towards Emotionally Consistent Text-Based Speech Editing: Introducing EmoCorrector and The ECD-TSE Dataset.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Multimodal Fine-grained Context Interaction Graph Modeling for Conversational Speech Synthesis.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

NE-PADD: Leveraging Named Entity Knowledge for Robust Partial Audio Deepfake Detection via Attention Aggregation.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2025

2024

Controllable Accented Text-to-Speech Synthesis With Fine and Coarse-Grained Intensity Rendering.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech.

CoRR, 2024

SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection.

Ismail Rasim Ulgen

Junchen Lu

CoRR, 2024

We Need Variations in Speech Synthesis: Sub-center Modelling for Speaker Embeddings.

CoRR, 2024

Style Mixture of Experts for Expressive Text-To-Speech Synthesis.

Ahad Jawaid

Junchen Lu

CoRR, 2024

emoDARTS: Joint Optimisation of CNN & Sequential Neural Network Architectures for Superior Speech Emotion Recognition.

CoRR, 2024

emoDARTS: Joint Optimization of CNN and Sequential Neural Network Architectures for Superior Speech Emotion Recognition.

IEEE Access, 2024

Accent Conversion in Text-to-Speech Using Multi-Level VAE and Adversarial Training.

Proceedings of the IEEE Region 10 Conference, 2024

Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder.

Proceedings of the IEEE Region 10 Conference, 2024

SNIPER Training: Single-Shot Sparse Training for Text-to-Speech.

Proceedings of the IEEE Region 10 Conference, 2024

Discrete Unit Based Masking For Improving Disentanglement in Voice Conversion.

Philip H. Lee

Ismail Rasim Ulgen

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Odyssey 2024 - Speech Emotion Recognition Challenge: Dataset, Baseline Framework, and Results.

Lucas Goncalves

Ali N. Salman

Abinay Reddy Naini

Laureano Moro-Velázquez

Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

Converting Anyone's Voice: End-to-End Expressive Voice Conversion with A Conditional Diffusion Model.

Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

Exploring speech style spaces with language models: Emotional TTS without emotion labels.

Zongyang Du

Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

Mixed-EVC: Mixed Emotion Synthesis and Control in Voice Conversion.

Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

Towards Naturalistic Voice Conversion: NaturalVoices Dataset with an Automatic Processing Pipeline.

Ali N. Salman

Zongyang Du

Ismail Rasim Ülgen

Carlos Busso

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Unsupervised Domain Adaptation for Speech Emotion Recognition using K-Nearest Neighbors Voice Conversion.

Pravin Mote

Carlos Busso

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Revealing Emotional Clusters in Speaker Embeddings: A Contrastive Learning Strategy for Speech Emotion Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Enhanced Facial Landmarks Detection for Patients with Repaired Cleft Lip and Palate.

Proceedings of the 18th IEEE International Conference on Automatic Face and Gesture Recognition, 2024

2023

Speech Synthesis With Mixed Emotions.

IEEE Trans. Affect. Comput., 2023

Emotion Intensity and its Control for Emotional Voice Conversion.

IEEE Trans. Affect. Comput., 2023

Improving Speech Emotion Recognition Performance using Differentiable Architecture Search.

CoRR, 2023

Versatile Audio-Visual Learning for Handling Single and Multi Modalities in Emotion Regression and Classification Tasks.

CoRR, 2023

High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

SlothSpeech: Denial-of-service Attack Against Speech Recognition Models.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2022

Decoding Knowledge Transfer for Neural Text-to-Speech Training.

IEEE ACM Trans. Audio Speech Lang. Process., 2022

Emotional voice conversion: Theory, databases and ESD.

Speech Commun., 2022

SNIPER Training: Variable Sparsity Rate Training For Text-To-Speech.

CoRR, 2022

Mixed Emotion Modelling for Emotional Voice Conversion.

CoRR, 2022

Controllable Accented Text-to-Speech Synthesis.

CoRR, 2022

Learning Accent Representation with Multi-Level VAE Towards Controllable Speech Synthesis.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Expressive TTS Training With Frame and Style Reconstruction Loss.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Exploiting Morphological and Phonological Features to Improve Prosodic Phrasing for Mongolian Speech Synthesis.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

FastTalker: A neural text-to-speech architecture with shallow and group autoregression.

Neural Networks, 2021

Identity Conversion for Emotional Speakers: A Study for Disentanglement of Emotion Style and Speaker Identity.

CoRR, 2021

StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis.

Rui Liu

CoRR, 2021

VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech.

Kun Zhou

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue.

Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2021

Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-Stage Sequence-to-Sequence Training.

Kun Zhou

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability.

Rui Liu

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Seen and Unseen Emotional Style Transfer for Voice Conversion with A New Emotional Speech Dataset.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

GraphSpeech: Syntax-Aware Graph Attention Network for Neural Speech Synthesis.

Rui Liu

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

SUTD-NUS System for Blizzard Challenge 2021.

Proceedings of the Blizzard Challenge 2021, virtual, October 23, 2021, 2021

DEEPA: A Deep Neural Analyzer for Speech and Singing Vocoding.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Expressive Voice Conversion: A Joint Framework for Speaker Identity and Emotional Style Transfer.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Modeling Prosodic Phrasing With Multi-Task Learning in Tacotron-Based TTS.

IEEE Signal Process. Lett., 2020

DeepConversion: Voice conversion with limited parallel training data.

Speech Commun., 2020

Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data.

Kun Zhou

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Generative Adversarial Networks for Singing Voice Conversion with and without Parallel Data.

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss.

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Teacher-Student Training For Robust Tacotron-Based TTS.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

The NUS & NWPU system for Voice Conversion Challenge 2020.

Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

NUS-HLT System for Blizzard Challenge 2020.

Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

VAW-GAN for Singing Voice Conversion with Non-parallel Training Data.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

Spectrum and Prosody Conversion for Cross-lingual Voice Conversion with CycleGAN.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019

Group Sparse Representation With WaveNet Vocoder Adaptation for Spectrum and Prosody Conversion.

Mingyang Zhang

IEEE ACM Trans. Audio Speech Lang. Process., 2019

VQVAE Unsupervised Unit Discovery and Multi-Scale Code2Spec Inverter for Zerospeech Challenge 2019.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

On the Study of Generative Adversarial Networks for Cross-Lingual Voice Conversion.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

SINGAN: Singing Voice Conversion with Generative Adversarial Networks.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018

Adaptive Wavenet Vocoder for Residual Compensation in GAN-Based Voice Conversion.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Phonetically Aware Exemplar-Based Prosody Transformation.

Grandee Lee

Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

A Voice Conversion Framework with Tandem Feature Sparse Representation and Speaker-Adapted WaveNet Vocoder.

Mingyang Zhang

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Wavelet Analysis of Speaker Dependent and Independent Prosody for Voice Conversion.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

The I2R-NWPU-NUS Text-to-Speech System for Blizzard Challenge 2018.

Proceedings of the Blizzard Challenge 2018, Hyderabad, India, September 8, 2018, 2018

Error Reduction Network for DBLSTM-based Voice Conversion.

Mingyang Zhang

Sai Sirisha Rallabandi

Li Zhao

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017

On the analysis and evaluation of prosody conversion techniques.

Proceedings of the 2017 International Conference on Asian Language Processing, 2017

Sparse representation of phonetic features for voice conversion with and without parallel data.

Kay Chen Tan

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Transformation of prosody in voice conversion.
