Kun Zhou

Orcid: 0000-0002-7869-4474

Affiliations:

Alibaba DAMO Academy, Singapore
National University of Singapore, Department of Electrical and Computer Engineering, Singapore (PhD 2023)

According to our database, Kun Zhou authored at least 35 papers between 2019 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

Multi-Step Prediction and Control of Hierarchical Emotion Distribution in Text-to-Speech Synthesis.

CoRR, July, 2025

InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation.

CoRR, March, 2025

Plug-and-Play Co-Occurring Face Attention for Robust Audio-Visual Speaker Extraction.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Online Audio-Visual Autoregressive Speaker Extraction.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Thinking Fast and Slow: Robust Speech Recognition via Deep Filter-Tuning.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Conditional Latent Diffusion-Based Speech Enhancement via Dual Context Learning.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Enhancing Emotional Text-to-Speech Controllability with Natural Language Guidance through Contrastive Learning and Diffusion Models.

Xin Jing, Kun Zhou, Andreas Triantafyllopoulos, Björn W. Schuller

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024

Hierarchical Control of Emotion Rendering in Speech Synthesis.

CoRR, 2024

Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions.

CoRR, 2024

Converting Anyone's Voice: End-to-End Expressive Voice Conversion with A Conditional Diffusion Model.

Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

Mixed-EVC: Mixed Emotion Synthesis and Control in Voice Conversion.

Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation.

Proceedings of the IEEE International Conference on Acoustics, 2024

SPGM: Prioritizing Local Features for Enhanced Speech Separation Performance.

Proceedings of the IEEE International Conference on Acoustics, 2024

Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, 2024

Fine-Grained Quantitative Emotion Editing for Speech Generation.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2024

2023

Speech Synthesis With Mixed Emotions.

IEEE Trans. Affect. Comput., 2023

Emotion Intensity and its Control for Emotional Voice Conversion.

IEEE Trans. Affect. Comput., 2023

2022

Emotional voice conversion: Theory, databases and ESD.

Speech Commun., 2022

Mixed Emotion Modelling for Emotional Voice Conversion.

CoRR, 2022

Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Identity Conversion for Emotional Speakers: A Study for Disentanglement of Emotion Style and Speaker Identity.

CoRR, 2021

VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech.

Kun Zhou, Berrak Sisman, Haizhou Li

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-Stage Sequence-to-Sequence Training.

Kun Zhou, Berrak Sisman, Haizhou Li

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Seen and Unseen Emotional Style Transfer for Voice Conversion with A New Emotional Speech Dataset.

Proceedings of the IEEE International Conference on Acoustics, 2021

SUTD-NUS System for Blizzard Challenge 2021.

Proceedings of the Blizzard Challenge 2021, virtual, October 23, 2021

Expressive Voice Conversion: A Joint Framework for Speaker Identity and Emotional Style Transfer.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data.

Kun Zhou, Berrak Sisman, Haizhou Li

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

The NUS & NWPU system for Voice Conversion Challenge 2020.

Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

VAW-GAN for Singing Voice Conversion with Non-parallel Training Data.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

Spectrum and Prosody Conversion for Cross-lingual Voice Conversion with CycleGAN.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019

Large-Scale Speaker Diarization of Radio Broadcast Archives.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
