Kun Zhou

ORCID: 0000-0002-7869-4474

Affiliations:
  • Alibaba DAMO Academy, Singapore
  • National University of Singapore, Department of Electrical and Computer Engineering, Singapore (PhD 2023)


According to our database, Kun Zhou authored at least 34 papers between 2019 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Multi-Step Prediction and Control of Hierarchical Emotion Distribution in Text-to-Speech Synthesis.
CoRR, July, 2025

Online Audio-Visual Autoregressive Speaker Extraction.
CoRR, June, 2025

Plug-and-Play Co-Occurring Face Attention for Robust Audio-Visual Speaker Extraction.
CoRR, May, 2025

Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding.
CoRR, May, 2025

InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation.
CoRR, March, 2025

Conditional Latent Diffusion-Based Speech Enhancement via Dual Context Learning.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Enhancing Emotional Text-to-Speech Controllability with Natural Language Guidance through Contrastive Learning and Diffusion Models.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024
Hierarchical Control of Emotion Rendering in Speech Synthesis.
CoRR, 2024

Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions.
CoRR, 2024

Converting Anyone's Voice: End-to-End Expressive Voice Conversion with A Conditional Diffusion Model.
Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

Mixed-EVC: Mixed Emotion Synthesis and Control in Voice Conversion.
Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2024

SPGM: Prioritizing Local Features for Enhanced Speech Separation Performance.
Proceedings of the IEEE International Conference on Acoustics, 2024

Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2024

Fine-Grained Quantitative Emotion Editing for Speech Generation.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2024

2023
Speech Synthesis With Mixed Emotions.
IEEE Trans. Affect. Comput., 2023

Emotion Intensity and its Control for Emotional Voice Conversion.
IEEE Trans. Affect. Comput., 2023

2022
Emotional voice conversion: Theory, databases and ESD.
Speech Commun., 2022

Mixed Emotion Modelling for Emotional Voice Conversion.
CoRR, 2022

Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021
Identity Conversion for Emotional Speakers: A Study for Disentanglement of Emotion Style and Speaker Identity.
CoRR, 2021

VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-Stage Sequence-to-Sequence Training.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Seen and Unseen Emotional Style Transfer for Voice Conversion with A New Emotional Speech Dataset.
Proceedings of the IEEE International Conference on Acoustics, 2021

SUTD-NUS System for Blizzard Challenge 2021.
Proceedings of the Blizzard Challenge 2021, virtual, October 23, 2021

Expressive Voice Conversion: A Joint Framework for Speaker Identity and Emotional Style Transfer.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

The NUS & NWPU system for Voice Conversion Challenge 2020.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

VAW-GAN for Singing Voice Conversion with Non-parallel Training Data.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

Spectrum and Prosody Conversion for Cross-lingual Voice Conversion with CycleGAN.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019
Large-Scale Speaker Diarization of Radio Broadcast Archives.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

