Shuai Wang
ORCID: 0000-0003-1523-9631
Affiliations:
- Chinese University of Hong Kong-Shenzhen (CUHK-SZ), Shenzhen Research Institute of Big Data, Shenzhen, China
- Shanghai Jiao Tong University, Department of Computer Science and Engineering, China (PhD 2020)
According to our database, Shuai Wang authored at least 108 papers between 2012 and 2025.
Bibliography
2025
Towards Hallucination-Free Music: A Reinforcement Learning Preference Optimization Framework for Reliable Song Generation.
CoRR, August, 2025
CoRR, July, 2025
MeMo: Attentional Momentum for Real-time Audio-visual Speaker Extraction under Impaired Visual Conditions.
CoRR, July, 2025
DiffRhythm+: Controllable and Flexible Full-Length Song Generation with Preference Optimization.
CoRR, July, 2025
Multi-Step Prediction and Control of Hierarchical Emotion Distribution in Text-to-Speech Synthesis.
CoRR, July, 2025
Investigation of Zero-shot Text-to-Speech Models for Enhancing Short-Utterance Speaker Verification.
CoRR, June, 2025
CoRR, June, 2025
Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction.
CoRR, June, 2025
SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement.
CoRR, June, 2025
IEEE J. Sel. Top. Signal Process., May, 2025
PersonaTAB: Predicting Personality Traits using Textual, Acoustic, and Behavioral Cues in Fully-Duplex Speech Dialogs.
CoRR, May, 2025
Causal Self-supervised Pretrained Frontend with Predictive Code for Speech Separation.
CoRR, April, 2025
C²AV-TSE: Context and Confidence-aware Audio Visual Target Speaker Extraction.
CoRR, April, 2025
CoRR, March, 2025
IEEE Signal Process. Lett., 2025
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025
Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching for Speaker Diarization.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025
SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
2024
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Attention-Based Encoder-Decoder End-to-End Neural Diarization With Embedding Enhancer.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Speech Commun., 2024
MoMuSE: Momentum Multi-modal Target Speaker Extraction for Real-time Scenarios with Impaired Visual Cues.
CoRR, 2024
The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge: Tasks, Results and Findings.
CoRR, 2024
CoRR, 2024
The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge.
CoRR, 2024
Proceedings of the Social Robotics - 16th International Conference, 2024
Proceedings of the IEEE Spoken Language Technology Workshop, 2024
Proceedings of the IEEE Spoken Language Technology Workshop, 2024
On the Effectiveness of Enrollment Speech Augmentation For Target Speaker Extraction.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024
Proceedings of the IEEE Spoken Language Technology Workshop, 2024
Disentangling The Prosody And Semantic Information With Pre-Trained Model For In-Context Learning Based Zero-Shot Voice Conversion.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024
Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024
The X-Lance Technical Report for Interspeech 2024 Speech Processing using Discrete Speech Unit Challenge.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024
Combining Self-Supervised Learning and Adversarial Training Based Domain Adaptation for Speaker Verification.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024
Whisper-PMFA: Partial Multi-Scale Feature Aggregation for Speaker Verification using Whisper Models.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
DualVC 3: Leveraging Language Model Generated Pseudo Context for End-to-end Low Latency Streaming Voice Conversion.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Leveraging in-the-wild Data for Effective Self-supervised Pretraining in Speaker Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024
DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2024
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation-based Voice Conversion.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
2021
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Voice Activity Detection in the Wild: A Data-Driven Approach Using Teacher-Student Training.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Unit Selection Synthesis Based Data Augmentation for Fixed Phrase Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
2020
Data Augmentation Using Deep Generative Models for Embedding Based Speaker Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2020
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Adversarial Domain Adaptation for Speaker Verification Using Partially Shared Network.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Optimizing Bayesian HMM Based X-Vector Clustering for the Second DIHARD Speech Diarization Challenge.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Channel Invariant Speaker Embedding Learning with Joint Multi-Task and Adversarial Training.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
2019
Discriminative Neural Embedding Learning for Short-Duration Text-Independent Speaker Verification.
IEEE ACM Trans. Audio Speech Lang. Process., 2019
Erratum to: Past review, current progress, and challenges ahead on the cocktail party problem.
Frontiers Inf. Technol. Electron. Eng., 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Data Augmentation Using Variational Autoencoder for Embedding Based Speaker Verification.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
On the Usage of Phonetic Information for Text-Independent Speaker Embedding Extraction.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
2018
Frontiers Inf. Technol. Electron. Eng., 2018
Generative Adversarial Networks based X-vector Augmentation for Robust Probabilistic Linear Discriminant Analysis in Speaker Verification.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Proceedings of the Intelligence Science and Big Data Engineering, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Focal KL-Divergence Based Dilated Convolutional Neural Networks for Co-Channel Speaker Identification.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Joint I-Vector with End-to-End System for Short Duration Text-Independent Speaker Verification.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
2017
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017
2012
IEEE Trans. Vis. Comput. Graph., 2012