Hui Wang

Affiliations:
  • Nankai University, College of Computer Science, Tianjin, China


According to our database, Hui Wang authored at least 22 papers between 2023 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
DIFFA: Large Language Diffusion Models Can Listen and Understand.
CoRR, July, 2025

StreamMel: Real-Time Zero-shot Text-to-Speech via Interleaved Continuous Autoregressive Modeling.
CoRR, June, 2025

EmotionTalk: An Interactive Chinese Multimodal Emotion Dataset With Rich Annotations.
CoRR, May, 2025

Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling.
CoRR, May, 2025

RA-CLAP: Relation-Augmented Emotional Speaking Style Contrastive Language-Audio Pretraining For Speech Retrieval.
CoRR, May, 2025

Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides.
CoRR, April, 2025

Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis.
CoRR, April, 2025

SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors.
CoRR, March, 2025

CS-Dialogue: A 104-Hour Dataset of Spontaneous Mandarin-English Code-Switching Dialogues for Speech Recognition.
CoRR, February, 2025

FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching.
CoRR, February, 2025

MusicEval: A Generative Music Corpus with Expert Ratings for Automatic Text-to-Music Evaluation.
CoRR, January, 2025

Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Enhancing Multimodal Emotion Recognition through Multi-Granularity Cross-Modal Alignment.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

MusicEval: A Generative Music Dataset with Expert Ratings for Automatic Text-to-Music Evaluation.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Emotion-Preserving Prosody Anonymization Network for Voice Privacy Protection.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5.
CoRR, 2024

Uncertainty-Aware Mean Opinion Score Prediction.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Iterative Prototype Refinement for Ambiguous Speech Emotion Recognition.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

2023
RAMP: Retrieval-Augmented MOS Prediction via Confidence-based Dynamic Weighting.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Intermediate-Task Learning with Pretrained Model for Synthesized Speech MOS Prediction.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023
