Hui Wang

ORCID: 0009-0003-8057-4644

Affiliations:

Nankai University, College of Computer Science, Tianjin, China

According to our database, Hui Wang authored at least 29 papers between 2023 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

SpeechLLM-as-Judges: Towards General and Interpretable Speech Quality Evaluation.

CoRR, October, 2025

AudioEval: Automatic Dual-Perspective and Multi-Dimensional Evaluation of Text-to-Audio-Generation.

CoRR, October, 2025

WildElder: A Chinese Elderly Speech Dataset from the Wild with Fine-Grained Manual Annotations.

CoRR, October, 2025

MECap-R1: Emotion-aware Policy with Reinforcement Learning for Multimodal Emotion Captioning.

CoRR, September, 2025

Omni-CLST: Error-aware Curriculum Learning with guided Selective chain-of-Thought for audio question answering.

CoRR, September, 2025

TTA-Bench: A Comprehensive Benchmark for Evaluating Text-to-Audio Models.

CoRR, September, 2025

DIFFA: Large Language Diffusion Models Can Listen and Understand.

CoRR, July, 2025

StreamMel: Real-Time Zero-shot Text-to-Speech via Interleaved Continuous Autoregressive Modeling.

CoRR, June, 2025

EmotionTalk: An Interactive Chinese Multimodal Emotion Dataset With Rich Annotations.

CoRR, May, 2025

Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling.

CoRR, May, 2025

Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis.

CoRR, April, 2025

SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors.

CoRR, March, 2025

CS-Dialogue: A 104-Hour Dataset of Spontaneous Mandarin-English Code-Switching Dialogues for Speech Recognition.

CoRR, February, 2025

FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching.

CoRR, February, 2025

MusicEval: A Generative Music Corpus with Expert Ratings for Automatic Text-to-Music Evaluation.

CoRR, January, 2025

StreamMel: Real-Time Zero-Shot Text-to-Speech Via Interleaved Continuous Autoregressive Modeling.

IEEE Signal Process. Lett., 2025

RA-CLAP: Relation-Augmented Emotional Speaking Style Contrastive Language-Audio Pretraining For Speech Retrieval.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Chinese-LiPS: A Chinese Audio-Visual Speech Recognition Dataset with Lip-Reading and Presentation Slides.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Enhancing Multimodal Emotion Recognition through Multi-Granularity Cross-Modal Alignment.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

MusicEval: A Generative Music Dataset with Expert Ratings for Automatic Text-to-Music Evaluation.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Emotion-Preserving Prosody Anonymization Network for Voice Privacy Protection.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5.

CoRR, 2024

Uncertainty-Aware Mean Opinion Score Prediction.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Iterative Prototype Refinement for Ambiguous Speech Emotion Recognition.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

2023

RAMP: Retrieval-Augmented MOS Prediction via Confidence-based Dynamic Weighting.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Intermediate-Task Learning with Pretrained Model for Synthesized Speech MOS Prediction.

Hui Wang, Xiguang Zheng, Yong Qin

Proceedings of the IEEE International Conference on Multimedia and Expo, 2023