Huadai Liu

Orcid: 0009-0004-5782-5641

According to our database, Huadai Liu authored at least 21 papers between 2022 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
MME-Emotion: A Holistic Evaluation Benchmark for Emotional Intelligence in Multimodal Large Language Models.
CoRR, August, 2025

ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing.
CoRR, June, 2025

OmniAudio: Generating Spatial Audio from 360-Degree Video.
CoRR, April, 2025

Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Build LLM-Based Zero-Shot Streaming TTS System with Cosyvoice.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Fast Adaptation of Pretrained Speaker Verification System for Source Speaker Tracking.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

FlashAudio: Rectified Flow for Fast and High-Fidelity Text-to-Audio Generation.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models.
CoRR, 2024

FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation.
CoRR, 2024

MEDIC: Zero-shot Music Editing with Disentangled Inversion Control.
CoRR, 2024

AudioLCM: Text-to-Audio Generation with Latent Consistency Models.
CoRR, 2024

Extending Multi-modal Contrastive Representations.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

AudioLCM: Efficient and High-Quality Text-to-Audio Generation with Minimal Inference Steps.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

AntCritic: Argument Mining for Free-Form and Visually-Rich Financial Comments.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Wav2SQL: Direct Generalizable Speech-To-SQL Parsing.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

RMSSinger: Realistic-Music-Score based Singing Voice Synthesis.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
ProDiff: Progressive Fast Diffusion Model for High-Quality Text-to-Speech.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

