Minghui Fang
Orcid: 0009-0000-6488-9695Affiliations:
- Zhejiang University, China
According to our database1,
Minghui Fang
authored at least 28 papers
between 2023 and 2025.
Collaborative distances:
Collaborative distances:
Timeline
Legend:
Book In proceedings Article PhD thesis Dataset OtherLinks
Online presence:
-
on orcid.org
On csauthors.net:
Bibliography
2025
CoRR, July, 2025
CoRR, June, 2025
OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios.
CoRR, January, 2025
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Enhancing Expressive Voice Conversion with Discrete Pitch-Conditioned Flow Matching Model.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025
Proceedings of the 31st International Conference on Computational Linguistics, 2025
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
Proceedings of the Findings of the Association for Computational Linguistics, 2025
CART: A Generative Cross-Modal Retrieval Framework With Coarse-To-Fine Semantic Modeling.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
Zero-resource Hallucination Detection for Text Generation via Graph-based Contextual Knowledge Triples Modeling.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
2024
Zero-resource Hallucination Detection for Text Generation via Graph-based Contextual Knowledge Triples Modeling.
CoRR, 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling.
CoRR, 2024
ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling.
CoRR, 2024
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec.
CoRR, 2024
Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models.
CoRR, 2024
MoMu-Diffusion: On Learning Long-Term Motion-Music Synchronization and Correspondence.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
SyncTalklip: Highly Synchronized Lip-Readable Speaker Generation with Multi-Task Learning.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
TextrolSpeech: A Text Style Control Speech Corpus with Codec Language Text-to-Speech Models.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
2023
TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models.
CoRR, 2023