Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation.

Peiwen Sun

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

UnifiedVisual: A Framework for Constructing Unified Vision-Language Datasets.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Llasa+: Free Lunch for Accelerated and Streaming Llama-Based Speech Synthesis.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Boosting Policy and Process Reward Models with Monte Carlo Tree Search in Open-Domain QA.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models.

CoRR, 2024

FlashSpeech: Efficient Zero-Shot Speech Synthesis.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

CoMoSVC: Consistency Model-Based Singing Voice Conversion.

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation.

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

PyramidCodec: Hierarchical Codec for Long-form Music Generation in Audio Domain.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

2023

CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound Synthesis Based on Frequency Modulation.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Zhen Ye
