Zhen Ye

ORCID: 0009-0003-6932-9859

Affiliations:
  • Hong Kong University of Science and Technology, Hong Kong SAR, China


According to our database, Zhen Ye authored at least 20 papers between 2023 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
UnifiedVisual: A Framework for Constructing Unified Vision-Language Datasets.
CoRR, September, 2025

Llasa+: Free Lunch for Accelerated and Streaming Llama-Based Speech Synthesis.
CoRR, August, 2025

Inference-time Scaling for Diffusion-based Audio Super-resolution.
CoRR, August, 2025

J1: Exploring Simple Test-Time Scaling for LLM-as-a-Judge.
CoRR, May, 2025

YuE: Scaling Open Foundation Models for Long-Form Music Generation.
CoRR, March, 2025

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens.
CoRR, March, 2025

LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement.
CoRR, March, 2025

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis.
CoRR, February, 2025

ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Boosting Policy and Process Reward Models with Monte Carlo Tree Search in Open-Domain QA.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models.
CoRR, 2024

FlashSpeech: Efficient Zero-Shot Speech Synthesis.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

COMOSVC: Consistency Model-Based Singing Voice Conversion.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

PyramidCodec: Hierarchical Codec for Long-form Music Generation in Audio Domain.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

2023
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound Synthesis Based on Frequency Modulation.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023
