We stand with Ukraine

Ruofan Hu

ORCID: 0009-0005-1723-6778

Affiliations:

Zhejiang University, Hangzhou, China

According to our database, Ruofan Hu authored at least 13 papers between 2024 and 2026.

Collaborative distances:

Dijkstra number of five.
Erdős number of four.

Timeline

Legend: Book, In proceedings, Article, PhD thesis, Dataset, Other

Bibliography

2026

DocRetriever: A Plug-and-Play Framework for Multimodal Document Retrieval with Comprehensive Benchmark.

CoRR, May, 2026

SpatialReward: Verifiable Spatial Reward Modeling for Fine-Grained Spatial Consistency in Text-to-Image Generation.

CoRR, March, 2026

HIVE: A hypergraph-based game-theoretic interactive value decomposition engine for multi-lateral agents collaboration.

Neural Networks, 2026

2025

Generative Reasoning Recommendation via LLMs.

CoRR, October, 2025

OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios.

CoRR, January, 2025

Multimodal Conditional Retrieval with High Controllability.

Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.2, 2025

MelRe: Vision-Based Mel-Spectrogram Restoration.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Vela: Scalable Embeddings with Voice Large Language Models for Multimodal Retrieval.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

GTA: Towards Generative Text-To-Audio Retrieval via Multi-Scale Tokenizer.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

VoxDialogue: Can Spoken Dialogue Systems Understand Information Beyond Words?

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt.

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

VoiceTuner: Self-Supervised Pre-training and Efficient Fine-tuning For Voice Generation.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

InstructSpeech: Following Speech Editing Instructions via Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024
