Haoyuan Li
ORCID: 0009-0004-8926-894X
Affiliations:
- Alibaba Group, Hangzhou, China
- Zhejiang University (ZJU), Hangzhou, China
According to our database, Haoyuan Li authored at least 29 papers between 2021 and 2025.
Bibliography
2025
Heartcare Suite: Multi-dimensional Understanding of ECG with Raw Multi-lead Signal Modeling.
CoRR, June, 2025
CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation.
CoRR, March, 2025
MINT: Multi-modal Chain of Thought in Unified Generative Models for Enhanced Image Generation.
CoRR, March, 2025
HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation.
CoRR, February, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
Align²LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation.
Proceedings of the Findings of the Association for Computational Linguistics, 2025
T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
2024
Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework.
CoRR, 2024
T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts.
CoRR, 2024
Align²LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation.
CoRR, 2024
CoRR, 2024
CoRR, 2024
Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback.
CoRR, 2024
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models.
CoRR, 2024
Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
2023
Weakly-Supervised Video Moment Retrieval via Regularized Two-Branch Proposal Networks with Erasing Mechanism.
CoRR, 2023
TrainerAgent: Customizable and Efficient Model Training through LLM-Powered Multi-Agent System.
CoRR, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
2022
Towards Effective Multi-Modal Interchanges in Zero-Resource Sounding Object Localization.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022
2021
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021