Haoyuan Li

ORCID: 0009-0004-8926-894X

Affiliations:

Alibaba Group, Hangzhou, China
Zhejiang University (ZJU), Hangzhou, China

According to our database¹, Haoyuan Li authored at least 29 papers between 2021 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2025

Heartcare Suite: Multi-dimensional Understanding of ECG with Raw Multi-lead Signal Modeling.

CoRR, June, 2025

Fast-Slow Thinking for Large Vision-Language Model Reasoning.

CoRR, April, 2025

CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation.

CoRR, March, 2025

MINT: Multi-modal Chain of Thought in Unified Generative Models for Enhanced Image Generation.

CoRR, March, 2025

HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Streaming Video Question-Answering with In-context Video KV-Cache Retrieval.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Align²LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation.

Findings of the Association for Computational Linguistics, 2025

T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework.

CoRR, 2024

T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts.

CoRR, 2024

Align²LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation.

CoRR, 2024

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation.

CoRR, 2024

TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition.

CoRR, 2024

EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration.

CoRR, 2024

Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback.

CoRR, 2024

HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models.

CoRR, 2024

EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration.

Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024

Language Model is a Branch Predictor for Simultaneous Machine Translation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Weakly-Supervised Video Moment Retrieval via Regularized Two-Branch Proposal Networks with Erasing Mechanism.

CoRR, 2023

TrainerAgent: Customizable and Efficient Model Training through LLM-Powered Multi-Agent System.

CoRR, 2023

DATE: Domain Adaptive Product Seeker for E-Commerce.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Towards Effective Multi-Modal Interchanges in Zero-Resource Sounding Object Localization.

Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022

Video-Guided Curriculum Learning for Spoken Video Grounding.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

2021

SimulLR: Simultaneous Lip Reading Transducer with Attention-Guided Adaptive Memory.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
