Haoyuan Li

ORCID: 0009-0004-8926-894X

Affiliations:
  • Alibaba Group, Hangzhou, China
  • Zhejiang University (ZJU), Hangzhou, China


According to our database, Haoyuan Li authored at least 29 papers between 2021 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Heartcare Suite: Multi-dimensional Understanding of ECG with Raw Multi-lead Signal Modeling.
CoRR, June, 2025

Fast-Slow Thinking for Large Vision-Language Model Reasoning.
CoRR, April, 2025

CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation.
CoRR, March, 2025

MINT: Multi-modal Chain of Thought in Unified Generative Models for Enhanced Image Generation.
CoRR, March, 2025

HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation.
CoRR, February, 2025

LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Streaming Video Question-Answering with In-context Video KV-Cache Retrieval.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Align²LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework.
CoRR, 2024

T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts.
CoRR, 2024

Align²LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation.
CoRR, 2024

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation.
CoRR, 2024

TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition.
CoRR, 2024

EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration.
CoRR, 2024

Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback.
CoRR, 2024

HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models.
CoRR, 2024

EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration.
Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024

Language Model is a Branch Predictor for Simultaneous Machine Translation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Weakly-Supervised Video Moment Retrieval via Regularized Two-Branch Proposal Networks with Erasing Mechanism.
CoRR, 2023

TrainerAgent: Customizable and Efficient Model Training through LLM-Powered Multi-Agent System.
CoRR, 2023

DATE: Domain Adaptive Product Seeker for E-Commerce.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Towards Effective Multi-Modal Interchanges in Zero-Resource Sounding Object Localization.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Video-Guided Curriculum Learning for Spoken Video Grounding.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

2021
SimulLR: Simultaneous Lip Reading Transducer with Attention-Guided Adaptive Memory.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

