Haonan Zhang

ORCID: 0000-0003-1015-7338

Affiliations:

University of Electronic Science and Technology of China (UESTC), Future Media Center, School of Computer Science and Engineering, Chengdu, China
Sichuan Artificial Intelligence Research Institute, Yibin, China

According to our database¹, Haonan Zhang authored at least 19 papers between 2022 and 2025.

Collaborative distances:

Dijkstra number² of five.
Erdős number³ of four.

Bibliography

2025

A Survey on Efficient Vision-Language-Action Models.

CoRR, October, 2025

Temporal-Guided Mixture-of-Experts for Zero-Shot Video Question Answering.

IEEE Trans. Circuits Syst. Video Technol., September, 2025

ChARM: Character-based Act-adaptive Reward Modeling for Advanced Role-Playing Language Agents.

CoRR, May, 2025

Visual Commonsense-Aware Representation Network for Video Captioning.

IEEE Trans. Neural Networks Learn. Syst., January, 2025

OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis.

CoRR, January, 2025

Text-Video Retrieval With Global-Local Semantic Consistent Learning.

IEEE Trans. Image Process., 2025

OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

UMP: Unified Modality-Aware Prompt Tuning for Text-Video Retrieval.

IEEE Trans. Circuits Syst. Video Technol., November, 2024

SPT: Spatial Pyramid Transformer for Image Captioning.

IEEE Trans. Circuits Syst. Video Technol., June, 2024

Memory-Based Augmentation Network for Video Captioning.

IEEE Trans. Multim., 2024

MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct.

CoRR, 2024

MPT: Multi-grained Prompt Tuning for Text-Video Retrieval.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Pedestrian Attributes Recognition for UAV-Human.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

2023

Learning visual question answering on controlled semantic noisy labels.

Pattern Recognit., June, 2023

Depth-Aware Sparse Transformer for Video-Language Learning.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

2022

Video Question Answering With Prior Knowledge and Object-Sensitive Learning.

IEEE Trans. Image Process., 2022

A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

S2 Transformer for Image Captioning.

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022