Haonan Zhang

Orcid: 0000-0003-1015-7338

Affiliations:
  • University of Electronic Science and Technology of China (UESTC), Future Media Center, School of Computer Science and Engineering, Chengdu, China
  • Sichuan Artificial Intelligence Research Institute, Yibin, China


According to our database, Haonan Zhang authored at least 16 papers between 2022 and 2025.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2025
Temporal-Guided Mixture-of-Experts for Zero-Shot Video Question Answering.
IEEE Trans. Circuits Syst. Video Technol., September, 2025

Visual Commonsense-Aware Representation Network for Video Captioning.
IEEE Trans. Neural Networks Learn. Syst., January, 2025

Text-Video Retrieval With Global-Local Semantic Consistent Learning.
IEEE Trans. Image Process., 2025

OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
UMP: Unified Modality-Aware Prompt Tuning for Text-Video Retrieval.
IEEE Trans. Circuits Syst. Video Technol., November, 2024

SPT: Spatial Pyramid Transformer for Image Captioning.
IEEE Trans. Circuits Syst. Video Technol., June, 2024

Memory-Based Augmentation Network for Video Captioning.
IEEE Trans. Multim., 2024

MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct.
CoRR, 2024

MPT: Multi-grained Prompt Tuning for Text-Video Retrieval.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Pedestrian Attributes Recognition for UAV-Human.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

2023
Learning visual question answering on controlled semantic noisy labels.
Pattern Recognit., June, 2023

Depth-Aware Sparse Transformer for Video-Language Learning.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

2022
Video Question Answering With Prior Knowledge and Object-Sensitive Learning.
IEEE Trans. Image Process., 2022

A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

S² Transformer for Image Captioning.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

