Xinhao Li

ORCID: 0009-0003-0382-3985

Affiliations:
  • Nanjing University, State Key Laboratory for Novel Software Technology, Nanjing, China
  • Shanghai AI Laboratory, OpenGVLab, Shanghai, China


According to our database, Xinhao Li authored at least 15 papers between 2023 and 2025.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2025
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning.
CoRR, April, 2025

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling.
CoRR, January, 2025

Online Video Understanding: A Comprehensive Benchmark and Memory-Augmented Method.
CoRR, January, 2025

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling.
CoRR, January, 2025

Fine-grained Video-Text Retrieval: A New Benchmark and Method.
CoRR, January, 2025

TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Online Video Understanding: OVBench and VideoChat-Online.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model.
CoRR, 2024

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding.
CoRR, 2024

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video.
Proceedings of the Computer Vision - ECCV 2024, 2024

VideoMamba: State Space Model for Efficient Video Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

2023
ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video.
CoRR, 2023

