Xiangyu Zeng

Orcid: 0000-0001-6956-5040

Affiliations:
  • Shandong University, School of Software, Jinan, China


According to our database, Xiangyu Zeng authored at least 17 papers between 2023 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
RIVER: A Real-Time Interaction Benchmark for Video LLMs.
CoRR, March, 2026

Video-o3: Native Interleaved Clue Seeking for Long Video Multi-Hop Reasoning.
CoRR, January, 2026

VideoTG-R1: Boosting Video Temporal Grounding via Curriculum Reinforcement Learning on Reflected Boundary Annotations.
Proceedings of the 2026 International Conference on Multimedia Retrieval, 2026

2025
VKnowU: Evaluating Visual Knowledge Understanding in Multimodal LLMs.
CoRR, November, 2025

VideoTG-R1: Boosting Video Temporal Grounding via Curriculum Reinforcement Learning on Reflected Boundary Annotations.
CoRR, October, 2025

UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation.
CoRR, October, 2025

Learning Goal-Oriented Language-Guided Navigation with Self-Improving Demonstrations at Scale.
CoRR, September, 2025

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning.
CoRR, April, 2025

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling.
CoRR, January, 2025

Online Video Understanding: A Comprehensive Benchmark and Memory-Augmented Method.
CoRR, January, 2025

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling.
CoRR, January, 2025

VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Make Your Training Flexible: Towards Deployment-Efficient Video Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Online Video Understanding: OVBench and VideoChat-Online.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2023
Adaptive Edge-Aware Semantic Interaction Network for Salient Object Detection in Optical Remote Sensing Images.
IEEE Trans. Geosci. Remote. Sens., 2023

