Haoji Zhang

ORCID: 0009-0006-6132-5417

Affiliations:
  • Tsinghua University, Shenzhen International Graduate School, China


According to our database, Haoji Zhang authored at least 15 papers between 2023 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
ChatUMM: Robust Context Tracking for Conversational Interleaved Generation.
CoRR, February, 2026

2025
DDAVS: Disentangled Audio Semantics and Delayed Bidirectional Alignment for Audio-Visual Segmentation.
CoRR, December, 2025

VG-Refiner: Towards Tool-Refined Referring Grounded Reasoning via Agentic Reinforcement Learning.
CoRR, December, 2025

Thinking With Bounding Boxes: Enhancing Spatio-Temporal Video Grounding via Reinforcement Fine-Tuning.
CoRR, November, 2025

Vidi2: Large Multimodal Models for Video Understanding and Creation.
CoRR, November, 2025

Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning.
CoRR, August, 2025

UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning.
CoRR, May, 2025

Uni-AdaFocus: Spatial-Temporal Dynamic Computation for Video Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., March, 2025

Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation.
IEEE Trans. Image Process., 2025

Flash-Vstream: Efficient Real-Time Understanding for Long Video Streams.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Ponder & Press: Advancing Visual GUI Agent towards General Computer Control.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation.
CoRR, 2024

Hierarchical Memory for Long Video QA.
CoRR, 2024

Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams.
CoRR, 2024

2023
PREIM3D: 3D Consistent Precise Image Attribute Editing from a Single Image.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
