Junjie Zhou

ORCID: 0000-0001-5903-2806

Affiliations:
  • Beijing University of Posts and Telecommunications, State Key Laboratory of Networking and Switching Technology, Beijing, China


According to our database, Junjie Zhou authored at least 21 papers between 2023 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
CPG: Contrastive Patch-Graph learning for 3D point cloud.
Pattern Recognit., 2026

2025
Task-Aware KV Compression For Cost-Effective Long Video Understanding.
CoRR, June, 2025

Video-XL-2: Towards Very Long-Video Understanding Through Task-Aware KV Sparsification.
CoRR, June, 2025

OmniGen2: Exploration to Advanced Multimodal Generation.
CoRR, June, 2025

VideoDeepResearch: Long Video Understanding With Agentic Tool Using.
CoRR, June, 2025

MomentSeeker: A Comprehensive Benchmark and A Strong Baseline For Moment Retrieval Within Long Videos.
CoRR, February, 2025

Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval.
CoRR, February, 2025

OmniGen: Unified Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

MLVU: Benchmarking Multi-task Long Video Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
FAT: Field-Aware Transformer for Point Cloud Segmentation With Adaptive Attention Fields.
IEEE Trans. Ind. Informatics, September, 2024

Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding.
CoRR, 2024

OmniGen: Unified Image Generation.
CoRR, 2024

MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding.
CoRR, 2024

VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
TextDiff: Mask-Guided Residual Diffusion Models for Scene Text Image Super-Resolution.
CoRR, 2023

SAT: Size-Aware Transformer for 3D Point Cloud Semantic Segmentation.
CoRR, 2023

DocDiff: Document Enhancement via Residual Diffusion Models.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

FAT: Field-Aware Transformer for 3D Point Cloud Semantic Segmentation.
Proceedings of the IEEE International Conference on Image Processing, 2023

