Junjie Zhou

Orcid: 0000-0001-5903-2806

Affiliations:

Beijing University of Posts and Telecommunications, State Key Laboratory of Networking and Switching Technology, Beijing, China

According to our database¹, Junjie Zhou authored at least 22 papers between 2023 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

CPG: Contrastive Patch-Graph learning for 3D point cloud.

Pattern Recognit., 2026

2025

MR²-Bench: Going Beyond Matching to Reasoning in Multimodal Retrieval.

CoRR, September, 2025

Task-Aware KV Compression For Cost-Effective Long Video Understanding.

CoRR, June, 2025

Video-XL-2: Towards Very Long-Video Understanding Through Task-Aware KV Sparsification.

CoRR, June, 2025

OmniGen2: Exploration to Advanced Multimodal Generation.

CoRR, June, 2025

VideoDeepResearch: Long Video Understanding With Agentic Tool Using.

CoRR, June, 2025

MomentSeeker: A Comprehensive Benchmark and A Strong Baseline For Moment Retrieval Within Long Videos.

CoRR, February, 2025

Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval.

CoRR, February, 2025

OmniGen: Unified Image Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

MLVU: Benchmarking Multi-task Long Video Understanding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

FAT: Field-Aware Transformer for Point Cloud Segmentation With Adaptive Attention Fields.

IEEE Trans. Ind. Informatics, September, 2024

Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding.

CoRR, 2024

OmniGen: Unified Image Generation.

CoRR, 2024

MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding.

CoRR, 2024

VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

TextDiff: Mask-Guided Residual Diffusion Models for Scene Text Image Super-Resolution.

CoRR, 2023

SAT: Size-Aware Transformer for 3D Point Cloud Semantic Segmentation.

CoRR, 2023

DocDiff: Document Enhancement via Residual Diffusion Models.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

FAT: Field-Aware Transformer for 3D Point Cloud Semantic Segmentation.

Proceedings of the IEEE International Conference on Image Processing, 2023
