Zheng Ge

Orcid: 0000-0002-8630-8270

According to our database, Zheng Ge authored at least 65 papers between 2018 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

Video-MTR: Reinforced Multi-Turn Reasoning for Long Video Understanding.

CoRR, August, 2025

NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale.

CoRR, August, 2025

Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning.

CoRR, July, 2025

DGAE: Diffusion-Guided Autoencoder for Efficient Latent Representation Learning.

CoRR, June, 2025

Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model.

CoRR, June, 2025

Step1X-Edit: A Practical Framework for General Image Editing.

CoRR, April, 2025

Perception-R1: Pioneering Perception Policy with Reinforcement Learning.

CoRR, April, 2025

M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization?

CoRR, March, 2025

Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model.

CoRR, March, 2025

Unhackable Temporal Rewarding for Scalable Video MLLMs.

CoRR, February, 2025

Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model.

CoRR, February, 2025

PerPO: Perceptual Preference Optimization via Discriminative Rewarding.

CoRR, February, 2025

Taming Teacher Forcing for Masked Autoregressive Video Generation.

CoRR, January, 2025

DistTrain: Addressing Model and Data Heterogeneity with Disaggregated Training for Multimodal Large Language Models.

Proceedings of the ACM SIGCOMM 2025 Conference, 2025

Perception in Reflection.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Unhackable Temporal Reward for Scalable Video MLLMs.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Reconstructive Visual Instruction Tuning.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Dependency and Link Diversity Placement for Reliable Wireless Diamond Networks.

Zheng Ge, Eduard A. Jorswieck

Proceedings of the IEEE International Conference on Communications, 2025

Taming Teacher Forcing for Masked Autoregressive Video Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

GroupLane: End-to-End 3D Lane Detection With Channel-Wise Grouping.

IEEE Robotics Autom. Lett., November, 2024

Exploring Recurrent Long-Term Temporal Fusion for Multi-View 3D Perception.

IEEE Robotics Autom. Lett., July, 2024

Fourier-Transform-Based Unmixing Method for Fusion of Multiresolution Satellite Images.

IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2024

Slow Perception: Let's Perceive Geometric Figures Step-by-step.

CoRR, 2024

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model.

CoRR, 2024

DistTrain: Addressing Model and Data Heterogeneity with Disaggregated Training for Multimodal Large Language Models.

CoRR, 2024

Focus Anywhere for Fine-grained Multi-page Document Understanding.

CoRR, 2024

ShapeLLM: Universal 3D Object Understanding for Embodied Interaction.

CoRR, 2024

Small Language Model Meets with Reinforced Vision Vocabulary.

CoRR, 2024

Self-Supervised Visual Preference Alignment.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

OneChart: Purify the Chart Structural Extraction via One Auxiliary Token.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning.

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

DreamLLM: Synergistic Multimodal Comprehension and Creation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Merlin: Empowering Multimodal LLMs with Foresight Minds.

Proceedings of the Computer Vision - ECCV 2024, 2024

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Model.

Proceedings of the Computer Vision - ECCV 2024, 2024

ShapeLLM: Universal 3D Object Understanding for Embodied Interaction.

Proceedings of the Computer Vision - ECCV 2024, 2024

Align-DETR: Enhancing End-to-end Object Detection with Aligned Loss.

Proceedings of the 35th British Machine Vision Conference, 2024

2023

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models.

CoRR, 2023

GMM: Delving into Gradient Aware and Model Perceive Depth Mining for Monocular 3D Detection.

CoRR, 2023

The 1st-place Solution for CVPR 2023 OpenLane Topology in Autonomous Driving Challenge.

CoRR, 2023

Align-DETR: Improving DETR with Simple IoU-aware BCE loss.

CoRR, 2023

BEVStereo++: Accurate Depth Estimation in Multi-view 3D Object Detection via Dynamic Temporal Stereo.

CoRR, 2023

Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining.

Proceedings of the International Conference on Machine Learning, 2023

Autoencoders as Cross-Modal Teachers: Can Pretrained 2D Image Transformers Help 3D Representation Learning?

Proceedings of the Eleventh International Conference on Learning Representations, 2023

MatrixVT: Efficient Multi-Camera to BEV Transformation for 3D Perception.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Implicit Identity Leakage: The Stumbling Block to Improving Deepfake Detection Generalization.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

BEVDepth: Acquisition of Reliable Depth for Multi-View 3D Object Detection.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

BEVStereo: Enhancing Depth Estimation in Multi-View 3D Object Detection with Temporal Stereo.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Towards 3D Object Detection with 2D Supervision.

CoRR, 2022

Towards A Robust Deepfake Detector: Common Artifact Deepfake Detection Model.

CoRR, 2022

BEVStereo: Enhancing Depth Estimation in Multi-view 3D Object Detection with Dynamic Temporal Stereo.

CoRR, 2022

STS: Surround-view Temporal Stereo for Multi-view 3D Detection.

CoRR, 2022

PersDet: Monocular 3D Detection in Perspective Bird's-Eye-View.

CoRR, 2022

Dense Teacher: Dense Pseudo-Labels for Semi-supervised Object Detection.

Proceedings of the Computer Vision - ECCV 2022, 2022

2021

LLA: Loss-aware label assignment for dense pedestrian detection.

Neurocomputing, 2021

Delving deep into the imbalance of positive proposals in two-stage object detection.

Neurocomputing, 2021

Workshop on Autonomous Driving at CVPR 2021: Technical Report for Streaming Perception Challenge.

CoRR, 2021

YOLOX: Exceeding YOLO Series in 2021.

CoRR, 2021

Premium Power Value-Added Service Product Decision-Making Method Based on Multi-Index Two-Sided Matching.

IEEE Access, 2021

OTA: Optimal Transport Assignment for Object Detection.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

Delving into the Imbalance of Positive Proposals in Two-stage Object Detection.

CoRR, 2020

DualBox: Generating BBox Pair with Strong Correspondence via Occlusion Pattern Clustering and Proposal Refinement.

Proceedings of the 25th International Conference on Pattern Recognition, 2020

PS-RCNN: Detecting Secondary Human Instances in a Crowd via Primary Object Suppression.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2020

NMS by Representative Region: Towards Crowded Pedestrian Detection by Proposal Pairing.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2018

Fast Portrait Matting Using Spatial Detail-Preserving Network.

Proceedings of the Neural Information Processing - 25th International Conference, 2018
