Jianjian Sun

ORCID: 0000-0002-1216-9626

According to our database¹, Jianjian Sun authored at least 34 papers between 2022 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments.

CoRR, April, 2026

DM0: An Embodied-Native Vision-Language-Action Model towards Physical AI.

CoRR, February, 2026

STEP3-VL-10B Technical Report.

Multimodal Intelligence Team

CoRR, January, 2026

2025

Thinking by Doing: Building Efficient World Model Reasoning in LLMs via Multi-turn Interaction.

CoRR, November, 2025

Dexbotic: Open-Source Vision-Language-Action Toolbox.

CoRR, October, 2025

NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale.

CoRR, August, 2025

Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning.

CoRR, July, 2025

Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model.

CoRR, June, 2025

Unhackable Temporal Rewarding for Scalable Video MLLMs.

CoRR, February, 2025

PerPO: Perceptual Preference Optimization via Discriminative Rewarding.

CoRR, February, 2025

DistTrain: Addressing Model and Data Heterogeneity with Disaggregated Training for Multimodal Large Language Models.

Proceedings of the ACM SIGCOMM 2025 Conference, 2025

Perception-R1: Pioneering Perception Policy with Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Perception in Reflection.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Unhackable Temporal Reward for Scalable Video MLLMs.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

Exploring Recurrent Long-Term Temporal Fusion for Multi-View 3D Perception.

IEEE Robotics Autom. Lett., July, 2024

Slow Perception: Let's Perceive Geometric Figures Step-by-step.

CoRR, 2024

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model.

CoRR, 2024

DistTrain: Addressing Model and Data Heterogeneity with Disaggregated Training for Multimodal Large Language Models.

CoRR, 2024

Focus Anywhere for Fine-grained Multi-page Document Understanding.

CoRR, 2024

Small Language Model Meets with Reinforced Vision Vocabulary.

CoRR, 2024

OneChart: Purify the Chart Structural Extraction via One Auxiliary Token.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning.

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

DreamLLM: Synergistic Multimodal Comprehension and Creation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Model.

Proceedings of the Computer Vision - ECCV 2024, 2024

2023

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models.

CoRR, 2023

The 1st-place Solution for CVPR 2023 OpenLane Topology in Autonomous Driving Challenge.

CoRR, 2023

BEVStereo++: Accurate Depth Estimation in Multi-view 3D Object Detection via Dynamic Temporal Stereo.

CoRR, 2023

Cross Modal Transformer via Coordinates Encoding for 3D Object Dectection.

CoRR, 2023

Autoencoders as Cross-Modal Teachers: Can Pretrained 2D Image Transformers Help 3D Representation Learning?

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Reversible Column Networks.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Cross Modal Transformer: Towards Fast and Robust 3D Object Detection.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

BEVDepth: Acquisition of Reliable Depth for Multi-View 3D Object Detection.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

BEVStereo: Enhancing Depth Estimation in Multi-View 3D Object Detection with Temporal Stereo.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

BEVStereo: Enhancing Depth Estimation in Multi-view 3D Object Detection with Dynamic Temporal Stereo.

CoRR, 2022
