Jianjian Sun

ORCID: 0000-0002-1216-9626

According to our database, Jianjian Sun authored at least 34 papers between 2022 and 2026.

Bibliography

2026
SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments.
CoRR, April, 2026

DM0: An Embodied-Native Vision-Language-Action Model towards Physical AI.
CoRR, February, 2026

STEP3-VL-10B Technical Report.
CoRR, January, 2026

2025
Thinking by Doing: Building Efficient World Model Reasoning in LLMs via Multi-turn Interaction.
CoRR, November, 2025

Dexbotic: Open-Source Vision-Language-Action Toolbox.
CoRR, October, 2025

NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale.
CoRR, August, 2025

Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning.
CoRR, July, 2025

Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model.
CoRR, June, 2025

Unhackable Temporal Rewarding for Scalable Video MLLMs.
CoRR, February, 2025

PerPO: Perceptual Preference Optimization via Discriminative Rewarding.
CoRR, February, 2025

DistTrain: Addressing Model and Data Heterogeneity with Disaggregated Training for Multimodal Large Language Models.
Proceedings of the ACM SIGCOMM 2025 Conference, 2025

Perception-R1: Pioneering Perception Policy with Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Perception in Reflection.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

Unhackable Temporal Reward for Scalable Video MLLMs.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Exploring Recurrent Long-Term Temporal Fusion for Multi-View 3D Perception.
IEEE Robotics Autom. Lett., July, 2024

Slow Perception: Let's Perceive Geometric Figures Step-by-step.
CoRR, 2024

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model.
CoRR, 2024

DistTrain: Addressing Model and Data Heterogeneity with Disaggregated Training for Multimodal Large Language Models.
CoRR, 2024

Focus Anywhere for Fine-grained Multi-page Document Understanding.
CoRR, 2024

Small Language Model Meets with Reinforced Vision Vocabulary.
CoRR, 2024

OneChart: Purify the Chart Structural Extraction via One Auxiliary Token.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

DreamLLM: Synergistic Multimodal Comprehension and Creation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Model.
Proceedings of Computer Vision - ECCV 2024, 2024

2023
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models.
CoRR, 2023

The 1st-place Solution for CVPR 2023 OpenLane Topology in Autonomous Driving Challenge.
CoRR, 2023

BEVStereo++: Accurate Depth Estimation in Multi-view 3D Object Detection via Dynamic Temporal Stereo.
CoRR, 2023

Cross Modal Transformer via Coordinates Encoding for 3D Object Detection.
CoRR, 2023

Autoencoders as Cross-Modal Teachers: Can Pretrained 2D Image Transformers Help 3D Representation Learning?
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Reversible Column Networks.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Cross Modal Transformer: Towards Fast and Robust 3D Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

BEVDepth: Acquisition of Reliable Depth for Multi-View 3D Object Detection.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

BEVStereo: Enhancing Depth Estimation in Multi-View 3D Object Detection with Temporal Stereo.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
BEVStereo: Enhancing Depth Estimation in Multi-view 3D Object Detection with Dynamic Temporal Stereo.
CoRR, 2022