Botian Shi

ORCID: 0000-0003-3677-7252

According to our database, Botian Shi authored at least 101 papers between 2019 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding.

IEEE Trans. Pattern Anal. Mach. Intell., July, 2026

LeapVAD: A Leap in Autonomous Driving via Cognitive Perception and Dual-Process Thinking.

IEEE Trans. Neural Networks Learn. Syst., April, 2026

SPIRAL: A Closed-Loop Framework for Self-Improving Action World Models via Reflective Planning Agents.

CoRR, March, 2026

Training-Free Acceleration for Document Parsing Vision-Language Model with Hierarchical Speculative Decoding.

CoRR, February, 2026

UR-Bench: A Benchmark for Multi-Hop Reasoning over Ultra-High-Resolution Images.

CoRR, January, 2026

The Agent's First Day: Benchmarking Learning, Exploration, and Scheduling in the Workplace Scenarios.

CoRR, January, 2026

LeanRAG: Knowledge-Graph-Based Generation with Semantic Aggregation and Hierarchical Retrieval.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

SymDrive: Realistic and Controllable Driving Simulator via Symmetric Auto-regressive Online Restoration.

CoRR, December, 2025

MemVerse: Multimodal Memory for Lifelong Learning Agents.

CoRR, December, 2025

SPOT: Scalable 3D Pre-Training via Occupancy Prediction for Learning Transferable 3D Representations.

IEEE Trans. Pattern Anal. Mach. Intell., November, 2025

TrafficMCTS: A Closed-Loop Traffic Flow Generation Framework With Group-Based Monte Carlo Tree Search.

IEEE Trans. Intell. Transp. Syst., October, 2025

MGA: Memory-Driven GUI Agent for Observation-Centric Interaction.

CoRR, October, 2025

Vision-Centric 4D Occupancy Forecasting and Planning via Implicit Residual World Models.

CoRR, October, 2025

EvolveR: Self-Evolving LLM Agents through an Experience-Driven Lifecycle.

CoRR, October, 2025

Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks.

CoRR, October, 2025

RE-Searcher: Robust Agentic Search with Goal-oriented Planning and Self-reflection.

CoRR, September, 2025

IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video?

CoRR, September, 2025

HetaRAG: Hybrid Deep Retrieval-Augmented Generation across Heterogeneous Data Stores.

CoRR, September, 2025

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency.

CoRR, August, 2025

From Ranking to Selection: A Simple but Efficient Dynamic Passage Selector for Retrieval Augmented Generation.

CoRR, August, 2025

MELLA: Bridging Linguistic Capability and Cultural Groundedness for Low-Resource Language MLLMs.

CoRR, August, 2025

Learning Only with Images: Visual Reinforcement Learning with Reasoning, Rendering, and Visual Feedback.

CoRR, July, 2025

DeepWriter: A Fact-Grounded Multimodal Writing Assistant Based On Offline Knowledge Base.

CoRR, July, 2025

Investigating Redundancy in Multimodal Large Language Models with Multiple Vision Encoders.

CoRR, July, 2025

THE-Tree: Can Tracing Historical Evolution Enhance Scientific Verification and Reasoning?

CoRR, June, 2025

InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models.

CoRR, June, 2025

KG-TRACES: Enhancing Large Language Models with Knowledge Graph-constrained Trajectory Reasoning and Attribution Supervision.

CoRR, June, 2025

O²-Searcher: A Searching-based Agent Model for Open-Domain Open-Ended Question Answering.

CoRR, May, 2025

GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling.

CoRR, May, 2025

TrustGeoGen: Scalable and Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving.

CoRR, April, 2025

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.

CoRR, April, 2025

RAKG: Document-level Retrieval Augmented Knowledge Graph Construction.

CoRR, April, 2025

OmniCaptioner: One Captioner to Rule Them All.

CoRR, April, 2025

Aligning Vision to Language: Text-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning.

CoRR, March, 2025

MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning.

CoRR, March, 2025

LimSim Series: An Autonomous Driving Simulation Platform for Validation and Enhancement.

CoRR, February, 2025

LiCROcc: Teach Radar for Accurate Semantic Occupancy Prediction Using LiDAR and Camera.

IEEE Robotics Autom. Lett., January, 2025

Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback.

CoRR, January, 2025

ChartX and ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning.

IEEE Trans. Image Process., 2025

GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

DriveArena: A Closed-Loop Generative Simulation Platform for Autonomous Driving.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Chimera: Improving Generalist Model with Domain-Specific Experts.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Docopilot: Improving Multimodal Models for Document-Level Understanding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

Few-Shot Cross-Domain Object Detection With Instance-Level Prototype-Based Meta-Learning.

IEEE Trans. Circuits Syst. Video Technol., October, 2024

SensorX2Vehicle: Online Sensors-to-Vehicle Rotation Calibration Methods in Road Scenarios.

IEEE Robotics Autom. Lett., 2024

Human-Like Decision Making at Unsignalized Intersections Using Social Value Orientation.

IEEE Intell. Transp. Syst. Mag., 2024

Chimera: Improving Generalist Model with Domain-Specific Experts.

CoRR, 2024

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling.

CoRR, 2024

MinerU: An Open-Source Solution for Precise Document Content Extraction.

CoRR, 2024

DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes.

CoRR, 2024

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models.

CoRR, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.

CoRR, 2024

Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond.

CoRR, 2024

UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition.

CoRR, 2024

ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning.

CoRR, 2024

OASim: an Open and Adaptive Simulator based on Neural Rendering for Autonomous Driving.

CoRR, 2024

How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites.

Sci. China Inf. Sci., 2024

Drive Like a Human: Rethinking Autonomous Driving with Large Language Models.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, 2024

Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

LimSim++: A Closed-Loop Platform for Deploying Multimodal LLMs in Autonomous Driving.

Proceedings of the IEEE Intelligent Vehicles Symposium, 2024

Realistic Rainy Weather Simulation for LiDARs in CARLA Simulator.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024

An Extrinsic Calibration Method between LiDAR and GNSS/INS for Autonomous Driving.

Proceedings of the IEEE International Conference on Robotics and Automation, 2024

Zero-training LiDAR-Camera Extrinsic Calibration Method Using Segment Anything Model.

Proceedings of the IEEE International Conference on Robotics and Automation, 2024

VeloVox: A Low-Cost and Accurate 4D Object Detector with Single-Frame Point Cloud of Livox LiDAR.

Proceedings of the IEEE International Conference on Robotics and Automation, 2024

DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Reg-TTA3D: Better Regression Makes Better Test-Time Adaptive 3D Object Detection.

Proceedings of the Computer Vision - ECCV 2024, 2024

2023

Multi-Sensor Fusion and Cooperative Perception for Autonomous Driving: A Review.

IEEE Intell. Transp. Syst. Mag., 2023

Realistic Rainy Weather Simulation for LiDARs in CARLA Simulator.

CoRR, 2023

Towards Knowledge-driven Autonomous Driving.

CoRR, 2023

SceneDM: Scene-level Multi-agent Trajectory Generation with Consistent Diffusion Models.

CoRR, 2023

On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving.

CoRR, 2023

StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding.

CoRR, 2023

SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving.

CoRR, 2023

StreetSurf: Extending Multi-view Implicit Surface Reconstruction to Street Views.

CoRR, 2023

AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

RangePerception: Taming LiDAR Range View for Efficient and Accurate 3D Object Detection.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Uni3D: A Unified Baseline for Multi-Dataset 3D Object Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Bi3D: Bi-Domain Active Learning for Cross-Domain 3D Object Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

LWSIS: LiDAR-Guided Weakly Supervised Instance Segmentation for Autonomous Driving.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

ADAS: A Simple Active-and-Adaptive Baseline for Cross-Domain 3D Semantic Segmentation.

CoRR, 2022

Multi-modal Sensor Fusion for Auto Driving Perception: A Survey.

CoRR, 2022

Learning Cross-Image Object Semantic Relation in Transformer for Few-Shot Fine-Grained Image Classification.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection.

Proceedings of the Computer Vision - ECCV 2022, 2022

2021

Hashing based Efficient Inference for Image-Text Matching.

Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

2020

A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos.

CoRR, 2020

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation.

CoRR, 2020

Learning Semantic Concepts and Temporal Alignment for Narrated Video Procedural Captioning.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Functionality Discovery and Prediction of Physical Objects.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Microsoft Concept Graph: Mining Semantic Concepts for Short Text Understanding.

Data Intell., 2019

Knowledge Aware Semantic Concept Expansion for Image-Text Matching.

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Dense Procedure Captioning in Narrated Instructional Videos.

Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019
