Botian Shi

Orcid: 0000-0003-3677-7252

According to our database, Botian Shi authored at least 82 papers between 2019 and 2025.

Bibliography

2025
LeanRAG: Knowledge-Graph-Based Generation with Semantic Aggregation and Hierarchical Retrieval.
CoRR, August, 2025

From Ranking to Selection: A Simple but Efficient Dynamic Passage Selector for Retrieval Augmented Generation.
CoRR, August, 2025

MELLA: Bridging Linguistic Capability and Cultural Groundedness for Low-Resource Language MLLMs.
CoRR, August, 2025

Learning Only with Images: Visual Reinforcement Learning with Reasoning, Rendering, and Visual Feedback.
CoRR, July, 2025

DeepWriter: A Fact-Grounded Multimodal Writing Assistant Based On Offline Knowledge Base.
CoRR, July, 2025

Investigating Redundancy in Multimodal Large Language Models with Multiple Vision Encoders.
CoRR, July, 2025

THE-Tree: Can Tracing Historical Evolution Enhance Scientific Verification and Reasoning?
CoRR, June, 2025

InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models.
CoRR, June, 2025

KG-TRACES: Enhancing Large Language Models with Knowledge Graph-constrained Trajectory Reasoning and Attribution Supervision.
CoRR, June, 2025

O²-Searcher: A Searching-based Agent Model for Open-Domain Open-Ended Question Answering.
CoRR, May, 2025

GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling.
CoRR, May, 2025

TrustGeoGen: Scalable and Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving.
CoRR, April, 2025

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
CoRR, April, 2025

RAKG: Document-level Retrieval Augmented Knowledge Graph Construction.
CoRR, April, 2025

OmniCaptioner: One Captioner to Rule Them All.
CoRR, April, 2025

Aligning Vision to Language: Text-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning.
CoRR, March, 2025

MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning.
CoRR, March, 2025

LimSim Series: An Autonomous Driving Simulation Platform for Validation and Enhancement.
CoRR, February, 2025

LiCROcc: Teach Radar for Accurate Semantic Occupancy Prediction Using LiDAR and Camera.
IEEE Robotics Autom. Lett., January, 2025

LeapVAD: A Leap in Autonomous Driving via Cognitive Perception and Dual-Process Thinking.
CoRR, January, 2025

Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback.
CoRR, January, 2025

GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Docopilot: Improving Multimodal Models for Document-Level Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Few-Shot Cross-Domain Object Detection With Instance-Level Prototype-Based Meta-Learning.
IEEE Trans. Circuits Syst. Video Technol., October, 2024

SensorX2Vehicle: Online Sensors-to-Vehicle Rotation Calibration Methods in Road Scenarios.
IEEE Robotics Autom. Lett., 2024

Human-Like Decision Making at Unsignalized Intersections Using Social Value Orientation.
IEEE Intell. Transp. Syst. Mag., 2024

Chimera: Improving Generalist Model with Domain-Specific Experts.
CoRR, 2024

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling.
CoRR, 2024

MinerU: An Open-Source Solution for Precise Document Content Extraction.
CoRR, 2024

DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes.
CoRR, 2024

DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving.
CoRR, 2024

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models.
CoRR, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.
CoRR, 2024

Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond.
CoRR, 2024

UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition.
CoRR, 2024

ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning.
CoRR, 2024

OASim: an Open and Adaptive Simulator based on Neural Rendering for Autonomous Driving.
CoRR, 2024

How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites.
Sci. China Inf. Sci., 2024

Drive Like a Human: Rethinking Autonomous Driving with Large Language Models.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, 2024

Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

LimSim++: A Closed-Loop Platform for Deploying Multimodal LLMs in Autonomous Driving.
Proceedings of the IEEE Intelligent Vehicles Symposium, 2024

Realistic Rainy Weather Simulation for LiDARs in CARLA Simulator.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024

An Extrinsic Calibration Method between LiDAR and GNSS/INS for Autonomous Driving.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

Zero-training LiDAR-Camera Extrinsic Calibration Method Using Segment Anything Model.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

VeloVox: A Low-Cost and Accurate 4D Object Detector with Single-Frame Point Cloud of Livox LiDAR.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Reg-TTA3D: Better Regression Makes Better Test-Time Adaptive 3D Object Detection.
Proceedings of the Computer Vision - ECCV 2024, 2024

2023
Multi-Sensor Fusion and Cooperative Perception for Autonomous Driving: A Review.
IEEE Intell. Transp. Syst. Mag., 2023

Realistic Rainy Weather Simulation for LiDARs in CARLA Simulator.
CoRR, 2023

Towards Knowledge-driven Autonomous Driving.
CoRR, 2023

SceneDM: Scene-level Multi-agent Trajectory Generation with Consistent Diffusion Models.
CoRR, 2023

On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving.
CoRR, 2023

StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding.
CoRR, 2023

SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving.
CoRR, 2023

TrafficMCTS: A Closed-Loop Traffic Flow Generation Framework with Group-Based Monte Carlo Tree Search.
CoRR, 2023

StreetSurf: Extending Multi-view Implicit Surface Reconstruction to Street Views.
CoRR, 2023

AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

RangePerception: Taming LiDAR Range View for Efficient and Accurate 3D Object Detection.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Uni3D: A Unified Baseline for Multi-Dataset 3D Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Bi3D: Bi-Domain Active Learning for Cross-Domain 3D Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

LWSIS: LiDAR-Guided Weakly Supervised Instance Segmentation for Autonomous Driving.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
ADAS: A Simple Active-and-Adaptive Baseline for Cross-Domain 3D Semantic Segmentation.
CoRR, 2022

Multi-modal Sensor Fusion for Auto Driving Perception: A Survey.
CoRR, 2022

Learning Cross-Image Object Semantic Relation in Transformer for Few-Shot Fine-Grained Image Classification.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection.
Proceedings of the Computer Vision - ECCV 2022, 2022

2021
Hashing based Efficient Inference for Image-Text Matching.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

2020
A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos.
CoRR, 2020

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation.
CoRR, 2020

Learning Semantic Concepts and Temporal Alignment for Narrated Video Procedural Captioning.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Functionality Discovery and Prediction of Physical Objects.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Microsoft Concept Graph: Mining Semantic Concepts for Short Text Understanding.
Data Intell., 2019

Knowledge Aware Semantic Concept Expansion for Image-Text Matching.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Dense Procedure Captioning in Narrated Instructional Videos.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

