Bo Zhang

ORCID: 0000-0001-8052-782X

Affiliations:
  • Shanghai AI Laboratory, China
  • Fudan University, MoE Key Laboratory for Information Science of Electromagnetic Waves, Shanghai, China (PhD 2022)


According to our database, Bo Zhang authored at least 83 papers between 2016 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Wisdom of the Crowd: Reinforcement Learning from Coevolutionary Collective Feedback.
CoRR, August, 2025

Open-Source LLMs Collaboration Beats Closed-Source LLMs: A Scalable Multi-Agent System.
CoRR, July, 2025

SE-Merging: A Self-Enhanced Approach for Dynamic Model Merging.
CoRR, June, 2025

BridgeNet: Comprehensive and Effective Feature Interactions via Bridge Feature for Multi-Task Dense Predictions.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2025

MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs.
CoRR, May, 2025

Doc-CoB: Enhancing Multi-Modal Document Understanding with Visual Chain-of-Boxes Reasoning.
CoRR, May, 2025

NovelSeek: When Agent Becomes the Scientist - Building Closed-Loop System from Hypothesis to Verification.
CoRR, May, 2025

LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models.
CoRR, May, 2025

Breaking the Compression Ceiling: Data-Free Pipeline for Ultra-Efficient Delta Compression.
CoRR, May, 2025

GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling.
CoRR, May, 2025

TrustGeoGen: Scalable and Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving.
CoRR, April, 2025

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework.
CoRR, March, 2025

LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis.
CoRR, March, 2025

Temporal Overlapping Prediction: A Self-supervised Pre-training Method for LiDAR Moving Object Segmentation.
CoRR, March, 2025

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency.
CoRR, February, 2025

Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback.
CoRR, January, 2025

Hyperspectral Image Classification via Cascaded Spatial Cross-Attention Network.
IEEE Trans. Image Process., 2025

DF²RQ: Dynamic Feature Fusion via Region-Wise Queries for Semantic Segmentation of Multimodal Remote Sensing Data.
IEEE Trans. Geosci. Remote. Sens., 2025

A Spatial and Semantic Alignment Fusion Network for SeaLand Port Segmentation.
IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2025

DSF2-NAS: Dual-Stage Feature Fusion via Network Architecture Search for Classification of Multimodal Remote Sensing Images.
IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2025

GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

JiSAM: Alleviate Labeling Burden and Corner Case Problems in Autonomous Driving via Minimal Real-World Data.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

SURVEYFORGE: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Few-Shot Cross-Domain Object Detection With Instance-Level Prototype-Based Meta-Learning.
IEEE Trans. Circuits Syst. Video Technol., October, 2024

Push-and-Pull: A General Training Framework With Differential Augmentor for Domain Generalized Point Cloud Classification.
IEEE Trans. Circuits Syst. Video Technol., August, 2024

SANet: A Self-Attention Network for Agricultural Hyperspectral Image Classification.
IEEE Trans. Geosci. Remote. Sens., 2024

Multi-View Vision Fusion Network: Can 2D Pre-Trained Model Boost 3D Point Cloud Data-Scarce Learning?
IEEE Trans. Circuits Syst. Video Technol., 2024

Chimera: Improving Generalist Model with Domain-Specific Experts.
CoRR, 2024

TREND: Unsupervised 3D Representation Learning via Temporal Forecasting for LiDAR Perception.
CoRR, 2024

HyperDet: Generalizable Detection of Synthesized Images by Generating and Merging A Mixture of Hyper LoRAs.
CoRR, 2024

MinerU: An Open-Source Solution for Precise Document Content Extraction.
CoRR, 2024

CDM: A Reliable Metric for Fair and Accurate Formula Recognition Evaluation.
CoRR, 2024

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models.
CoRR, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.
CoRR, 2024

UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition.
CoRR, 2024

ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning.
CoRR, 2024

OASim: an Open and Adaptive Simulator based on Neural Rendering for Autonomous Driving.
CoRR, 2024

Cross-Task Linearity Emerges in the Pretraining-Finetuning Paradigm.
CoRR, 2024

How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites.
Sci. China Inf. Sci., 2024

Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Realistic Rainy Weather Simulation for LiDARs in CARLA Simulator.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024

On the Emergence of Cross-Task Linearity in Pretraining-Finetuning Paradigm.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

On Reducing the Execution Latency of Superconducting Quantum Processors via Quantum Job Scheduling.
Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design, 2024

Reg-TTA3D: Better Regression Makes Better Test-Time Adaptive 3D Object Detection.
Proceedings of the Computer Vision - ECCV 2024, 2024

Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Performance-Aware Approximation of Global Channel Pruning for Multitask CNNs.
IEEE Trans. Pattern Anal. Mach. Intell., August, 2023

A Closer Look at Few-Shot 3D Point Cloud Classification.
Int. J. Comput. Vis., March, 2023

Rethinking Cross-Domain Pedestrian Detection: A Background-Focused Distribution Alignment Framework for Instance-Free One-Stage Detectors.
IEEE Trans. Image Process., 2023

MATNet: A Combining Multi-Attention and Transformer Network for Hyperspectral Image Classification.
IEEE Trans. Geosci. Remote. Sens., 2023

PAN-Guided Multiresolution Fusion Network Using Swin Transformer for Pansharpening.
IEEE Geosci. Remote. Sens. Lett., 2023

Rethinking of Feature Interaction for Multi-task Learning on Dense Prediction.
CoRR, 2023

Towards Knowledge-driven Autonomous Driving.
CoRR, 2023

REVO-LION: Evaluating and Refining Vision-Language Instruction Tuning Datasets.
CoRR, 2023

StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding.
CoRR, 2023

SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving.
CoRR, 2023

Multi-view Vision-Prompt Fusion Network: Can 2D Pre-trained Model Boost 3D Point Cloud Data-scarce Learning?
CoRR, 2023

AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Transferring CLIP's Knowledge into Zero-Shot Point Cloud Semantic Segmentation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Uni3D: A Unified Baseline for Multi-Dataset 3D Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Bi3D: Bi-Domain Active Learning for Cross-Domain 3D Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Generative Diffusion Prior for Unified Image Restoration and Enhancement.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Joint Distribution Alignment via Adversarial Learning for Domain Adaptive Object Detection.
IEEE Trans. Multim., 2022

Sample-Centric Feature Generation for Semi-Supervised Few-Shot Learning.
IEEE Trans. Image Process., 2022

Curriculum-Style Local-to-Global Adaptation for Cross-Domain Remote Sensing Image Segmentation.
IEEE Trans. Geosci. Remote. Sens., 2022

Densely Semantic Enhancement for Domain Adaptive Region-Free Detectors.
IEEE Trans. Circuits Syst. Video Technol., 2022

Few-Shot Object Detection With Self-Adaptive Global Similarity and Two-Way Foreground Stimulator in Remote Sensing Images.
IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2022

ADAS: A Simple Active-and-Adaptive Baseline for Cross-Domain 3D Semantic Segmentation.
CoRR, 2022

Instance-aware Model Ensemble With Distillation For Unsupervised Domain Adaptation.
CoRR, 2022

Learning Cross-Image Object Semantic Relation in Transformer for Few-Shot Fine-Grained Image Classification.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

2021
Coarse-to-Fine Joint Distribution Alignment for Cross-Domain Hyperspectral Image Classification.
IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2021

Scale-Aware Anchor-Free Object Detection via Curriculum Learning for Remote Sensing Images.
IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2021

Domain adaptive detection system for concealed objects using millimeter wave images.
Neural Comput. Appl., 2021

Object-aware Long-short-range Spatial Alignment for Few-Shot Fine-Grained Image Classification.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

2017
Fast Deep Matting for Portrait Animation on Mobile Phone.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

2016
Virtual experiment teaching and research oriented to college computer curriculum.
Proceedings of the 11th International Conference on Computer Science & Education, 2016

