Bin Wang

ORCID: 0000-0002-5625-2966

Affiliations:

Shanghai Artificial Intelligence Laboratory, Shanghai, China

According to our database, Bin Wang authored at least 76 papers between 2008 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

MinerU-Popo: Universal Post-Processing Model for Structured Document Parsing.

CoRR, May, 2026

CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence.

CoRR, May, 2026

WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors.

CoRR, May, 2026

MolRecBench-Wild: A Real-World Benchmark for Optical Chemical Structure Recognition.

CoRR, May, 2026

Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling.

CoRR, April, 2026

OntoTKGE: Ontology-Enhanced Temporal Knowledge Graph Extrapolation.

CoRR, April, 2026

MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale.

CoRR, April, 2026

MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome.

CoRR, March, 2026

MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding.

CoRR, March, 2026

Molecular Identifier Visual Prompt and Verifiable Reinforcement Learning for Chemical Reaction Diagram Parsing.

CoRR, March, 2026

Exploring the Interactive Guidance for Unified and Effective Image Matting.

ACM Trans. Multim. Comput. Commun. Appl., February, 2026

AgenticOCR: Parsing Only What You Need for Efficient Retrieval-Augmented Generation.

CoRR, February, 2026

MoDora: Tree-Based Semi-Structured Document Analysis System.

CoRR, February, 2026

MiroFlow: Towards High-Performance and Robust Open-Source Agent Framework for General Deep Research Tasks.

CoRR, February, 2026

Sci-CoE: Co-evolving Scientific Reasoning LLMs via Geometric Consensus with Sparse Supervision.

CoRR, February, 2026

Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs.

CoRR, January, 2026

DocDancer: Towards Agentic Document-Grounded Information Seeking.

CoRR, January, 2026

Joint Knowledge Base Completion and Question Answering by Combining Large Language Models and Small Language Models.

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

2025

DOCR-Inspector: Fine-Grained and Automated Evaluation of Document Parsing with VLM.

CoRR, December, 2025

ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning.

CoRR, December, 2025

TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table Recognition.

CoRR, December, 2025

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe.

CoRR, November, 2025

MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling.

CoRR, November, 2025

OmniLayout: Enabling Coarse-to-Fine Learning with LLMs for Universal Document Layout Generation.

CoRR, October, 2025

Efficient Multi-modal Large Language Models via Progressive Consistency Distillation.

CoRR, October, 2025

LLM/Agent-as-Data-Analyst: A Survey.

CoRR, September, 2025

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing.

CoRR, September, 2025

MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization.

CoRR, July, 2025

GTR-CoT: Graph Traversal as Visual Chain of Thought for Molecular Structure Recognition.

CoRR, June, 2025

100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models.

CoRR, May, 2025

PM4Bench: A Parallel Multilingual Multi-Modal Multi-task Benchmark for Large Vision Language Model.

CoRR, March, 2025

MLLM-DataEngine: Closing the Loop of Multimodal Instruction Tuning Data Generation.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

Beyond Multimodal Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Chimera: Improving Generalist Model with Domain-Specific Experts.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

LEGION: Learning to Ground and Explain for Synthetic Image Detection.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SURVEYFORGE: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

DropQueries: A Simple Way to Discover Comprehensive Segment Representations.

IEEE Trans. Multim., 2024

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions.

CoRR, 2024

Chimera: Improving Generalist Model with Domain-Specific Experts.

CoRR, 2024

Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction.

Qintong Zhang

Victor Shea-Jay Huang

CoRR, 2024

DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception.

CoRR, 2024

MinerU: An Open-Source Solution for Precise Document Content Extraction.

CoRR, 2024

CDM: A Reliable Metric for Fair and Accurate Formula Recognition Evaluation.

CoRR, 2024

OpenDataLab: Empowering General Artificial Intelligence with Open Datasets.

CoRR, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.

CoRR, 2024

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models.

CoRR, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.

CoRR, 2024

DSDL: Data Set Description Language for Bridging Modalities and Tasks in AI Data.

CoRR, 2024

UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition.

CoRR, 2024

InternLM2 Technical Report.

CoRR, 2024

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model.

CoRR, 2024

How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites.

Sci. China Inf. Sci., 2024

Distribution-Aware Data Expansion with Diffusion Models.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Parrot Captions Teach CLIP to Spot Text.

Proceedings of the Computer Vision - ECCV 2024, 2024

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VIGC: Visual Instruction Generation and Correction.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization.

CoRR, 2023

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition.

CoRR, 2023

MLLM-DataEngine: An Iterative Refinement Approach for MLLM.

CoRR, 2023

WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese Large Models.

CoRR, 2023

V3Det: Vast Vocabulary Visual Detection Dataset.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022

Cycle-Consistent Learning for Weakly Supervised Semantic Segmentation.

Proceedings of the 3rd International Workshop on Human-Centric Multimedia Analysis (HCMA@MM 2022), 2022

2019

Detection and tracking based tubelet generation for video object detection.

J. Vis. Commun. Image Represent., 2019

Spatiotemporal Breast Mass Detection Network (MD-Net) in 4D DCE-MRI Images.

Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2019, 2019

Boundary Perception Guidance: A Scribble-Supervised Semantic Segmentation Approach.

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

2018

Automated Pulmonary Nodule Detection: High Sensitivity with Few Candidates.

Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2018, 2018

2012

Staying-alive path planning with energy optimization for mobile robots.

Expert Syst. Appl., 2012

2008

Staying-alive and energy-efficient path planning for mobile robots.

Proceedings of the American Control Conference, 2008

A new feedrate adaptation control NURBS interpolation based on de Boor algorithm in CNC systems.

Proceedings of the American Control Conference, 2008