Wei Li

Affiliations:
  • East China Normal University, Shanghai, China
  • Shanghai AI Laboratory, China (2022 - 2025)
  • Shanghai Jiao Tong University, China (former)


According to our database, Wei Li authored at least 38 papers between 2016 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
MMRareBench: A Rare-Disease Multimodal and Multi-Image Medical Benchmark.
CoRR, April, 2026

MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale.
CoRR, April, 2026

RAR: Retrieving and Ranking Augmented MLLMs for Visual Recognition.
IEEE Trans. Image Process., 2026

GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and a Comprehensive Multimodal Dataset Towards General Medical AI.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification.
CoRR, December, 2025

UniMedVL: Unifying Medical Multimodal Understanding And Generation Through Observation-Knowledge-Analysis.
CoRR, October, 2025

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing.
CoRR, September, 2025

A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers.
CoRR, August, 2025

F^2TTA: Free-Form Test-Time Adaptation on Cross-Domain Medical Image Classification via Image-Level Disentangled Prompt Tuning.
CoRR, July, 2025

MedITok: A Unified Tokenizer for Medical Image Synthesis and Interpretation.
CoRR, May, 2025

Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model.
CoRR, May, 2025

WanJuanSiLu: A High-Quality Open-Source Webtext Dataset for Low-Resource Languages.
CoRR, January, 2025

MedGround-R1: Advancing Medical Image Grounding via Spatial-Semantic Rewarded Group Relative Policy Optimization.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2025, 2025

RetinaLogos: Fine-Grained Synthesis of High-Resolution Retinal Images Through Captions.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2025, 2025

Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2025, 2025

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

OpenHuEval: Evaluating Large Language Model on Hungarian Specifics.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions.
CoRR, 2024

GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI.
CoRR, 2024

MinerU: An Open-Source Solution for Precise Document Content Extraction.
CoRR, 2024

OpenDataLab: Empowering General Artificial Intelligence with Open Datasets.
CoRR, 2024

Investigating Public Fine-Tuning Datasets: A Complex Review of Current Practices from a Construction Perspective.
CoRR, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
CoRR, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.
CoRR, 2024

FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models.
CoRR, 2024

InternLM2 Technical Report.
CoRR, 2024

WanJuan-CC: A Safe and High-Quality Open-sourced English Webtext Dataset.
CoRR, 2024

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model.
CoRR, 2024

How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites.
Sci. China Inf. Sci., 2024

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

VIGC: Visual Instruction Generation and Correction.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition.
CoRR, 2023

MiChao-HuaFen 1.0: A Specialized Pre-trained Corpus Dataset for Domain-specific Large Models.
CoRR, 2023

WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese Large Models.
CoRR, 2023

2016
CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016.
CoRR, 2016

