Jingqun Tang

Orcid: 0000-0003-2577-0119

According to our database, Jingqun Tang authored at least 38 papers between 2022 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
ProFocus: Proactive Perception and Focused Reasoning in Vision-and-Language Navigation.
CoRR, March, 2026

TC-Padé: Trajectory-Consistent Padé Approximation for Diffusion Acceleration.
CoRR, March, 2026

Diffusion Probe: Generated Image Result Prediction Using CNN Probes.
CoRR, February, 2026

TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering.
CoRR, February, 2026

Dolphin-v2: Universal Document Parsing via Scalable Anchor Prompting.
CoRR, February, 2026

DTP: A Simple yet Effective Distracting Token Pruning Framework for Vision-Language Action Models.
CoRR, January, 2026

SCORE: Story Coherence and Retrieval Enhancement for AI Narratives.
Companion Proceedings of the ACM Web Conference 2026, 2026

MEML-GRPO: Heterogeneous Multi-Expert Mutual Learning for RLVR Advancement.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
CME-CAD: Heterogeneous Collaborative Multi-Expert Reinforcement Learning for CAD Code Generation.
CoRR, December, 2025

Beyond Pixel Simulation: Pathology Image Generation via Diagnostic Semantic Tokens and Prototype Control.
CoRR, December, 2025

Resolving Evidence Sparsity: Agentic Context Engineering for Long-Document Understanding.
CoRR, November, 2025

ChineseVideoBench: Benchmarking Multi-modal Large Models for Chinese Video Question Answering.
CoRR, November, 2025

Benchmarking Vision-Language Models on Chinese Ancient Documents: From OCR to Knowledge Reasoning.
CoRR, September, 2025

Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning.
CoRR, May, 2025

WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?
CoRR, May, 2025

Vision as LoRA.
CoRR, March, 2025

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

MINDEV: Multi-modal Integrated Diffusion Framework for Video Reconstruction from EEG Signals.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

A Bounding Box is Worth One Token - Interleaving Layout and Text in a Large Language Model for Document Understanding.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

Advancing Sequential Numerical Prediction in Autoregressive Models.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2025

Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

ParGo: Bridging Vision-Language with Partial and Global Views.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

Attentive Eraser: Unleashing Diffusion Model's Object Removal Potential via Self-Attention Redirection Guidance.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
MCTBench: Multimodal Cognition towards Text-Rich Visual Scenes Benchmark.
CoRR, 2024

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy.
CoRR, 2024

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering.
CoRR, 2024

TextSquare: Scaling up Text-Centric Visual Instruction Tuning.
CoRR, 2024

DocPedia: unleashing the power of large multimodal model in the frequency domain for versatile document understanding.
Sci. China Inf. Sci., 2024

Harmonizing Visual Text Comprehension and Generation.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
SPTS v2: Single-Point Scene Text Spotting.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

UniDoc: A Universal Large Multimodal Model for Simultaneous Text Detection, Recognition, Spotting and Understanding.
CoRR, 2023

2022
You Can even Annotate Text with Voice: Transcription-only-Supervised Text Spotting.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning.
Proceedings of the Computer Vision - ECCV 2022, 2022

Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

