Jingqun Tang

ORCID: 0000-0003-2577-0119

According to our database, Jingqun Tang authored at least 38 papers between 2022 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

ProFocus: Proactive Perception and Focused Reasoning in Vision-and-Language Navigation.

CoRR, March, 2026

TC-Padé: Trajectory-Consistent Padé Approximation for Diffusion Acceleration.

CoRR, March, 2026

Diffusion Probe: Generated Image Result Prediction Using CNN Probes.

CoRR, February, 2026

TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering.

CoRR, February, 2026

Dolphin-v2: Universal Document Parsing via Scalable Anchor Prompting.

CoRR, February, 2026

DTP: A Simple yet Effective Distracting Token Pruning Framework for Vision-Language Action Models.

CoRR, January, 2026

SCORE: Story Coherence and Retrieval Enhancement for AI Narratives.

Companion Proceedings of the ACM Web Conference 2026, 2026

MEML-GRPO: Heterogeneous Multi-Expert Mutual Learning for RLVR Advancement.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

CME-CAD: Heterogeneous Collaborative Multi-Expert Reinforcement Learning for CAD Code Generation.

CoRR, December, 2025

Beyond Pixel Simulation: Pathology Image Generation via Diagnostic Semantic Tokens and Prototype Control.

CoRR, December, 2025

Resolving Evidence Sparsity: Agentic Context Engineering for Long-Document Understanding.

CoRR, November, 2025

ChineseVideoBench: Benchmarking Multi-modal Large Models for Chinese Video Question Answering.

CoRR, November, 2025

Benchmarking Vision-Language Models on Chinese Ancient Documents: From OCR to Knowledge Reasoning.

CoRR, September, 2025

Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning.

CoRR, May, 2025

WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?

CoRR, May, 2025

Vision as LoRA.

CoRR, March, 2025

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning.

Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

MINDEV: Multi-modal Integrated Diffusion Framework for Video Reconstruction from EEG Signals.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering.

Findings of the Association for Computational Linguistics, 2025

A Bounding Box is Worth One Token - Interleaving Layout and Text in a Large Language Model for Document Understanding.

Findings of the Association for Computational Linguistics, 2025

Advancing Sequential Numerical Prediction in Autoregressive Models.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2025

Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting.

Findings of the Association for Computational Linguistics, 2025

ParGo: Bridging Vision-Language with Partial and Global Views.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

Attentive Eraser: Unleashing Diffusion Model's Object Removal Potential via Self-Attention Redirection Guidance.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

MCTBench: Multimodal Cognition towards Text-Rich Visual Scenes Benchmark.

CoRR, 2024

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy.

CoRR, 2024

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering.

Mohamad Fitri Faiz Bin Mahmood

CoRR, 2024

TextSquare: Scaling up Text-Centric Visual Instruction Tuning.

CoRR, 2024

DocPedia: unleashing the power of large multimodal model in the frequency domain for versatile document understanding.

Sci. China Inf. Sci., 2024

Harmonizing Visual Text Comprehension and Generation.

Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy.

Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

SPTS v2: Single-Point Scene Text Spotting.

IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

UniDoc: A Universal Large Multimodal Model for Simultaneous Text Detection, Recognition, Spotting and Understanding.

CoRR, 2023

2022

You Can even Annotate Text with Voice: Transcription-only-Supervised Text Spotting.

MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning.

Computer Vision - ECCV 2022, 2022

Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
