Jingqun Tang

Orcid: 0000-0003-2577-0119

According to our database, Jingqun Tang authored at least 25 papers between 2022 and 2025.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2025
Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning.
CoRR, May, 2025

Advancing Sequential Numerical Prediction in Autoregressive Models.
CoRR, May, 2025

WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?
CoRR, May, 2025

Seed1.5-VL Technical Report.
CoRR, May, 2025

Vision as LoRA.
CoRR, March, 2025

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning.
CoRR, January, 2025

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

A Bounding Box is Worth One Token - Interleaving Layout and Text in a Large Language Model for Document Understanding.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

ParGo: Bridging Vision-Language with Partial and Global Views.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

Attentive Eraser: Unleashing Diffusion Model's Object Removal Potential via Self-Attention Redirection Guidance.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
MCTBench: Multimodal Cognition towards Text-Rich Visual Scenes Benchmark.
CoRR, 2024

A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding.
CoRR, 2024

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy.
CoRR, 2024

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering.
CoRR, 2024

TextSquare: Scaling up Text-Centric Visual Instruction Tuning.
CoRR, 2024

DocPedia: unleashing the power of large multimodal model in the frequency domain for versatile document understanding.
Sci. China Inf. Sci., 2024

Harmonizing Visual Text Comprehension and Generation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
SPTS v2: Single-Point Scene Text Spotting.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

UniDoc: A Universal Large Multimodal Model for Simultaneous Text Detection, Recognition, Spotting and Understanding.
CoRR, 2023

2022
You Can even Annotate Text with Voice: Transcription-only-Supervised Text Spotting.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning.
Proceedings of the Computer Vision - ECCV 2022, 2022

Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

