Hao Feng

ORCID: 0000-0001-8127-6639

Affiliations:

University of Science and Technology of China, Department of Electronic Engineering and Information Science, CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, Hefei, China

According to our database, Hao Feng authored at least 45 papers between 2021 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

Revisiting Shadow Detection from a Vision-Language Perspective.

CoRR, May, 2026

TextCoT: Zoom-In for Enhanced Multimodal Text-Rich Image Understanding.

ACM Trans. Multim. Comput. Commun. Appl., April, 2026

TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering.

CoRR, February, 2026

Dolphin-v2: Universal Document Parsing via Scalable Anchor Prompting.

CoRR, February, 2026

BookNet: Book Image Rectification via Cross-Page Attention Network.

CoRR, January, 2026

MEML-GRPO: Heterogeneous Multi-Expert Mutual Learning for RLVR Advancement.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

ChineseVideoBench: Benchmarking Multi-modal Large Models for Chinese Video Question Answering.

CoRR, November, 2025

LaneTCA: Enhancing Video Lane Detection With Temporal Context Aggregation.

IEEE Trans. Circuits Syst. Video Technol., September, 2025

Benchmarking Vision-Language Models on Chinese Ancient Documents: From OCR to Knowledge Reasoning.

CoRR, September, 2025

DocScanner: Robust Document Image Rectification with Progressive Learning.

Int. J. Comput. Vis., August, 2025

Post-Completion Learning for Language Models.

CoRR, July, 2025

MCP-Zero: Active Tool Discovery for Autonomous LLM Agents.

Xiang Fei, Xiawu Zheng, Hao Feng

CoRR, June, 2025

Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning.

CoRR, May, 2025

WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?

CoRR, May, 2025

Multi-Cue Adaptive Visual Token Pruning for Large Vision-Language Models.

CoRR, March, 2025

DeepEraser: Deep Iterative Context Mining for Generic Text Eraser.

IEEE Trans. Multim., 2025

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

A Bounding Box is Worth One Token - Interleaving Layout and Text in a Large Language Model for Document Understanding.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

Advancing Sequential Numerical Prediction in Autoregressive Models.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2025

Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

Recurrent Generic Contour-Based Instance Segmentation With Progressive Learning.

IEEE Trans. Circuits Syst. Video Technol., September, 2024

Rethinking Supervision in Document Unwarping: A Self-Consistent Flow-Free Approach.

Shaokai Liu, Hao Feng, Wengang Zhou

IEEE Trans. Circuits Syst. Video Technol., June, 2024

Progressive Recurrent Network for shadow removal.

Comput. Vis. Image Underst., January, 2024

Deep Unrestricted Document Image Rectification.

IEEE Trans. Multim., 2024

AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding.

CoRR, 2024

RoFIR: Robust Fisheye Image Rectification Framework Impervious to Optical Center Deviation.

CoRR, 2024

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy.

CoRR, 2024

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering.

Mohamad Fitri Faiz Bin Mahmood

CoRR, 2024

TextSquare: Scaling up Text-Centric Visual Instruction Tuning.

CoRR, 2024

DocPedia: unleashing the power of large multimodal model in the frequency domain for versatile document understanding.

Sci. China Inf. Sci., 2024

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Progressive Multi-modal Conditional Prompt Tuning.

Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

2023

Model-Aware Pre-Training for Radial Distortion Rectification.

IEEE Trans. Image Process., 2023

Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs.

CoRR, 2023

DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding.

CoRR, 2023

UniDoc: A Universal Large Multimodal Model for Simultaneous Text Detection, Recognition, Spotting and Understanding.

CoRR, 2023

Recurrent Contour-based Instance Segmentation with Progressive Learning.

CoRR, 2023

DocMAE: Document Image Rectification via Self-supervised Representation Learning.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Sign Language Translation with Iterative Prototype.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022

PolyTracker: Progressive Contour Regression for Multiple Object Tracking and Segmentation.

Proceedings of the Pattern Recognition and Computer Vision - 5th Chinese Conference, 2022

Geometric Representation Learning for Document Image Rectification.

Proceedings of the Computer Vision - ECCV 2022, 2022

2021

DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
