Can Huang

ORCID: 0009-0006-9126-3069

Affiliations:
  • ByteDance, Shanghai, China


According to our database, Can Huang authored at least 23 papers between 2023 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Benchmarking Vision-Language Models on Chinese Ancient Documents: From OCR to Knowledge Reasoning.
CoRR, September 2025

MEML-GRPO: Heterogeneous Multi-Expert Mutual Learning for RLVR Advancement.
CoRR, August 2025

Post-Completion Learning for Language Models.
CoRR, July 2025

Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning.
CoRR, May 2025

WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?
CoRR, May 2025

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning.
CoRR, January 2025

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering.
Findings of the Association for Computational Linguistics, 2025

Advancing Sequential Numerical Prediction in Autoregressive Models.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2025

Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting.
Findings of the Association for Computational Linguistics, 2025

2024
MCTBench: Multimodal Cognition towards Text-Rich Visual Scenes Benchmark.
CoRR, 2024

A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding.
CoRR, 2024

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy.
CoRR, 2024

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering.
CoRR, 2024

TextSquare: Scaling up Text-Centric Visual Instruction Tuning.
CoRR, 2024

DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding.
Sci. China Inf. Sci., 2024

Harmonizing Visual Text Comprehension and Generation.
Advances in Neural Information Processing Systems 38 (NeurIPS 2024), 2024

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy.
Advances in Neural Information Processing Systems 38 (NeurIPS 2024), 2024

Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
SPTS v2: Single-Point Scene Text Spotting.
IEEE Trans. Pattern Anal. Mach. Intell., December 2023

Target and source modality co-reinforcement for emotion understanding from asynchronous multimodal sequences.
Knowl. Based Syst., April 2023

DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding.
CoRR, 2023

UniDoc: A Universal Large Multimodal Model for Simultaneous Text Detection, Recognition, Spotting and Understanding.
CoRR, 2023

ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
