Can Huang

ORCID: 0009-0006-9126-3069

Affiliations:
  • ByteDance, Shanghai, China


According to our database, Can Huang authored at least 23 papers between 2023 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Benchmarking Vision-Language Models on Chinese Ancient Documents: From OCR to Knowledge Reasoning.
CoRR, September 2025

MEML-GRPO: Heterogeneous Multi-Expert Mutual Learning for RLVR Advancement.
CoRR, August 2025

Post-Completion Learning for Language Models.
CoRR, July 2025

Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning.
CoRR, May 2025

WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?
CoRR, May 2025

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning.
CoRR, January 2025

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering.
Findings of the Association for Computational Linguistics, 2025

Advancing Sequential Numerical Prediction in Autoregressive Models.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2025

Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting.
Findings of the Association for Computational Linguistics, 2025

2024
MCTBench: Multimodal Cognition towards Text-Rich Visual Scenes Benchmark.
CoRR, 2024

A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding.
CoRR, 2024

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy.
CoRR, 2024

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering.
CoRR, 2024

TextSquare: Scaling up Text-Centric Visual Instruction Tuning.
CoRR, 2024

DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding.
Sci. China Inf. Sci., 2024

Harmonizing Visual Text Comprehension and Generation.
Advances in Neural Information Processing Systems 38 (NeurIPS 2024), 2024

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy.
Advances in Neural Information Processing Systems 38 (NeurIPS 2024), 2024

Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
SPTS v2: Single-Point Scene Text Spotting.
IEEE Trans. Pattern Anal. Mach. Intell., December 2023

Target and source modality co-reinforcement for emotion understanding from asynchronous multimodal sequences.
Knowl. Based Syst., April 2023

DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding.
CoRR, 2023

UniDoc: A Universal Large Multimodal Model for Simultaneous Text Detection, Recognition, Spotting and Understanding.
CoRR, 2023

ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
