Anwen Hu

ORCID: 0000-0001-8839-4996

According to our database¹, Anwen Hu authored at least 31 papers between 2019 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2025

mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning.

CoRR, 2024

Think-Program-reCtify: 3D Situated Reasoning with Large Language Models.

CoRR, 2024

mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding.

CoRR, 2024

mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

VG-Annotator: Vision-Language Models as Query Annotators for Unsupervised Visual Grounding.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

TinyChart: Efficient Chart Understanding with Program-of-Thoughts Learning and Visual Token Merging.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Multimodal Pretraining from Monolingual to Multilingual.

Mach. Intell. Res., April, 2023

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration.

CoRR, 2023

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding.

CoRR, 2023

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks.

CoRR, 2023

mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality.

CoRR, 2023

Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Learning Semantics-Grounded Vocabulary Representation for Video-Text Retrieval.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Explore and Tell: Embodied Visual Captioning in 3D Environments.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Movie101: A New Movie Understanding Benchmark.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

MPMQA: Multimodal Question Answering on Product Manuals.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Accommodating Audio Modality in CLIP for Multimodal Processing.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Generalizing Multimodal Pre-training into Multilingual via Language Acquisition.

Liang Zhang, Anwen Hu, Qin Jin

CoRR, 2022

Multi-Lingual Acquisition on Multimodal Pre-training for Cross-modal Retrieval.

Liang Zhang, Anwen Hu, Qin Jin

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

MovieUN: A Dataset for Movie Understanding and Narrating.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

2021

WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training.

CoRR, 2021

Question-controlled Text-aware Image Captioning.

Anwen Hu, Shizhe Chen, Qin Jin

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

2020

ICECAP: Information Concentrated Entity-aware Image Captioning.

Anwen Hu, Shizhe Chen, Qin Jin

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Leveraging Multi-Token Entities in Document-Level Named Entity Recognition.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Document-Level Named Entity Recognition by Incorporating Global and Neighbor Features.

Anwen Hu, Zhicheng Dou, Ji-Rong Wen

Proceedings of the Information Retrieval - 25th China Conference, 2019
