Jiabo Ye

ORCID: 0009-0009-5451-8984

According to our database, Jiabo Ye authored at least 32 papers between 2021 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning.

CoRR, September, 2025

Mobile-Agent-v3: Fundamental Agents for GUI Automation.

CoRR, August, 2025

Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation.

CoRR, June, 2025

Qwen2.5-VL Technical Report.

CoRR, February, 2025

Exploiting Presentative Feature Distributions for Parameter-Efficient Continual Learning of Large Language Models.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

UniQRNet: Unifying Referring Expression Grounding and Segmentation with QRNet.

ACM Trans. Multim. Comput. Commun. Appl., August, 2024

mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding.

CoRR, 2024

Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception.

CoRR, 2024

Part-Aware Prompt Tuning for Weakly Supervised Referring Expression Grounding.

Proceedings of the MultiMedia Modeling - 30th International Conference, 2024

mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

A Sentimental Prompt Framework with Visual Text Encoder for Multimodal Sentiment Analysis.

Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

VG-Annotator: Vision-Language Models as Query Annotators for Unsupervised Visual Grounding.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MNER-MI: A Multi-image Dataset for Multimodal Named Entity Recognition in Social Media.

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

2023

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration.

CoRR, 2023

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding.

CoRR, 2023

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks.

CoRR, 2023

mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality.

CoRR, 2023

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video.

Proceedings of the International Conference on Machine Learning, 2023

Pseudo-Query Generation For Semi-Supervised Visual Grounding With Knowledge Distillation.

Proceedings of the IEEE International Conference on Acoustics, 2023

UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

2022

Inferring substitutable and complementary products with Knowledge-Aware Path Reasoning based on dynamic policy network.

Knowl. Based Syst., 2022

mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections.

CoRR, 2022

CAT-MNER: Multimodal Named Entity Recognition with Knowledge-Refined Cross-Modal Attention.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2022

mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

PromptMNER: Prompt-Based Entity-Related Visual Clue Extraction and Integration for Multimodal Named Entity Recognition.

Proceedings of the Database Systems for Advanced Applications, 2022

Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

One-Stage Visual Grounding via Semantic-Aware Feature Filter.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
