Qinghao Ye

ORCID: 0000-0002-7977-5540

According to our database, Qinghao Ye authored at least 43 papers between 2019 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought.

CoRR, November, 2025

Artificial Hippocampus Networks for Efficient Long-Context Modeling.

CoRR, October, 2025

Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models.

CoRR, August, 2025

Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

LLaVA-Critic: Learning to Evaluate Multimodal Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

UniQRNet: Unifying Referring Expression Grounding and Segmentation with QRNet.

ACM Trans. Multim. Comput. Commun. Appl., August, 2024

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

CoRR, 2024

Classification Done Right for Vision-Language Pre-Training.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Hallucination Augmented Contrastive Learning for Multimodal Large Language Model.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval.

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training.

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Truncated attention-aware proposal networks with multi-scale dilation for temporal action detection.

Pattern Recognit., October, 2023

AI-based medical e-diagnosis for fast and automatic ventricular volume measurement in patients with normal pressure hydrocephalus.

Neural Comput. Appl., August, 2023

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration.

CoRR, 2023

Evaluation and Analysis of Hallucination in Large Vision-Language Models.

CoRR, 2023

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding.

CoRR, 2023

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks.

CoRR, 2023

Transforming Visual Scene Graphs to Image Captions.

CoRR, 2023

mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality.

CoRR, 2023

ChatPLUG: Open-Domain Generative Dialogue System with Internet-Augmented Instruction Tuning for Digital Human.

CoRR, 2023

mPLUG-Octopus: The Versatile Assistant Empowered by A Modularized End-to-End Multimodal LLM.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Learning Semantics-Grounded Vocabulary Representation for Video-Text Retrieval.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

COPA: Efficient Vision-Language Pre-training through Collaborative Object- and Patch-Text Alignment.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video.

Proceedings of the International Conference on Machine Learning, 2023

HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Learning Trajectory-Word Alignments for Video-Language Tasks.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

BUS: Efficient and Effective Vision-language Pre-training with Bottom-Up Patch Summarization.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Transforming Visual Scene Graphs to Image Captions.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

All Grains, One Scheme (AGOS): Learning Multigrain Instance Representation for Aerial Scene Classification.

IEEE Trans. Geosci. Remote. Sens., 2022

Unbox the black-box for the medical explainable AI via multi-modal and multi-centre data fusion: A mini-review, two showcases and beyond.

Guang Yang, Qinghao Ye, Jun Xia

Inf. Fusion, 2022

All Grains, One Scheme (AGOS): Learning Multi-grain Instance Representation for Aerial Scene Classification.

CoRR, 2022

AI-based Medical e-Diagnosis for Fast and Automatic Ventricular Volume Measurement in the Patients with Normal Pressure Hydrocephalus.

CoRR, 2022

Robust weakly supervised learning for COVID-19 recognition using multi-center CT images.

Appl. Soft Comput., 2022

Exploring Global Diversity and Local Context for Video Summarization.

IEEE Access, 2022

2021

Exploring global diverse attention via pairwise temporal relation for video summarization.

Pattern Recognit., 2021

Temporal Cue Guided Video Highlight Detection with Low-Rank Audio-Visual Fusion.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Explainable AI for COVID-19 CT Classifiers: An Initial Comparison Study.

Qinghao Ye, Jun Xia, Guang Yang

Proceedings of the 34th IEEE International Symposium on Computer-Based Medical Systems, 2021

2019

Dual attention based fine-grained leukocyte recognition for imbalanced microscopic images.

J. Intell. Fuzzy Syst., 2019

Application of Time Series Analysis to Traffic Accidents in Los Angeles.

Qinghao Ye, Kaiyuan Hu, Yizhe Wang

CoRR, 2019
