Qinghao Ye

Orcid: 0000-0002-7977-5540

According to our database, Qinghao Ye authored at least 39 papers between 2019 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training.
CoRR, 2024

Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval.
CoRR, 2024

TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Truncated attention-aware proposal networks with multi-scale dilation for temporal action detection.
Pattern Recognit., October, 2023

AI-based medical e-diagnosis for fast and automatic ventricular volume measurement in patients with normal pressure hydrocephalus.
Neural Comput. Appl., August, 2023

Hallucination Augmented Contrastive Learning for Multimodal Large Language Model.
CoRR, 2023

mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model.
CoRR, 2023

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration.
CoRR, 2023

UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model.
CoRR, 2023

Evaluation and Analysis of Hallucination in Large Vision-Language Models.
CoRR, 2023

BUS: Efficient and Effective Vision-language Pre-training with Bottom-Up Patch Summarization.
CoRR, 2023

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding.
CoRR, 2023

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks.
CoRR, 2023

Transforming Visual Scene Graphs to Image Captions.
CoRR, 2023

mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality.
CoRR, 2023

ChatPLUG: Open-Domain Generative Dialogue System with Internet-Augmented Instruction Tuning for Digital Human.
CoRR, 2023

Learning Trajectory-Word Alignments for Video-Language Tasks.
CoRR, 2023

mPLUG-Octopus: The Versatile Assistant Empowered by A Modularized End-to-End Multimodal LLM.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Learning Semantics-Grounded Vocabulary Representation for Video-Text Retrieval.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

COPA: Efficient Vision-Language Pre-training through Collaborative Object- and Patch-Text Alignment.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video.
Proceedings of the International Conference on Machine Learning, 2023

HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Learning Trajectory-Word Alignments for Video-Language Tasks.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

BUS: Efficient and Effective Vision-language Pre-training with Bottom-Up Patch Summarization.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Transforming Visual Scene Graphs to Image Captions.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
All Grains, One Scheme (AGOS): Learning Multigrain Instance Representation for Aerial Scene Classification.
IEEE Trans. Geosci. Remote. Sens., 2022

Unbox the black-box for the medical explainable AI via multi-modal and multi-centre data fusion: A mini-review, two showcases and beyond.
Inf. Fusion, 2022

HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training.
CoRR, 2022

All Grains, One Scheme (AGOS): Learning Multi-grain Instance Representation for Aerial Scene Classification.
CoRR, 2022

AI-based Medical e-Diagnosis for Fast and Automatic Ventricular Volume Measurement in the Patients with Normal Pressure Hydrocephalus.
CoRR, 2022

Robust weakly supervised learning for COVID-19 recognition using multi-center CT images.
Appl. Soft Comput., 2022

Exploring Global Diversity and Local Context for Video Summarization.
IEEE Access, 2022

2021
Exploring global diverse attention via pairwise temporal relation for video summarization.
Pattern Recognit., 2021

Robust Weakly Supervised Learning for COVID-19 Recognition Using Multi-Center CT Images.
CoRR, 2021

Temporal Cue Guided Video Highlight Detection with Low-Rank Audio-Visual Fusion.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Explainable AI for COVID-19 CT Classifiers: An Initial Comparison Study.
Proceedings of the 34th IEEE International Symposium on Computer-Based Medical Systems, 2021

2019
Dual attention based fine-grained leukocyte recognition for imbalanced microscopic images.
J. Intell. Fuzzy Syst., 2019

Application of Time Series Analysis to Traffic Accidents in Los Angeles.
CoRR, 2019

