Qingpei Guo
Orcid: 0009-0001-0521-9664
According to our database, Qingpei Guo authored at least 50 papers between 2015 and 2026.
Bibliography
2026
The Thinking Boundary: Quantifying Reasoning Suitability of Multimodal Tasks via Dual Tuning.
CoRR, March, 2026
Looping Back to Move Forward: Recursive Transformers for Efficient and Flexible Large Multimodal Models.
CoRR, February, 2026
CoRR, February, 2026
VaccineRAG: Boosting Multimodal Large Language Models' Immunity to Harmful RAG Samples.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026
2025
VideoScaffold: Elastic-Scale Visual Hierarchies for Streaming Video Understanding in MLLMs.
CoRR, December, 2025
OrdMoE: Preference Alignment via Hierarchical Expert Group Ranking in Multimodal Mixture-of-Experts LLMs.
CoRR, November, 2025
AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixture of Expert.
CoRR, November, 2025
Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer.
CoRR, October, 2025
IEEE Trans. Circuits Syst. Video Technol., August, 2025
CoRR, July, 2025
CoRR, June, 2025
CoRR, May, 2025
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction.
CoRR, May, 2025
From Mapping to Composing: A Two-Stage Framework for Zero-shot Composed Image Retrieval.
CoRR, April, 2025
CoRR, March, 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance.
CoRR, February, 2025
Proceedings of the Eighteenth ACM International Conference on Web Search and Data Mining, 2025
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
VQAGuider: Guiding Multimodal Large Language Models to Answer Complex Video Questions.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025
2024
SNP-S³: Shared Network Pre-Training and Significant Semantic Strengthening for Various Video-Text Tasks.
IEEE Trans. Circuits Syst. Video Technol., April, 2024
CoRR, 2024
M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval.
CoRR, 2024
SNP-S3: Shared Network Pre-training and Significant Semantic Strengthening for Various Video-Text Tasks.
CoRR, 2024
M₂-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining.
CoRR, 2024
Knowledge-enhanced Multi-perspective Video Representation Learning for Scene Recognition.
CoRR, 2024
M²-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval.
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024
EVE: Efficient Zero-Shot Text-Based Video Editing With Depth Map Guidance and Temporal Consistency Constraints.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the Findings of the Association for Computational Linguistics, 2024
2023
CoRR, 2023
Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Boundary-aware Backward-Compatible Representation via Adversarial Learning in Image Retrieval.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
CNVid-3.5M: Build, Filter, and Pre-Train the Large-Scale Public Chinese Video-Text Dataset.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
2022
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input.
Proceedings of the Computer Vision - ECCV 2022, 2022
2021
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
2020
Automatic Car Damage Assessment System: Reading and Understanding Videos as Professional Insurance Inspectors.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
2017
Non-Frontal Facial Expression Recognition Using a Depth-Patch Based Deep Neural Network.
J. Comput. Sci. Technol., 2017
2015
The Implementation of Hadoop-based Crawler System and Graphlite-based PageRank-Calculation In Search Engine.
CoRR, 2015