Qingpei Guo

Orcid: 0009-0001-0521-9664

According to our database, Qingpei Guo authored at least 50 papers between 2015 and 2026.

Bibliography

2026
The Thinking Boundary: Quantifying Reasoning Suitability of Multimodal Tasks via Dual Tuning.
CoRR, March, 2026

Looping Back to Move Forward: Recursive Transformers for Efficient and Flexible Large Multimodal Models.
CoRR, February, 2026

FlattenGPT: Depth Compression for Transformer with Layer Flattening.
CoRR, February, 2026

VaccineRAG: Boosting Multimodal Large Language Models' Immunity to Harmful RAG Samples.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

SCAN: Self-Calibrated AutoregressioN for High-Quality Visual Generation.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
VideoScaffold: Elastic-Scale Visual Hierarchies for Streaming Video Understanding in MLLMs.
CoRR, December, 2025

OrdMoE: Preference Alignment via Hierarchical Expert Group Ranking in Multimodal Mixture-of-Experts LLMs.
CoRR, November, 2025

AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixture of Expert.
CoRR, November, 2025

Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer.
CoRR, October, 2025

SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval.
IEEE Trans. Circuits Syst. Video Technol., August, 2025

M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning.
CoRR, July, 2025

Ming-Omni: A Unified Multimodal Model for Perception and Generation.
CoRR, June, 2025

EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models.
CoRR, May, 2025

Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction.
CoRR, May, 2025

From Mapping to Composing: A Two-Stage Framework for Zero-shot Composed Image Retrieval.
CoRR, April, 2025

LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models.
CoRR, March, 2025

M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance.
CoRR, February, 2025

MedTransTab: Advancing Medical Cross-Table Tabular Data Generation.
Proceedings of the Eighteenth ACM International Conference on Web Search and Data Mining, 2025

Unified Visual Generation via Next-Set Prediction in Continuous Domain.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

VQAGuider: Guiding Multimodal Large Language Models to Answer Complex Video Questions.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Attributive Reasoning for Hallucination Diagnosis of Large Language Models.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
SNP-S<sup>3</sup>: Shared Network Pre-Training and Significant Semantic Strengthening for Various Video-Text Tasks.
IEEE Trans. Circuits Syst. Video Technol., April, 2024

Referencing Where to Focus: Improving Visual Grounding with Referential Query.
CoRR, 2024

HOTVCOM: Generating Buzzworthy Comments for Videos.
CoRR, 2024

Social Debiasing for Fair Multi-modal LLMs.
CoRR, 2024

Hummer: Towards Limited Competitive Preference Dataset.
CoRR, 2024

M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval.
CoRR, 2024

SNP-S3: Shared Network Pre-training and Significant Semantic Strengthening for Various Video-Text Tasks.
CoRR, 2024

M<sup>2</sup>-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining.
CoRR, 2024

Knowledge-enhanced Multi-perspective Video Representation Learning for Scene Recognition.
CoRR, 2024

M<sup>2</sup>-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval.
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

LoTLIP: Improving Language-Image Pre-training for Long Text Understanding.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Referencing Where to Focus: Improving Visual Grounding with Referential Query.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

EVE: Efficient Zero-Shot Text-Based Video Editing With Depth Map Guidance and Temporal Consistency Constraints.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

HOTVCOM: Generating Buzzworthy Comments for Videos.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
Text as Image: Learning Transferable Adapter for Multi-Label Classification.
CoRR, 2023

Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Temporal Sentence Grounding in Streaming Videos.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Boundary-aware Backward-Compatible Representation via Adversarial Learning in Image Retrieval.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

CNVid-3.5M: Build, Filter, and Pre-Train the Large-Scale Public Chinese Video-Text Dataset.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input.
Proceedings of the Computer Vision - ECCV 2022, 2022

2021
LPSNet: A Lightweight Solution for Fast Panoptic Segmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
Automatic Car Damage Assessment System: Reading and Understanding Videos as Professional Insurance Inspectors.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2017
Non-Frontal Facial Expression Recognition Using a Depth-Patch Based Deep Neural Network.
J. Comput. Sci. Technol., 2017

2015
The Implementation of Hadoop-based Crawler System and Graphlite-based PageRank-Calculation In Search Engine.
CoRR, 2015

