Qingpei Guo

Orcid: 0009-0001-0521-9664

According to our database¹, Qingpei Guo authored at least 50 papers between 2015 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

The Thinking Boundary: Quantifying Reasoning Suitability of Multimodal Tasks via Dual Tuning.

CoRR, March, 2026

Looping Back to Move Forward: Recursive Transformers for Efficient and Flexible Large Multimodal Models.

CoRR, February, 2026

FlattenGPT: Depth Compression for Transformer with Layer Flattening.

CoRR, February, 2026

VaccineRAG: Boosting Multimodal Large Language Models' Immunity to Harmful RAG Samples.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

SCAN: Self-Calibrated AutoregressioN for High-Quality Visual Generation.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

VideoScaffold: Elastic-Scale Visual Hierarchies for Streaming Video Understanding in MLLMs.

CoRR, December, 2025

OrdMoE: Preference Alignment via Hierarchical Expert Group Ranking in Multimodal Mixture-of-Experts LLMs.

CoRR, November, 2025

AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixture of Expert.

CoRR, November, 2025

Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer.

CoRR, October, 2025

SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval.

IEEE Trans. Circuits Syst. Video Technol., August, 2025

M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning.

CoRR, July, 2025

Ming-Omni: A Unified Multimodal Model for Perception and Generation.

CoRR, June, 2025

EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models.

CoRR, May, 2025

Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction.

CoRR, May, 2025

From Mapping to Composing: A Two-Stage Framework for Zero-shot Composed Image Retrieval.

CoRR, April, 2025

LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models.

CoRR, March, 2025

M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance.

CoRR, February, 2025

MedTransTab: Advancing Medical Cross-Table Tabular Data Generation.

Proceedings of the Eighteenth ACM International Conference on Web Search and Data Mining, 2025

Unified Visual Generation via Next-Set Prediction in Continuous Domain.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

VQAGuider: Guiding Multimodal Large Language Models to Answer Complex Video Questions.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Attributive Reasoning for Hallucination Diagnosis of Large Language Models.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

SNP-S3: Shared Network Pre-Training and Significant Semantic Strengthening for Various Video-Text Tasks.

IEEE Trans. Circuits Syst. Video Technol., April, 2024

Referencing Where to Focus: Improving Visual Grounding with Referential Query.

CoRR, 2024

HOTVCOM: Generating Buzzworthy Comments for Videos.

CoRR, 2024

Social Debiasing for Fair Multi-modal LLMs.

CoRR, 2024

Hummer: Towards Limited Competitive Preference Dataset.

CoRR, 2024

M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval.

CoRR, 2024

SNP-S3: Shared Network Pre-training and Significant Semantic Strengthening for Various Video-Text Tasks.

CoRR, 2024

M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining.

CoRR, 2024

Knowledge-enhanced Multi-perspective Video Representation Learning for Scene Recognition.

CoRR, 2024

M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval.

Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

LoTLIP: Improving Language-Image Pre-training for Long Text Understanding.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Referencing Where to Focus: Improving Visual Grounding with Referential Query.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

EVE: Efficient Zero-Shot Text-Based Video Editing With Depth Map Guidance and Temporal Consistency Constraints.

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

HOTVCOM: Generating Buzzworthy Comments for Videos.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

Text as Image: Learning Transferable Adapter for Multi-Label Classification.

CoRR, 2023

Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Temporal Sentence Grounding in Streaming Videos.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Boundary-aware Backward-Compatible Representation via Adversarial Learning in Image Retrieval.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

CNVid-3.5M: Build, Filter, and Pre-Train the Large-Scale Public Chinese Video-Text Dataset.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input.

Qingpei Guo

Kaisheng Yao

Wei Chu

Proceedings of the Computer Vision - ECCV 2022, 2022

2021

LPSNet: A Lightweight Solution for Fast Panoptic Segmentation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

Automatic Car Damage Assessment System: Reading and Understanding Videos as Professional Insurance Inspectors.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2017

Non-Frontal Facial Expression Recognition Using a Depth-Patch Based Deep Neural Network.

J. Comput. Sci. Technol., 2017

2015

The Implementation of Hadoop-based Crawler System and Graphlite-based PageRank-Calculation In Search Engine.

Qingpei Guo

Chao Xu

Yang Song

CoRR, 2015
