Chaoyou Fu

Orcid: 0000-0002-0079-7668

According to our database, Chaoyou Fu authored at least 71 papers between 2017 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

MME-CC: A Challenging Multi-Modal Evaluation Benchmark of Cognitive Capacity.

CoRR, November, 2025

VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting.

CoRR, October, 2025

CUARewardBench: A Benchmark for Evaluating Reward Models on Computer-using Agent.

CoRR, October, 2025

VITA-VLA: Efficiently Teaching Vision-Language Models to Act via Action Expert Distillation.

CoRR, October, 2025

Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models.

CoRR, September, 2025

OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing.

CoRR, September, 2025

RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark.

CoRR, September, 2025

BaseReward: A Strong Baseline for Multimodal Reward Model.

CoRR, September, 2025

Thyme: Think Beyond Images.

CoRR, August, 2025

Zooming from Context to Cue: Hierarchical Preference Optimization for Multi-Image MLLMs.

CoRR, May, 2025

MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios.

CoRR, May, 2025

MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs.

CoRR, May, 2025

VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model.

CoRR, May, 2025

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning.

CoRR, May, 2025

MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models.

CoRR, April, 2025

Aligning Multimodal LLM with Human Preference: A Survey.

CoRR, March, 2025

QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension.

CoRR, March, 2025

MM-RLHF: The Next Step Forward in Multimodal LLM Alignment.

CoRR, February, 2025

Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy.

CoRR, February, 2025

LUCY: Linguistic Understanding and Control Yielding Early Stage of Her.

CoRR, January, 2025

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction.

CoRR, January, 2025

Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

MM-RLHF: The Next Step Forward in Multimodal LLM Alignment.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Learning Interleaved Image-Text Comprehension in Vision-Language Large Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

Dynamic Graph Memory Bank for Video Inpainting.

IEEE Trans. Circuits Syst. Video Technol., November, 2024

T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs.

CoRR, 2024

MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs.

CoRR, 2024

MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning.

CoRR, 2024

Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM.

CoRR, 2024

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

CoRR, 2024

VITA: Towards Open-Source Interactive Omni Multimodal LLM.

CoRR, 2024

VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models.

CoRR, 2024

Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models.

CoRR, 2024

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis.

CoRR, 2024

Woodpecker: hallucination correction for multimodal large language models.

Sci. China Inf. Sci., 2024

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

TGMAE: Self-supervised Micro-Expression Recognition with Temporal Gaussian Masked Autoencoder.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

No Time to Train: Empowering Non-Parametric Networks for Few-Shot 3D Scene Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Aligning and Prompting Everything All at Once for Universal Visual Perception.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Towards Lightweight Pixel-Wise Hallucination for Heterogeneous Face Recognition.

IEEE Trans. Pattern Anal. Mach. Intell., July, 2023

Audio-Driven Dubbing for User Generated Contents via Style-Aware Semi-Parametric Synthesis.

IEEE Trans. Circuits Syst. Video Technol., March, 2023

Iterative embedding distillation for open world vehicle recognition.

Pattern Recognit., 2023

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise.

CoRR, 2023

ChatIllusion: Efficient-Aligning Interleaved Generation ability with Visual Instruction Model.

CoRR, 2023

A Survey on Multimodal Large Language Models.

CoRR, 2023

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models.

CoRR, 2023

CAPro: Webly Supervised Learning with Cross-modality Aligned Prototypes.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Multi-modal Queried Object Detection in the Wild.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022

Heterogeneous Face Recognition via Face Synthesis With Identity-Attribute Disentanglement.

IEEE Trans. Inf. Forensics Secur., 2022

Deep momentum uncertainty hashing.

Pattern Recognit., 2022

DVG-Face: Dual Variational Generation for Heterogeneous Face Recognition.

IEEE Trans. Pattern Anal. Mach. Intell., 2022

Rethinking Image Cropping: Exploring Diverse Compositions from Global Views.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

High-Fidelity Face Manipulation With Extreme Poses and Expressions.

IEEE Trans. Inf. Forensics Secur., 2021

Learning Causal Representation for Face Transfer across Large Appearance Gap.

CoRR, 2021

Everything's Talkin': Pareidolia Face Reenactment.

CoRR, 2021

CM-NAS: Rethinking Cross-Modality Neural Architectures for Visible-Infrared Person Re-Identification.

CoRR, 2021

CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Self-Augmented Heterogeneous Face Recognition.

Proceedings of the International IEEE Joint Conference on Biometrics, 2021

Pareidolia Face Reenactment.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Information Bottleneck Disentanglement for Identity Swapping.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

AOT: Appearance Optimal Transport Based Identity Swapping for Forgery Detection.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Cross-Spectral Face Hallucination via Disentangling Independent Factors.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Pose Agnostic Cross-spectral Hallucination via Disentangling Independent Factors.

CoRR, 2019

High Fidelity Face Manipulation with Extreme Pose and Expression.

CoRR, 2019

Dual Variational Generation for Low Shot Heterogeneous Face Recognition.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Neurons Merging Layer: Towards Progressive Redundancy Reduction for Deep Supervised Hashing.

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

2017

Global Perception Feedback Convolutional Neural Networks.

Proceedings of the Advances in Image and Graphics Technologies - 12th Chinese conference, 2017
