Chaoyou Fu

Orcid: 0000-0002-0079-7668

According to our database, Chaoyou Fu authored at least 62 papers between 2017 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Thyme: Think Beyond Images.
CoRR, August, 2025

Zooming from Context to Cue: Hierarchical Preference Optimization for Multi-Image MLLMs.
CoRR, May, 2025

MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios.
CoRR, May, 2025

MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs.
CoRR, May, 2025

VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model.
CoRR, May, 2025

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning.
CoRR, May, 2025

MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models.
CoRR, April, 2025

Aligning Multimodal LLM with Human Preference: A Survey.
CoRR, March, 2025

QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension.
CoRR, March, 2025

MM-RLHF: The Next Step Forward in Multimodal LLM Alignment.
CoRR, February, 2025

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency.
CoRR, February, 2025

Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy.
CoRR, February, 2025

LUCY: Linguistic Understanding and Control Yielding Early Stage of Her.
CoRR, January, 2025

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction.
CoRR, January, 2025

Learning Interleaved Image-Text Comprehension in Vision-Language Large Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Dynamic Graph Memory Bank for Video Inpainting.
IEEE Trans. Circuits Syst. Video Technol., November, 2024

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption.
CoRR, 2024

T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs.
CoRR, 2024

MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs.
CoRR, 2024

MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning.
CoRR, 2024

Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM.
CoRR, 2024

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
CoRR, 2024

VITA: Towards Open-Source Interactive Omni Multimodal LLM.
CoRR, 2024

VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models.
CoRR, 2024

Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models.
CoRR, 2024

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis.
CoRR, 2024

Woodpecker: hallucination correction for multimodal large language models.
Sci. China Inf. Sci., 2024

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

TGMAE: Self-supervised Micro-Expression Recognition with Temporal Gaussian Masked Autoencoder.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

No Time to Train: Empowering Non-Parametric Networks for Few-Shot 3D Scene Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Aligning and Prompting Everything All at Once for Universal Visual Perception.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Towards Lightweight Pixel-Wise Hallucination for Heterogeneous Face Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., July, 2023

Audio-Driven Dubbing for User Generated Contents via Style-Aware Semi-Parametric Synthesis.
IEEE Trans. Circuits Syst. Video Technol., March, 2023

Iterative embedding distillation for open world vehicle recognition.
Pattern Recognit., 2023

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise.
CoRR, 2023

ChatIllusion: Efficient-Aligning Interleaved Generation ability with Visual Instruction Model.
CoRR, 2023

A Survey on Multimodal Large Language Models.
CoRR, 2023

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models.
CoRR, 2023

CAPro: Webly Supervised Learning with Cross-modality Aligned Prototypes.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Multi-modal Queried Object Detection in the Wild.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022
Heterogeneous Face Recognition via Face Synthesis With Identity-Attribute Disentanglement.
IEEE Trans. Inf. Forensics Secur., 2022

Deep momentum uncertainty hashing.
Pattern Recognit., 2022

DVG-Face: Dual Variational Generation for Heterogeneous Face Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Rethinking Image Cropping: Exploring Diverse Compositions from Global Views.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
High-Fidelity Face Manipulation With Extreme Poses and Expressions.
IEEE Trans. Inf. Forensics Secur., 2021

Learning Causal Representation for Face Transfer across Large Appearance Gap.
CoRR, 2021

Everything's Talkin': Pareidolia Face Reenactment.
CoRR, 2021

CM-NAS: Rethinking Cross-Modality Neural Architectures for Visible-Infrared Person Re-Identification.
CoRR, 2021

CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Self-Augmented Heterogeneous Face Recognition.
Proceedings of the International IEEE Joint Conference on Biometrics, 2021

Pareidolia Face Reenactment.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Information Bottleneck Disentanglement for Identity Swapping.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
AOT: Appearance Optimal Transport Based Identity Swapping for Forgery Detection.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Cross-Spectral Face Hallucination via Disentangling Independent Factors.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Pose Agnostic Cross-spectral Hallucination via Disentangling Independent Factors.
CoRR, 2019

High Fidelity Face Manipulation with Extreme Pose and Expression.
CoRR, 2019

Dual Variational Generation for Low Shot Heterogeneous Face Recognition.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Neurons Merging Layer: Towards Progressive Redundancy Reduction for Deep Supervised Hashing.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

2017
Global Perception Feedback Convolutional Neural Networks.
Proceedings of the Advances in Image and Graphics Technologies - 12th Chinese conference, 2017

