Chaoyou Fu
ORCID: 0000-0002-0079-7668
According to our database, Chaoyou Fu authored at least 62 papers between 2017 and 2025.
Bibliography
2025
Zooming from Context to Cue: Hierarchical Preference Optimization for Multi-Image MLLMs.
CoRR, May, 2025
MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios.
CoRR, May, 2025
CoRR, May, 2025
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model.
CoRR, May, 2025
CoRR, May, 2025
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models.
CoRR, April, 2025
QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension.
CoRR, March, 2025
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency.
CoRR, February, 2025
Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy.
CoRR, February, 2025
CoRR, January, 2025
CoRR, January, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
2024
IEEE Trans. Circuits Syst. Video Technol., November, 2024
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption.
CoRR, 2024
CoRR, 2024
MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning.
CoRR, 2024
Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM.
CoRR, 2024
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
CoRR, 2024
CoRR, 2024
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis.
CoRR, 2024
Sci. China Inf. Sci., 2024
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
TGMAE: Self-supervised Micro-Expression Recognition with Temporal Gaussian Masked Autoencoder.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024
No Time to Train: Empowering Non-Parametric Networks for Few-Shot 3D Scene Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
2023
IEEE Trans. Pattern Anal. Mach. Intell., July, 2023
Audio-Driven Dubbing for User Generated Contents via Style-Aware Semi-Parametric Synthesis.
IEEE Trans. Circuits Syst. Video Technol., March, 2023
Pattern Recognit., 2023
ChatIllusion: Efficient-Aligning Interleaved Generation ability with Visual Instruction Model.
CoRR, 2023
CoRR, 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
2022
Heterogeneous Face Recognition via Face Synthesis With Identity-Attribute Disentanglement.
IEEE Trans. Inf. Forensics Secur., 2022
IEEE Trans. Pattern Anal. Mach. Intell., 2022
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
2021
IEEE Trans. Inf. Forensics Secur., 2021
CoRR, 2021
CM-NAS: Rethinking Cross-Modality Neural Architectures for Visible-Infrared Person Re-Identification.
CoRR, 2021
CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Proceedings of the International IEEE Joint Conference on Biometrics, 2021
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
2020
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
2019
CoRR, 2019
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019
Neurons Merging Layer: Towards Progressive Redundancy Reduction for Deep Supervised Hashing.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019
2017
Proceedings of the Advances in Image and Graphics Technologies - 12th Chinese conference, 2017