Mu Cai
ORCID: 0009-0008-7967-9752
According to our database, Mu Cai authored at least 29 papers between 2020 and 2025.
Bibliography
2025
When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios.
CoRR, July, 2025
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities.
CoRR, July, 2025
Decomposing Complex Visual Comprehension into Atomic Visual Skills for Vision Language Models.
CoRR, May, 2025
An Investigation on LLMs' Visual Understanding Ability Using SVG for Image-Text Bridging.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
2024
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models.
CoRR, 2024
CoRR, 2024
CoRR, 2024
CoRR, 2024
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Cross-Modal Self-Supervised Learning with Effective Contrastive Units for LiDAR Point Clouds.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024
VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
Proceedings of the Computer Vision - ECCV 2024, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Investigating the Catastrophic Forgetting in Multimodal Large Language Model Fine-Tuning.
Proceedings of the Conference on Parsimony and Learning, 2024
CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples.
Proceedings of the Findings of the Association for Computational Linguistics, 2024
2023
CoRR, 2023
Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding.
CoRR, 2023
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023
A Sentence Speaks a Thousand Images: Domain Generalization through Distilling CLIP with Language Guidance.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
2022
Proceedings of the Tenth International Conference on Learning Representations, 2022
Proceedings of the Computer Vision - ECCV 2022, 2022
2021
Frequency Domain Image Translation: More Photo-realistic, Better Identity-preserving.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
2020
Frequency Domain Image Translation: More Photo-realistic, Better Identity-preserving.
CoRR, 2020
A Game-Theoretic Strategy-Aware Interaction Algorithm with Validation on Real Traffic Data.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2020