Zhixi Cai
ORCID: 0000-0001-7978-0860
According to our database, Zhixi Cai authored at least 30 papers between 2022 and 2026.
Bibliography
2026
CoRR, May, 2026
Mini-BEHAVIOR-Gran: Revealing U-Shaped Effects of Instruction Granularity on Language-Guided Embodied Agents.
CoRR, April, 2026
CoRR, March, 2026
CoRR, January, 2026
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2026
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026
2025
Do Blind Spots Matter for Word-Referent Mapping? A Computational Study with Infant Egocentric Video.
CoRR, November, 2025
NEUSIS: A Compositional Neuro-Symbolic Framework for Autonomous Perception, Reasoning, and Planning in Complex UAV Search Missions.
IEEE Robotics Autom. Lett., September, 2025
CoRR, August, 2025
M-MRE: Extending the Mutual Reinforcement Effect to Multimodal Information Extraction.
CoRR, April, 2025
AV-Deepfake1M++: A Large-Scale Audio-Visual Deepfake Benchmark with Real-World Perturbations.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025
MRAC 2025: 3rd International Workshop on Multimodal, Generative and Responsible Affective Computing.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025
Hier-SLAM: Scaling-Up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting.
Proceedings of the IEEE International Conference on Robotics and Automation, 2025
Multimodal Deepfake Generation and Detection: Challenges, Methods, and Future Directions.
Companion Proceedings of the 27th International Conference on Multimodal Interaction, 2025
DWIM: Towards Tool-Aware Visual Reasoning via Discrepancy-Aware Workflow Generation & Instruct-Masking Tuning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025
Naver: a Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025
2024
Hi-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting.
CoRR, 2024
NEUSIS: A Compositional Neuro-Symbolic Framework for Autonomous Perception, Reasoning, and Planning in Complex UAV Search Missions.
CoRR, 2024
MRAC Track 1: 2nd Workshop on Multimodal, Generative and Responsible Affective Computing.
Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 2024
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Proceedings of the Computer Vision - ECCV 2024, 2024
JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the 12th International Conference on Affective Computing and Intelligent Interaction, 2024
2023
Glitch in the matrix: A large scale benchmark for content driven audio-visual forgery detection and localization.
Comput. Vis. Image Underst., November, 2023
Pavlok-Nudge: A Feedback Mechanism for Atomic Behaviour Modification with Snoring Usecase.
CoRR, 2023
"Glitch in the Matrix!": A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization.
CoRR, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
2022
Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization.
CoRR, 2022