Zhixi Cai

ORCID: 0000-0001-7978-0860

According to our database, Zhixi Cai authored at least 19 papers between 2022 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
NEUSIS: A Compositional Neuro-Symbolic Framework for Autonomous Perception, Reasoning, and Planning in Complex UAV Search Missions.
IEEE Robotics Autom. Lett., September, 2025

AV-Deepfake1M++: A Large-Scale Audio-Visual Deepfake Benchmark with Real-World Perturbations.
CoRR, July, 2025

M-MRE: Extending the Mutual Reinforcement Effect to Multimodal Information Extraction.
CoRR, April, 2025

DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning.
CoRR, March, 2025

NAVER: A Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning.
CoRR, February, 2025

2024
Hi-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting.
CoRR, 2024

NEUSIS: A Compositional Neuro-Symbolic Framework for Autonomous Perception, Reasoning, and Planning in Complex UAV Search Missions.
CoRR, 2024

MRAC Track 1: 2nd Workshop on Multimodal, Generative and Responsible Affective Computing.
Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 2024

1M-Deepfakes Detection Challenge.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning.
Computer Vision - ECCV 2024, 2024

JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Emolysis: A Multimodal Open-Source Group Emotion Analysis and Visualization Toolkit.
Proceedings of the 12th International Conference on Affective Computing and Intelligent Interaction, 2024

2023
Glitch in the matrix: A large scale benchmark for content driven audio-visual forgery detection and localization.
Comput. Vis. Image Underst., November, 2023

AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset.
CoRR, 2023

Pavlok-Nudge: A Feedback Mechanism for Atomic Behaviour Modification with Snoring Usecase.
CoRR, 2023

"Glitch in the Matrix!": A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization.
CoRR, 2023

MARLIN: Masked Autoencoder for facial video Representation LearnINg.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization.
CoRR, 2022
