Zhixi Cai

Orcid: 0000-0001-7978-0860

According to our database¹, Zhixi Cai authored at least 23 papers between 2022 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2025

NEUSIS: A Compositional Neuro-Symbolic Framework for Autonomous Perception, Reasoning, and Planning in Complex UAV Search Missions.

Zhixi Cai

Cristian Rojas Cardenas

Maria Garcia de la Banda

Hamid Rezatofighi

IEEE Robotics Autom. Lett., September, 2025

Explain Before You Answer: A Survey on Compositional Visual Reasoning.

CoRR, August, 2025

JRDB-Reasoning: A Difficulty-Graded Benchmark for Visual Reasoning in Robotics.

CoRR, August, 2025

AV-Deepfake1M++: A Large-Scale Audio-Visual Deepfake Benchmark with Real-World Perturbations.

CoRR, July, 2025

M-MRE: Extending the Mutual Reinforcement Effect to Multimodal Information Extraction.

CoRR, April, 2025

DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning.

CoRR, March, 2025

NAVER: A Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning.

Zhixi Cai

Fucai Ke

Simindokht Jahangard

Maria Garcia de la Banda

Reza Haffari

Peter J. Stuckey

Hamid Rezatofighi

CoRR, February, 2025

Hier-SLAM: Scaling-Up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting.

Proceedings of the IEEE International Conference on Robotics and Automation, 2025

Multimodal Deepfake Generation and Detection: Challenges, Methods, and Future Directions.

Abhinav Dhall

Zhixi Cai

Shreya Ghosh

Companion Proceedings of the 27th International Conference on Multimodal Interaction, 2025

2024

Hi-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting.

CoRR, 2024

NEUSIS: A Compositional Neuro-Symbolic Framework for Autonomous Perception, Reasoning, and Planning in Complex UAV Search Missions.

Zhixi Cai

Cristian Rojas Cardenas

Julian Gutierrez Santiago

Maria Garcia de la Banda

Hamid Rezatofighi

CoRR, 2024

MRAC Track 1: 2nd Workshop on Multimodal, Generative and Responsible Affective Computing.

Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 2024

1M-Deepfakes Detection Challenge.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning.

Proceedings of the Computer Vision - ECCV 2024, 2024

JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Emolysis: A Multimodal Open-Source Group Emotion Analysis and Visualization Toolkit.

Proceedings of the 12th International Conference on Affective Computing and Intelligent Interaction, 2024

2023

Glitch in the matrix: A large scale benchmark for content driven audio-visual forgery detection and localization.

Comput. Vis. Image Underst., November, 2023

AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset.

CoRR, 2023

Pavlok-Nudge: A Feedback Mechanism for Atomic Behaviour Modification with Snoring Usecase.

CoRR, 2023

"Glitch in the Matrix!": A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization.

CoRR, 2023

MARLIN: Masked Autoencoder for facial video Representation LearnINg.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization.

CoRR, 2022
