Zhixi Cai

ORCID: 0000-0001-7978-0860

According to our database¹, Zhixi Cai authored at least 30 papers between 2022 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

SignMAE: Segmentation-Driven Self-Supervised Learning for Sign Language Recognition.

[DOI]

,

,

CoRR, May, 2026

Mini-BEHAVIOR-Gran: Revealing U-Shaped Effects of Instruction Granularity on Language-Guided Embodied Agents.

[DOI]

,

,

,

,

Gholamreza Haffari

,

,

Hamid Rezatofighi

CoRR, April, 2026

VIEW2SPACE: Studying Multi-View Visual Reasoning from Sparse Observations.

[DOI]

,

,

,

,

,

,

Pari Delir Haghighi

,

Gholamreza Haffari

,

Hamid Rezatofighi

CoRR, March, 2026

MATA: A Trainable Hierarchical Automaton System for Multi-Agent Visual Reasoning.

[DOI]

,

,

,

,

Maria Garcia de la Banda

,

Peter J. Stuckey

,

Hamid Rezatofighi

CoRR, January, 2026

DexAvatar: 3D Sign Language Reconstruction with Hand and Body Pose Priors.

[DOI]

,

Hrishav Bakul Barua

,

Lucy M. Robertson-Bell

,

,

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2026

JRDB-Reasoning: A Difficulty-Graded Benchmark for Visual Reasoning in Robotics.

[DOI]

Simindokht Jahangard

,

Mehrzad Mohammadi

,

,

,

Hamid Rezatofighi

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Do Blind Spots Matter for Word-Referent Mapping? A Computational Study with Infant Egocentric Video.

[DOI]

,

,

CoRR, November, 2025

NEUSIS: A Compositional Neuro-Symbolic Framework for Autonomous Perception, Reasoning, and Planning in Complex UAV Search Missions.

[DOI]

,

Cristian Rojas Cardenas

,

,

,

,

,

,

Mahsa Ghorbanali

,

,

,

Julian Gutierrez

,

Alexey Ignatiev

,

,

,

Peter J. Stuckey

,

Maria Garcia de la Banda

,

Hamid Rezatofighi

IEEE Robotics Autom. Lett., September, 2025

Explain Before You Answer: A Survey on Compositional Visual Reasoning.

[DOI]

,

,

,

,

,

,

,

,

Pari Delir Haghighi

,

Gholamreza Haffari

,

,

,

Hamid Rezatofighi

CoRR, August, 2025

M-MRE: Extending the Mutual Reinforcement Effect to Multimodal Information Extraction.

[DOI]

,

,

,

,

,

,

,

CoRR, April, 2025

AV-Deepfake1M++: A Large-Scale Audio-Visual Deepfake Benchmark with Real-World Perturbations.

[DOI]

,

Kartik Kuckreja

,

,

Akanksha Chuchra

,

Muhammad Haris Khan

,

,

,

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

MRAC 2025: 3rd International Workshop on Multimodal, Generative and Responsible Affective Computing.

[DOI]

,

,

,

,

,

,

Björn W. Schuller

,

,

,

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Hier-SLAM: Scaling-Up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting.

[DOI]

,

,

,

,

Hamid Rezatofighi

Proceedings of the IEEE International Conference on Robotics and Automation, 2025

Multimodal Deepfake Generation and Detection: Challenges, Methods, and Future Directions.

[DOI]

,

,

Proceedings of the Companion Proceedings of the 27th International Conference on Multimodal Interaction, 2025

DWIM: Towards Tool-Aware Visual Reasoning via Discrepancy-Aware Workflow Generation & Instruct-Masking Tuning.

[DOI]

,

Vijay Kumar B. G

,

,

,

,

,

Pari Delir Haghighi

,

Hamid Rezatofighi

,

Manmohan Chandraker

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Naver: a Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning.

[DOI]

,

,

Simindokht Jahangard

,

Maria Garcia de la Banda

,

,

Peter J. Stuckey

,

Hamid Rezatofighi

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

2024

Hi-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting.

[DOI]

,

,

,

,

Hamid Rezatofighi

CoRR, 2024

NEUSIS: A Compositional Neuro-Symbolic Framework for Autonomous Perception, Reasoning, and Planning in Complex UAV Search Missions.

[DOI]

,

Cristian Rojas Cardenas

,

,

,

,

,

,

Mahsa Ghorbanali

,

,

,

Julian Gutierrez Santiago

,

Alexey Ignatiev

,

,

,

Peter J. Stuckey

,

Maria Garcia de la Banda

,

Hamid Rezatofighi

CoRR, 2024

MRAC Track 1: 2nd Workshop on Multimodal, Generative and Responsible Affective Computing.

[DOI]

,

,

,

Dimitrios Kollias

,

,

Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 2024

1M-Deepfakes Detection Challenge.

[DOI]

,

,

,

,

Dimitrios Kollias

,

,

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset.

[DOI]

,

,

Aman Pankaj Adatia

,

,

,

,

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning.

[DOI]

,

,

Simindokht Jahangard

,

,

Pari Delir Haghighi

,

Hamid Rezatofighi

Proceedings of the Computer Vision - ECCV 2024, 2024

JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups.

[DOI]

Simindokht Jahangard

,

,

,

Hamid Rezatofighi

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Emolysis: A Multimodal Open-Source Group Emotion Analysis and Visualization Toolkit.

[DOI]

,

,

,

,

,

,

Proceedings of the 12th International Conference on Affective Computing and Intelligent Interaction, 2024

2023

Glitch in the matrix: A large scale benchmark for content driven audio-visual forgery detection and localization.

[DOI]

,

,

,

,

,

Comput. Vis. Image Underst., November, 2023

AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset.

[DOI]

,

,

Aman Pankaj Adatia

,

,

,

CoRR, 2023

Pavlok-Nudge: A Feedback Mechanism for Atomic Behaviour Modification with Snoring Usecase.

[DOI]

,

Md. Rakibul Hasan

,

Pradyumna Agrawal

,

,

,

,

CoRR, 2023

"Glitch in the Matrix!": A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization.

[DOI]

,

,

,

,

,

CoRR, 2023

MARLIN: Masked Autoencoder for facial video Representation LearnINg.

[DOI]

,

,

,

,

,

Hamid Rezatofighi

,

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization.

[DOI]

,

,

,

CoRR, 2022
