Xi Zhou

Orcid: 0000-0001-9943-5482

Affiliations:

CloudWalk Technology, Shanghai, China

According to our database¹, Xi Zhou authored at least 26 papers between 2019 and 2026.

Collaborative distances:

Dijkstra number² of five.
Erdős number³ of four.

Bibliography

2026

COST: Contrastive one-stage transformer for vision-language small object tracking.

Inf. Fusion, 2026

2025

How Far are Modern Trackers from UAV-Anti-UAV? A Million-Scale Benchmark and New Baseline.

CoRR, December, 2025

Boosting Nighttime UAV Tracking via Self-prompting Autoregressive Learning and a New Benchmark.

Proceedings of the Pattern Recognition and Computer Vision - 8th Chinese Conference, 2025

MambaTrack: Exploiting Dual-Enhancement for Night UAV Tracking.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024

High-compressed deepfake video detection with contrastive spatiotemporal distillation.

Neurocomputing, January, 2024

Multi-Level Signal Fusion for Enhanced Weakly-Supervised Audio-Visual Video Parsing.

IEEE Signal Process. Lett., 2024

Point Spatio-Temporal Pyramid Network for Point Cloud Video Understanding.

IEEE Signal Process. Lett., 2024

Towards Underwater Camouflaged Object Tracking: An Experimental Evaluation of SAM and SAM 2.

CoRR, 2024

Awesome Multi-modal Object Tracking.

CoRR, 2024

WebUOT-1M: Advancing Deep Underwater Object Tracking with A Million-Scale Benchmark.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

2023

Video Moment Retrieval via Comprehensive Relation-Aware Network.

IEEE Trans. Circuits Syst. Video Technol., September, 2023

All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

AVForensics: Audio-driven Deepfake Video Detection with Masking Strategy in Self-supervision.

Proceedings of the 2023 ACM International Conference on Multimedia Retrieval, 2023

Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Audio-Driven Talking Head Video Generation with Diffusion Model.

Proceedings of the IEEE International Conference on Acoustics, 2023

Exploiting Multi-modal Fusion for Robust Face Representation Learning with Missing Modality.

Proceedings of the Artificial Neural Networks and Machine Learning, 2023

PointCMP: Contrastive Mask Prediction for Self-supervised Learning on Point Cloud Videos.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Efficient Video Grounding With Which-Where Reading Comprehension.

IEEE Trans. Circuits Syst. Video Technol., 2022

You Need to Read Again: Multi-granularity Perception Network for Moment Retrieval in Videos.

Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

HiCo: Hierarchical Contrastive Learning for Ultrasound Video Model Pretraining.

Proceedings of the Computer Vision - ACCV 2022, 2022

2021

Self-Guided Body Part Alignment With Relation Transformers for Occluded Person Re-Identification.

IEEE Signal Process. Lett., 2021

Skeleton-Based Action Recognition With Focusing-Diffusion Graph Convolutional Networks.

IEEE Signal Process. Lett., 2021

Relation-aware Video Reading Comprehension for Temporal Language Grounding.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

2020

Receptive Multi-Granularity Representation for Person Re-Identification.

IEEE Trans. Image Process., 2020

Accurate Temporal Action Proposal Generation with Relation-Aware Pyramid Network.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Focusing and Diffusion: Bidirectional Attentive Graph Convolutional Networks for Skeleton-based Action Recognition.

CoRR, 2019
