Zikai Song

Orcid: 0009-0006-6651-2027

According to our database, Zikai Song authored at least 38 papers between 2018 and 2025.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2025
From Ambiguity to Verdict: A Semiotic-Grounded Multi-Perspective Agent for LLM Logical Reasoning.
CoRR, September, 2025

MVP: Winning Solution to SMP Challenge 2025 Video Track.
CoRR, July, 2025

HyperFusion: Hierarchical Multimodal Ensemble Learning for Social Media Popularity Prediction.
CoRR, July, 2025

LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing.
CoRR, July, 2025

CA-Diff: Collaborative Anatomy Diffusion for Brain Tissue Segmentation.
CoRR, June, 2025

GA-S³: Comprehensive Social Network Simulation with Group Agents.
CoRR, June, 2025

Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps.
CoRR, May, 2025

EfficientGS: Streamlining Gaussian Splatting for Large-Scale High-Resolution Scene Representation.
IEEE Multim., 2025

MCA-RG: Enhancing LLMs with Medical Concept Alignment for Radiology Report Generation.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2025, 2025

Cross-Modality Masked Learning for Survival Prediction in ICI Treated NSCLC Patients.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2025, 2025

Optimized View and Geometry Distillation from Multi-view Diffuser.
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025

DEEM: Diffusion models serve as the eyes of large language models for image perception.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Ref-GS: Directional Factorization for 2D Gaussian Splatting.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

GA-S³: Comprehensive Social Network Simulation with Group Agents.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

Temporal Coherent Object Flow for Multi-Object Tracking.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

Video Anomaly Detection with Motion and Appearance Guided Patch Diffusion Model.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
IP-MOT: Instance Prompt Learning for Cross-Domain Multi-Object Tracking.
CoRR, 2024

Coupled Mamba: Enhanced Multi-modal Fusion with Coupled State Space Model.
CoRR, 2024

DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception.
CoRR, 2024

EfficientGS: Streamlining Gaussian Splatting for Large-Scale High-Resolution Scene Representation.
CoRR, 2024

Coupled Mamba: Enhanced Multimodal Fusion with Coupled State Space Model.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Autogenic Language Embedding for Coherent Point Tracking.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Study Selectively: An Adaptive Knowledge Distillation based on a Voting Network for Heart Sound Classification.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Agnostic Feature Compression with Semantic Guided Channel Importance Analysis.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

Weight Light, Hear Right: Heart Sound Classification with a Low-Complexity Model.
Proceedings of the 32nd European Signal Processing Conference, 2024

Progressive Text-to-Image Diffusion with Soft Latent Direction.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

DiffusionTrack: Diffusion Model for Multi-Object Tracking.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

AMD: Anatomical Motion Diffusion with Interpretable Motion Decomposition and Fusion.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Optimized View and Geometry Distillation from Multi-view Diffuser.
CoRR, 2023

Fine-grained Appearance Transfer with Diffusion Models.
CoRR, 2023

Cutting Weights of Deep Learning Models for Heart Sound Classification: Introducing a Knowledge Distillation Approach.
Proceedings of the 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2023

Compact Transformer Tracker with Correlative Masked Modeling.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Transformer Tracking with Cyclic Shifting Window Attention.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Distractor-Aware Tracker with a Domain-Special Optimized Benchmark for Soccer Player Tracking.
Proceedings of the ICMR '21: International Conference on Multimedia Retrieval, 2021

2020
SSET: a dataset for shot segmentation, event detection, player tracking in soccer videos.
Multim. Tools Appl., 2020

Fine-Grain Level Sports Video Search Engine.
Proceedings of the MultiMedia Modeling - 26th International Conference, 2020

2018
Comprehensive Dataset of Broadcast Soccer Videos.
Proceedings of the IEEE 1st Conference on Multimedia Information Processing and Retrieval, 2018

