Zijia Zhao

Orcid: 0009-0000-7781-932X

According to our database, Zijia Zhao authored at least 25 papers between 2021 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
LaVi: Efficient Large Vision-Language Models via Internal Feature Modulation.
CoRR, June, 2025

Dynamic Caching Dependency-Aware Task Offloading in Mobile Edge Computing.
IEEE Trans. Computers, May, 2025

Kimi-VL Technical Report.
CoRR, April, 2025

Image Difference Grounding with Natural Language.
CoRR, April, 2025

Efficient Motion-Aware Video MLLM.
CoRR, March, 2025

ChatSearch: A dataset and a generative retrieval model for general conversational image retrieval.
Pattern Recognit., 2025

Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Exploring the Design Space of Visual Context Representation in Video MLLMs.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Efficient Motion-Aware Video MLLM.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM Pretraining.
CoRR, 2024

Towards Event-oriented Long Video Understanding.
CoRR, 2024

Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs.
CoRR, 2024

Collaborative Training of Tiny-Large Vision Language Models.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

VL-Mamba: Exploring State Space Models for Multimodal Learning.
Proceedings of the NeurIPS Efficient Natural Language and Speech Processing Workshop, 2024

SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions.
Findings of the Association for Computational Linguistics, 2024

OneDiff: A Generalist Model for Image Difference Captioning.
Computer Vision - ACCV 2024, 2024

2023
A Digital Twin-Assisted Intelligent Partial Offloading Approach for Vehicular Edge Computing.
IEEE J. Sel. Areas Commun., November, 2023

ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst.
CoRR, 2023

MAMO: Fine-Grained Vision-Language Representations Learning with Masked Multimodal Modeling.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

Snake-inspired Swarm Robot Design for Distributed Underwater Search and Rescue.
Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2023

VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset.
Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022
MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning.
CoRR, 2022

2021
OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation.
CoRR, 2021

MM21 Pre-training for Video Understanding Challenge: Video Captioning with Pretraining Techniques.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
