Zijia Zhao

Orcid: 0009-0000-7781-932X

According to our database, Zijia Zhao authored at least 27 papers between 2021 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
M³-VQA: A Benchmark for Multimodal, Multi-Entity, Multi-Hop Visual Question Answering.
CoRR, April, 2026

WorldVQA: Measuring Atomic World Knowledge in Multimodal Large Language Models.
CoRR, February, 2026

A Collaborative Caching and Offloading Approach for Vehicular Edge Computing.
IEEE Trans. Sustain. Comput., 2026

Dynamic priority-based area partitioning, trajectory planning, and task scheduling in computing-while-flying UAV networks.
Future Gener. Comput. Syst., 2026

2025
LaVi: Efficient Large Vision-Language Models via Internal Feature Modulation.
CoRR, June, 2025

Dynamic Caching Dependency-Aware Task Offloading in Mobile Edge Computing.
IEEE Trans. Computers, May, 2025

Image Difference Grounding with Natural Language.
CoRR, April, 2025

ChatSearch: A dataset and a generative retrieval model for general conversational image retrieval.
Pattern Recognit., 2025

Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Exploring the Design Space of Visual Context Representation in Video MLLMs.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Efficient Motion-Aware Video MLLM.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM Pretraining.
CoRR, 2024

Towards Event-oriented Long Video Understanding.
CoRR, 2024

Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs.
CoRR, 2024

Collaborative Training of Tiny-Large Vision Language Models.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

VL-Mamba: Exploring State Space Models for Multimodal Learning.
Proceedings of the NeurIPS Efficient Natural Language and Speech Processing Workshop, 2024

SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

OneDiff: A Generalist Model for Image Difference Captioning.
Proceedings of the Computer Vision - ACCV 2024, 2024

2023
A Digital Twin-Assisted Intelligent Partial Offloading Approach for Vehicular Edge Computing.
IEEE J. Sel. Areas Commun., November, 2023

ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst.
CoRR, 2023

MAMO: Fine-Grained Vision-Language Representations Learning with Masked Multimodal Modeling.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

Snake-inspired Swarm Robot Design for Distributed Underwater Search and Rescue.
Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2023

VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022
MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning.
CoRR, 2022

2021
OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation.
CoRR, 2021

MM21 Pre-training for Video Understanding Challenge: Video Captioning with Pretraining Techniques.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

