Zijia Zhao

Orcid: 0009-0000-7781-932X

According to our database, Zijia Zhao authored at least 29 papers between 2021 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

AITQE: An Adaptive Image-Text Quality Enhancer for Scalable MLLM Pretraining.

IEEE Trans. Circuits Syst. Video Technol., June, 2026

M³-VQA: A Benchmark for Multimodal, Multi-Entity, Multi-Hop Visual Question Answering.

CoRR, April, 2026

WorldVQA: Measuring Atomic World Knowledge in Multimodal Large Language Models.

CoRR, February, 2026

A Collaborative Caching and Offloading Approach for Vehicular Edge Computing.

IEEE Trans. Sustain. Comput., 2026

Dynamic priority-based area partitioning, trajectory planning, and task scheduling in computing-while-flying UAV networks.

Future Gener. Comput. Syst., 2026

M³-VQA: A Benchmark for Multimodal, Multi-Entity, Multi-Hop Visual Question Answering.

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

2025

LaVi: Efficient Large Vision-Language Models via Internal Feature Modulation.

CoRR, June, 2025

Dynamic Caching Dependency-Aware Task Offloading in Mobile Edge Computing.

IEEE Trans. Computers, May, 2025

Image Difference Grounding with Natural Language.

CoRR, April, 2025

ChatSearch: A dataset and a generative retrieval model for general conversational image retrieval.

Pattern Recognit., 2025

Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Exploring the Design Space of Visual Context Representation in Video MLLMs.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Efficient Motion-Aware Video MLLM.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM Pretraining.

CoRR, 2024

Towards Event-oriented Long Video Understanding.

CoRR, 2024

Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs.

CoRR, 2024

Collaborative Training of Tiny-Large Vision Language Models.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

VL-Mamba: Exploring State Space Models for Multimodal Learning.

Proceedings of the NeurIPS Efficient Natural Language and Speech Processing Workshop, 2024

SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

OneDiff: A Generalist Model for Image Difference Captioning.

Proceedings of the Computer Vision - ACCV 2024, 2024

2023

A Digital Twin-Assisted Intelligent Partial Offloading Approach for Vehicular Edge Computing.

Ahmed Yassin Al-Dubai, Zhiyuan Tan, Amir Hussain

IEEE J. Sel. Areas Commun., November, 2023

ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst.

CoRR, 2023

MAMO: Fine-Grained Vision-Language Representations Learning with Masked Multimodal Modeling.

Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

Snake-inspired Swarm Robot Design for Distributed Underwater Search and Rescue.

Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2023

VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022

MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning.

CoRR, 2022

2021

OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation.

CoRR, 2021

MM21 Pre-training for Video Understanding Challenge: Video Captioning with Pretraining Techniques.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
