Gengyuan Zhang

According to our database¹, Gengyuan Zhang authored at least 20 papers between 2021 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2025

AViLA: Asynchronous Vision-Language Agent for Streaming Multimodal Data Interaction.

CoRR, June, 2025

My Answer Is NOT 'Fair': Mitigating Social Bias in Vision-Language Models via Fair and Biased Residuals.

CoRR, May, 2025

Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs.

CoRR, February, 2025

CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

Localizing Events in Videos with Multimodal Queries.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Multimodal Pragmatic Jailbreak on Text-to-image Models.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries.

CoRR, 2024

FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models.

CoRR, 2024

Multimodal Pragmatic Jailbreak on Text-to-image Models.

CoRR, 2024

Localizing Events in Videos with Multimodal Queries.

CoRR, 2024

Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

RPF-ELD: Regional Prior Fusion using Early and Late Distillation for Breast Cancer Recognition in Ultrasound Images.

Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2024

2023

SPOT! Revisiting Video-Language Models for Event Understanding.

CoRR, 2023

A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.

CoRR, 2023

Multi-event Video-Text Retrieval.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022

CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering.

CoRR, 2022

2021

Time-dependent Entity Embedding is not All You Need: A Re-evaluation of Temporal Knowledge Graph Completion Models under a Unified Framework.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021
