Gengyuan Zhang

According to our database, Gengyuan Zhang authored at least 20 papers between 2021 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
AViLA: Asynchronous Vision-Language Agent for Streaming Multimodal Data Interaction.
CoRR, June, 2025

My Answer Is NOT 'Fair': Mitigating Social Bias in Vision-Language Models via Fair and Biased Residuals.
CoRR, May, 2025

Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs.
CoRR, February, 2025

CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

Localizing Events in Videos with Multimodal Queries.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Multimodal Pragmatic Jailbreak on Text-to-image Models.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries.
CoRR, 2024

FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models.
CoRR, 2024

Multimodal Pragmatic Jailbreak on Text-to-image Models.
CoRR, 2024

Localizing Events in Videos with Multimodal Queries.
CoRR, 2024

Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

RPF-ELD: Regional Prior Fusion using Early and Late Distillation for Breast Cancer Recognition in Ultrasound Images.
Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2024

2023
SPOT! Revisiting Video-Language Models for Event Understanding.
CoRR, 2023

A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.
CoRR, 2023

Multi-event Video-Text Retrieval.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022
CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering.
CoRR, 2022

2021
Time-dependent Entity Embedding is not All You Need: A Re-evaluation of Temporal Knowledge Graph Completion Models under a Unified Framework.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

