Gangyan Zeng

Orcid: 0000-0003-2696-8549

According to our database, Gangyan Zeng authored at least 21 papers between 2019 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Gather and Trace: Rethinking Video TextVQA from an Instance-oriented Perspective.
CoRR, August, 2025

TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model.
ACM Trans. Multim. Comput. Commun. Appl., June, 2025

When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding.
CoRR, June, 2025

VidText: Towards Comprehensive Evaluation for Video Text Understanding.
CoRR, May, 2025

CLIP is Almost All You Need: Towards Parameter-Efficient Scene Text Retrieval without OCR.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Track the Answer: Extending TextVQA from Image to Video with Spatio-Temporal Clues.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
Towards better video services: An EEG-based interpretable model for functional quality of experience evaluation.
Displays, 2024

TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model.
CoRR, 2024

Show Exemplars and Tell Me What You See: In-Context Learning with Frozen Large Language Models for TextVQA.
Proceedings of the Pattern Recognition and Computer Vision - 7th Chinese Conference, 2024

Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024

Perception-Enhanced Generative Transformer for Key Information Extraction from Documents.
Proceedings of the Pattern Recognition - 27th International Conference, 2024

Improving Multimodal Rumor Detection via Dynamic Graph Modeling.
Proceedings of the Pattern Recognition - 27th International Conference, 2024

2023
Beyond OCR + VQA: Towards end-to-end reading and reasoning for robust and accurate TextVQA.
Pattern Recognit., June, 2023

Feature Enhancement with Text-Specific Region Contrast for Scene Text Detection.
Proceedings of the Pattern Recognition and Computer Vision - 6th Chinese Conference, 2023

Filling in the Blank: Rationale-Augmented Prompt Tuning for TextVQA.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

2022
TextBlock: Towards Scene Text Spotting without Fine-grained Detection.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Towards Escaping from Language Bias and OCR Error: Semantics-Centered Text Visual Question Answering.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2022

2021
A Cost-Efficient Framework for Scene Text Detection in the Wild.
Proceedings of the PRICAI 2021: Trends in Artificial Intelligence, 2021

Beyond OCR + VQA: Involving OCR into the Flow for Robust and Accurate TextVQA.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

2019
PororoGAN: An Improved Story Visualization Model on Pororo-SV Dataset.
Proceedings of the CSAI 2019: 2019 3rd International Conference on Computer Science and Artificial Intelligence, 2019
