Zhenzhen Hu

Multim. Syst., April, 2025

Adaptive Dual Video Summarization: From Dynamic Keyframes to Captions.
IEEE Trans. Multim., 2025

Multi-Modal Prior-Guided Diffusion Model for Blind Image Super-Resolution.
IEEE Signal Process. Lett., 2025

EPDiff: Enhancing Prior-guided Diffusion model for Real-world Image Super-Resolution.
Comput. Vis. Image Underst., 2025

Concept Drift Guided LayerNorm Tuning for Efficient Multimodal Metaphor Identification.
Proceedings of the 2025 International Conference on Multimedia Retrieval, 2025

Seeing is Believing? Enhancing Vision-Language Navigation using Visual Perturbations.
Proceedings of the International Joint Conference on Neural Networks, 2025

Video Flow as Time Series: Discovering Temporal Consistency and Variability for VideoQA.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

Efficiently Gluing Pre-Trained Language and Vision Models for Image Captioning.
ACM Trans. Intell. Syst. Technol., December, 2024

Exploring and exploiting model uncertainty for robust visual question answering.
Multim. Syst., December, 2024

Math Word Problem Generation via Disentangled Memory Retrieval.
ACM Trans. Knowl. Discov. Data, June, 2024

Embedded Heterogeneous Attention Transformer for Cross-Lingual Image Captioning.
IEEE Trans. Multim., 2024

Decomposing Relationship from 1-to-N into N 1-to-1 for Text-Video Retrieval.
CoRR, 2024

UniLearn: Enhancing Dynamic Facial Expression Recognition through Unified Pre-Training and Fine-Tuning on Images and Videos.
CoRR, 2024

Dual-Stream Keyframe Enhancement for Video Question Answering.
Proceedings of the 6th ACM International Conference on Multimedia in Asia, 2024

2023

Efficient and self-adaptive rationale knowledge base for visual commonsense reasoning.
Multim. Syst., October, 2023

A Text-Guided Generation and Refinement Model for Image Captioning.
IEEE Trans. Multim., 2023

Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning.
CoRR, 2023

Grid Feature Jigsaw for Self-supervised Image Clustering.
Proceedings of the International Joint Conference on Neural Networks, 2023

Dual Video Summarization: From Frames to Captions.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

CITE: Compact Interactive TransformEr for Multilingual Image Captioning.
Proceedings of the 6th International Conference on Image and Graphics Processing, 2023

2022

Visual feature synthesis with semantic reconstructor for traditional and generalized zero-shot object classification.
Int. J. Intell. Syst., 2022

Compact Bidirectional Transformer for Image Captioning.
CoRR, 2022

Math Word Problem Generation with Memory Retrieval.
Proceedings of the Pattern Recognition and Computer Vision - 5th Chinese Conference, 2022

OCR-oriented Master Object for Text Image Captioning.
Proceedings of the ICMR '22: International Conference on Multimedia Retrieval, Newark, NJ, USA, June 27, 2022

2021

Adversarial co-distillation learning for image recognition.
Pattern Recognit., 2021

Sequential image encoding for vision-to-language problems.
Multim. Tools Appl., 2021

Semi-Autoregressive Transformer for Image Captioning.
Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2021

2020

The Balanced Loss Curriculum Learning.
IEEE Access, 2020

WFN-PSC: weighted-fusion network with poly-scale convolution for image dehazing.
Proceedings of the MMAsia 2020: ACM Multimedia Asia, 2020

A Text-Guided Graph Structure for Image Captioning.
Proceedings of the 2020 IEEE International Conference on Multimedia & Expo Workshops, 2020

More Grounded Image Captioning by Distilling Image-Text Matching Model.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Quality-Aware Unpaired Image-to-Image Translation.
IEEE Trans. Multim., 2019

2018

Video Captioning Based on the Spatial-Temporal Saliency Tracing.
Proceedings of the Advances in Multimedia Information Processing - PCM 2018, 2018

Semantic Image Inpainting with Progressive Generative Networks.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Speeding-Up Age Estimation in Intelligent Demographics System via Network Optimization.