Marcella Cornia

Proceedings of the 2026 International Conference on Multimedia Retrieval, 2026

2025

Seeing Beyond Words: Self-Supervised Visual Learning for Multimodal Large Language Models.

Mark Granroth-Wilding

CoRR, December, 2025

Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training.

Int. J. Comput. Vis., November, 2025

ReAG: Reasoning-Augmented Generation for Knowledge-based Visual Question Answering.

CoRR, November, 2025

Recurrence Meets Transformers for Universal Multimodal Retrieval.

CoRR, September, 2025

Mitigating Hallucinations in Multimodal LLMs via Object-aware Preference Optimization.

CoRR, August, 2025

RAID: A Dataset for Testing the Adversarial Robustness of AI-Generated Image Detectors.

CoRR, June, 2025

Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals.

CoRR, May, 2025

Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Prefilling Attack.

CoRR, May, 2025

Parents and Children: Distinguishing Multimodal Deepfakes from Natural Images.

ACM Trans. Multim. Comput. Commun. Appl., January, 2025

Augmenting and mixing Transformers with synthetic data for image captioning.

Image Vis. Comput., 2025

Learning to mask and permute visual tokens for Vision Transformer pre-training.

Comput. Vis. Image Underst., 2025

Semantically Conditioned Prompts for Visual Recognition Under Missing Modality Scenarios.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

TPP-Gaze: Modelling Gaze Dynamics in Space and Time with Neural Temporal Point Processes.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

Benchmarking BERT-based Models for Latin: A Case Study on Biblical References in Ancient Christian Literature.

Proceedings of the 21st Conference on Information and Research science Connecting to Digital and Library science, 2025

Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation.

Proceedings of the International Joint Conference on Neural Networks, 2025

Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives.

Sara Sarto

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025

MISSRAG: Addressing the Missing Modality Challenge in Multimodal Large Language Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV 2025, 2025

Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Generating Synthetic Data with Large Language Models for Low-Resource Sentence Retrieval.

Proceedings of the Linking Theory and Practice of Digital Libraries, 2025

Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

Towards Retrieval-Augmented Architectures for Image Captioning.

ACM Trans. Multim. Comput. Commun. Appl., August, 2024

Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets.

Int. J. Comput. Vis., May, 2024

Video Surveillance and Privacy: A Solvable Paradox?

Computer, March, 2024

Unveiling the Truth: Exploring Human Gaze Patterns in Fake Images.

IEEE Signal Process. Lett., 2024

Multiclass Unlearning for Image Classification via Weight Filtering.

IEEE Intell. Syst., 2024

Are Learnable Prompts the Right Way of Prompting? Adapting Vision-and-Language Models with Memory Optimization.

IEEE Intell. Syst., 2024

Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities.

CoRR, 2024

The (R)Evolution of Multimodal Large Language Models: A Survey.

CoRR, 2024

Personalized Instance-based Navigation Toward User-Specific Objects in Realistic Environments.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Trends, Applications, and Challenges in Human Attention Modelling.

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Unlearning Vision Transformers Without Retaining Data via Low-Rank Decompositions.

Proceedings of the Pattern Recognition - 27th International Conference, 2024

Fluent and Accurate Image Captioning with a Self-trained Reward Model.

Proceedings of the Pattern Recognition - 27th International Conference, 2024

Adapt to Scarcity: Few-Shot Deepfake Detection via Low-Rank Adaptation.

Proceedings of the Pattern Recognition - 27th International Conference, 2024

BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues.

Proceedings of the Computer Vision - ECCV 2024, 2024

Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

Pixels of Faith: Exploiting Visual Saliency to Detect Religious Image Manipulation.

Proceedings of the Computer Vision - ECCV 2024 Workshops, 2024

Personalizing Multimodal Large Language Models for Image Captioning: An Experimental Analysis.

Proceedings of the Computer Vision - ECCV 2024 Workshops, 2024

Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities.

Proceedings of the Computer Vision - ECCV 2024, 2024

Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization.

Proceedings of the 35th British Machine Vision Conference, 2024

The Revolution of Multimodal Large Language Models: A Survey.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

Fully-attentive iterative networks for region-based controllable image and video captioning.

Comput. Vis. Image Underst., December, 2023

Fashion-Oriented Image Captioning with External Knowledge Retrieval and Fully Attentive Gates.

Sensors, February, 2023

Computer Vision in Human Analysis: From Face and Body to Clothes.

Sensors, 2023

From Show to Tell: A Survey on Deep Learning-Based Image Captioning.

IEEE Trans. Pattern Anal. Mach. Intell., 2023

Removing NSFW Concepts from Vision-and-Language Models for Text-to-Image Retrieval and Generation.

CoRR, 2023

Multi-Class Explainable Unlearning for Image Classification via Weight Filtering.

CoRR, 2023

Positive-Augmented Constrastive Learning for Image and Video Captioning Evaluation.

CoRR, 2023

LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Where Research meets Industry: New Challenges and Opportunities at AImageLab.

Proceedings of the Italia Intelligenza Artificiale, 2023

Embodied Agents for Efficient Exploration and Smart Scene Description.

Proceedings of the IEEE International Conference on Robotics and Automation, 2023

Towards Explainable Navigation and Recounting.

Proceedings of the Image Analysis and Processing - ICIAP 2023, 2023

Unveiling the Impact of Image Transformations on Deepfake Detection: An Experimental Analysis.

Proceedings of the Image Analysis and Processing - ICIAP 2023, 2023

OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data.

Proceedings of the Image Analysis and Processing - ICIAP 2023, 2023

SynthCap: Augmenting Transformers with Synthetic Data for Image Captioning.

Proceedings of the Image Analysis and Processing - ICIAP 2023, 2023

With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Transform, Warp, and Dress: A New Transformation-guided Model for Virtual Try-on.

ACM Trans. Multim. Comput. Commun. Appl., 2022

Matching Faces and Attributes Between the Artistic and the Real Domain: the PersonArt Approach.

ACM Trans. Multim. Comput. Commun. Appl., 2022

Focus on Impact: Indoor Exploration With Intrinsic Motivation.

IEEE Robotics Autom. Lett., 2022

Boosting modern and historical handwritten text recognition with deformable convolutions.

Int. J. Document Anal. Recognit., 2022

Explaining transformer-based image captioning models: An empirical analysis.

AI Commun., 2022

Spot the Difference: A Novel Task for Embodied Agents in Changing Environments.

Proceedings of the 26th International Conference on Pattern Recognition, 2022

The LAM Dataset: A Novel Benchmark for Line-Level Handwritten Text Recognition.

Christopher Kermorvant

Proceedings of the 26th International Conference on Pattern Recognition, 2022

CaMEL: Mean Teacher Learning for Image Captioning.

Proceedings of the 26th International Conference on Pattern Recognition, 2022

Investigating Bidimensional Downsampling in Vision Transformer Models.

Proceedings of the Image Analysis and Processing - ICIAP 2022, 2022

Embodied Navigation at the Art Gallery.

Proceedings of the Image Analysis and Processing - ICIAP 2022, 2022

Dress Code: High-Resolution Multi-category Virtual Try-On.

Proceedings of the Computer Vision - ECCV 2022, 2022

Dual-Branch Collaborative Transformer for Virtual Try-On.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

The Unreasonable Effectiveness of CLIP Features for Image Captioning: An Experimental Analysis.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

Retrieval-Augmented Transformer for Image Captioning.

Proceedings of the CBMI 2022: International Conference on Content-based Multimedia Indexing, Graz, Austria, September 14, 2022

ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval.

Proceedings of the CBMI 2022: International Conference on Content-based Multimedia Indexing, Graz, Austria, September 14, 2022

2021

Working Memory Connections for LSTM.

Neural Networks, 2021

Multimodal attention networks for low-level vision-and-language navigation.

Comput. Vis. Image Underst., 2021

Universal Captioner: Long-Tail Vision-and-Language Model Training through Content-Style Separation.

CoRR, 2021

From Show to Tell: A Survey on Image Captioning.

CoRR, 2021

Learning to Select: A Fully Attentive Approach for Novel Object Captioning.

Proceedings of the ICMR '21: International Conference on Multimedia Retrieval, 2021

FashionSearch++: Improving Consumer-to-Shop Clothes Retrieval with Hard Negatives.

Davide Morelli

Proceedings of the 11th Italian Information Retrieval Workshop 2021, 2021

Revisiting the Evaluation of Class Activation Mapping for Explainability: A Novel Metric and Experimental Analysis.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021

Learning to Read L'Infinito: Handwritten Text Recognition with Synthetic Training Data.

Silvia Cascianelli

Maria Ludovica Piazzi

Rosiana Schiuma

Proceedings of the Computer Analysis of Images and Patterns, 2021

Out of the Box: Embodied Navigation in the Real World.

Proceedings of the Computer Analysis of Images and Patterns, 2021

2020

Imparare a descrivere gli oggetti salienti presenti nelle immagini tramite la visione e il linguaggio.

PhD thesis, 2020

Explaining digital humanities by aligning images and textual descriptions.

Pattern Recognit. Lett., 2020

A unified cycle-consistent neural model for text and image retrieval.

Multim. Tools Appl., 2020

SMArT: Training Shallow Memory-aware Transformers for Robotic Explainability.

Proceedings of the 2020 IEEE International Conference on Robotics and Automation, 2020

A Novel Attention-based Aggregation Function to Combine Vision and Language.

Proceedings of the 25th International Conference on Pattern Recognition, 2020

VITON-GT: An Image-based Virtual Try-On Model with Geometric Transformations.

Proceedings of the 25th International Conference on Pattern Recognition, 2020

Explore and Explain: Self-supervised Navigation and Recounting.

Proceedings of the 25th International Conference on Pattern Recognition, 2020

Meshed-Memory Transformer for Image Captioning.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

M-VAD names: a dataset for video captioning with naming.

Multim. Tools Appl., 2019

M²: Meshed-Memory Transformer for Image Captioning.

CoRR, 2019

Perceive, Transform, and Act: Multi-Modal Attention Networks for Vision-and-Language Navigation.

CoRR, 2019

Image-to-Image Translation to Unfold the Reality of Artworks: An Empirical Analysis.

Proceedings of the Image Analysis and Processing - ICIAP 2019, 2019

Artpedia: A New Visual-Semantic Dataset with Visual and Contextual Sentences in the Artistic Domain.

Proceedings of the Image Analysis and Processing - ICIAP 2019, 2019

Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-To-Image Translation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions.
