Davide Caffagni

Orcid: 0009-0002-3279-6480

According to our database, Davide Caffagni authored at least 14 papers between 2023 and 2025.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2025
Seeing Beyond Words: Self-Supervised Visual Learning for Multimodal Large Language Models.
CoRR, December, 2025

ReAG: Reasoning-Augmented Generation for Knowledge-based Visual Question Answering.
CoRR, November, 2025

Recurrence Meets Transformers for Universal Multimodal Retrieval.
CoRR, September, 2025

Mitigating Hallucinations in Multimodal LLMs via Object-aware Preference Optimization.
CoRR, August, 2025

Augmenting and mixing Transformers with synthetic data for image captioning.
Image Vis. Comput., 2025

Benchmarking BERT-based Models for Latin: A Case Study on Biblical References in Ancient Christian Literature.
Proceedings of the 21st Conference on Information and Research science Connecting to Digital and Library science, 2025

LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV 2025, 2025

Generating Synthetic Data with Large Language Models for Low-Resource Sentence Retrieval.
Proceedings of the Linking Theory and Practice of Digital Libraries, 2025

Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
The (R)Evolution of Multimodal Large Language Models: A Survey.
CoRR, 2024

Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization.
Proceedings of the 35th British Machine Vision Conference, 2024

The Revolution of Multimodal Large Language Models: A Survey.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
SynthCap: Augmenting Transformers with Synthetic Data for Image Captioning.
Proceedings of the Image Analysis and Processing - ICIAP 2023, 2023

