Sara Sarto

Orcid: 0000-0003-1057-3374

According to our database, Sara Sarto authored at least 24 papers between 2022 and 2026.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2026
Few Channels Draw The Whole Picture: Revealing Massive Activations in Diffusion Transformers.
CoRR, May, 2026

RaTA-Tool: Retrieval-based Tool Selection with Multimodal Large Language Models.
CoRR, April, 2026

Look Twice: Training-Free Evidence Highlighting in Multimodal Large Language Models.
CoRR, April, 2026

2025
Seeing Beyond Words: Self-Supervised Visual Learning for Multimodal Large Language Models.
CoRR, December, 2025

Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training.
Int. J. Comput. Vis., November, 2025

ReAG: Reasoning-Augmented Generation for Knowledge-based Visual Question Answering.
CoRR, November, 2025

Recurrence Meets Transformers for Universal Multimodal Retrieval.
CoRR, September, 2025

Semantically Conditioned Prompts for Visual Recognition Under Missing Modality Scenarios.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives.
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025

LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV 2025, 2025

Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Towards Retrieval-Augmented Architectures for Image Captioning.
ACM Trans. Multim. Comput. Commun. Appl., August, 2024

Video Surveillance and Privacy: A Solvable Paradox?
Computer, March, 2024

Multiclass Unlearning for Image Classification via Weight Filtering.
IEEE Intell. Syst., 2024

The (R)Evolution of Multimodal Large Language Models: A Survey.
CoRR, 2024

Unlearning Vision Transformers Without Retaining Data via Low-Rank Decompositions.
Proceedings of the Pattern Recognition - 27th International Conference, 2024

BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues.
Proceedings of the Computer Vision - ECCV 2024, 2024

Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

The Revolution of Multimodal Large Language Models: A Survey.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
Multi-Class Explainable Unlearning for Image Classification via Weight Filtering.
CoRR, 2023

Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation.
CoRR, 2023

With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Retrieval-Augmented Transformer for Image Captioning.
Proceedings of the CBMI 2022: International Conference on Content-based Multimedia Indexing, Graz, Austria, September 14, 2022
