Nicholas Moratelli

Orcid: 0000-0001-9362-5680

According to our database, Nicholas Moratelli authored at least 16 papers between 2023 and 2025.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2025
LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning.
CoRR, March, 2025

Causal Graphical Models for Vision-Language Compositional Understanding.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Are Learnable Prompts the Right Way of Prompting? Adapting Vision-and-Language Models with Memory Optimization.
IEEE Intell. Syst., 2024

Personalizing Multimodal Large Language Models for Image Captioning: An Experimental Analysis.
CoRR, 2024

Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering.
CoRR, 2024

Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training.
CoRR, 2024

Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization.
CoRR, 2024

Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs.
CoRR, 2024

The (R)Evolution of Multimodal Large Language Models: A Survey.
CoRR, 2024

Fluent and Accurate Image Captioning with a Self-trained Reward Model.
Proceedings of the Pattern Recognition - 27th International Conference, 2024

Personalizing Multimodal Large Language Models for Image Captioning: An Experimental Analysis.
Proceedings of the Computer Vision - ECCV 2024 Workshops, 2024

Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization.
Proceedings of the 35th British Machine Vision Conference, 2024

The Revolution of Multimodal Large Language Models: A Survey.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
Fashion-Oriented Image Captioning with External Knowledge Retrieval and Fully Attentive Gates.
Sensors, February, 2023

