Lorenzo Baraldi

Multimodal Technol. Interact., December, 2023

Fully-attentive iterative networks for region-based controllable image and video captioning.

Comput. Vis. Image Underst., December, 2023

Evaluating synthetic pre-Training for handwriting processing tasks.

Pattern Recognit. Lett., August, 2023

Fashion-Oriented Image Captioning with External Knowledge Retrieval and Fully Attentive Gates.

Sensors, February, 2023

From Show to Tell: A Survey on Deep Learning-Based Image Captioning.

IEEE Trans. Pattern Anal. Mach. Intell., 2023

Removing NSFW Concepts from Vision-and-Language Models for Text-to-Image Retrieval and Generation.

CoRR, 2023

Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation.

CoRR, 2023

Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training.

CoRR, 2023

Multi-Class Explainable Unlearning for Image Classification via Weight Filtering.

CoRR, 2023

Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images.

CoRR, 2023

Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation.

CoRR, 2023

Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Where Research meets Industry: New Challenges and Opportunities at AImageLab.

Proceedings of the Italia Intelligenza Artificiale, 2023

Embodied Agents for Efficient Exploration and Smart Scene Description.

Proceedings of the IEEE International Conference on Robotics and Automation, 2023

Towards Explainable Navigation and Recounting.

Proceedings of the Image Analysis and Processing - ICIAP 2023, 2023

Unveiling the Impact of Image Transformations on Deepfake Detection: An Experimental Analysis.

Proceedings of the Image Analysis and Processing - ICIAP 2023, 2023

SynthCap: Augmenting Transformers with Synthetic Data for Image Captioning.

Proceedings of the Image Analysis and Processing - ICIAP 2023, 2023

Enhancing Open-Vocabulary Semantic Segmentation with Prototype Retrieval.

Proceedings of the Image Analysis and Processing - ICIAP 2023, 2023

With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Superpixel Positional Encoding to Improve ViT-based Semantic Segmentation Models.

Proceedings of the 34th British Machine Vision Conference 2023, 2023

2022

Matching Faces and Attributes Between the Artistic and the Real Domain: the PersonArt Approach.

ACM Trans. Multim. Comput. Commun. Appl., 2022

A computational approach for progressive architecture shrinkage in action recognition.

Softw. Pract. Exp., 2022

Focus on Impact: Indoor Exploration With Intrinsic Motivation.

IEEE Robotics Autom. Lett., 2022

Boosting modern and historical handwritten text recognition with deformable convolutions.

Int. J. Document Anal. Recognit., 2022

Explaining transformer-based image captioning models: An empirical analysis.

AI Commun., 2022

Spot the Difference: A Novel Task for Embodied Agents in Changing Environments.

Proceedings of the 26th International Conference on Pattern Recognition, 2022

The LAM Dataset: A Novel Benchmark for Line-Level Handwritten Text Recognition.

Christopher Kermorvant

Proceedings of the 26th International Conference on Pattern Recognition, 2022

CaMEL: Mean Teacher Learning for Image Captioning.

Proceedings of the 26th International Conference on Pattern Recognition, 2022

Investigating Bidimensional Downsampling in Vision Transformer Models.

Proceedings of the Image Analysis and Processing - ICIAP 2022, 2022

Embodied Navigation at the Art Gallery.

Proceedings of the Image Analysis and Processing - ICIAP 2022, 2022

Dual-Branch Collaborative Transformer for Virtual Try-On.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

The Unreasonable Effectiveness of CLIP Features for Image Captioning: An Experimental Analysis.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

Retrieval-Augmented Transformer for Image Captioning.

Proceedings of the CBMI 2022: International Conference on Content-based Multimedia Indexing, Graz, Austria, September 14, 2022

ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval.

Proceedings of the CBMI 2022: International Conference on Content-based Multimedia Indexing, Graz, Austria, September 14, 2022

2021

Working Memory Connections for LSTM.

Neural Networks, 2021

Video action detection by learning graph-based spatio-temporal interactions.

Comput. Vis. Image Underst., 2021

Multimodal attention networks for low-level vision-and-language navigation.

Comput. Vis. Image Underst., 2021

Universal Captioner: Long-Tail Vision-and-Language Model Training through Content-Style Separation.

CoRR, 2021

From Show to Tell: A Survey on Image Captioning.

CoRR, 2021

Learning to Select: A Fully Attentive Approach for Novel Object Captioning.

Proceedings of the ICMR '21: International Conference on Multimedia Retrieval, 2021

Improving Indoor Semantic Segmentation with Boundary-Level Objectives.

Roberto Amoroso

Proceedings of the Advances in Computational Intelligence, 2021

Estimating (and Fixing) the Effect of Face Obfuscation in Video Recognition.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021

Revisiting the Evaluation of Class Activation Mapping for Explainability: A Novel Metric and Experimental Analysis.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021

Learning to Read L'Infinito: Handwritten Text Recognition with Synthetic Training Data.

Silvia Cascianelli

Maria Ludovica Piazzi

Rosiana Schiuma

Proceedings of the Computer Analysis of Images and Patterns, 2021

Out of the Box: Embodied Navigation in the Real World.

Proceedings of the Computer Analysis of Images and Patterns, 2021

Assessing the Role of Boundary-Level Objectives in Indoor Semantic Segmentation.

Roberto Amoroso

Proceedings of the Computer Analysis of Images and Patterns, 2021

2020

Spaghetti Labeling: Directed Acyclic Graphs for Block-Based Connected Components Labeling.

IEEE Trans. Image Process., 2020

Explaining digital humanities by aligning images and textual descriptions.

Pattern Recognit. Lett., 2020

A unified cycle-consistent neural model for text and image retrieval.

Multim. Tools Appl., 2020

Toward reliable experiments on the performance of Connected Components Labeling algorithms.

J. Real Time Image Process., 2020

Inter-Homines: Distance-Based Risk Estimation for Human Safety.

CoRR, 2020

SMArT: Training Shallow Memory-aware Transformers for Robotic Explainability.

Proceedings of the 2020 IEEE International Conference on Robotics and Automation, 2020

RMS-Net: Regression and Masking for Soccer Event Spotting.

Proceedings of the 25th International Conference on Pattern Recognition, 2020

A Novel Attention-based Aggregation Function to Combine Vision and Language.

Proceedings of the 25th International Conference on Pattern Recognition, 2020

Watch Your Strokes: Improving Handwritten Text Recognition with Deformable Convolutions.

Proceedings of the 25th International Conference on Pattern Recognition, 2020

Explore and Explain: Self-supervised Navigation and Recounting.

Proceedings of the 25th International Conference on Pattern Recognition, 2020

Meshed-Memory Transformer for Image Captioning.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

AI4AR: An AI-Based Mobile Application for the Automatic Generation of AR Contents.

Proceedings of the Augmented Reality, Virtual Reality, and Computer Graphics, 2020

2019

M-VAD names: a dataset for video captioning with naming.

Multim. Tools Appl., 2019

M²: Meshed-Memory Transformer for Image Captioning.

CoRR, 2019

STAGE: Spatio-Temporal Attention on Graph Entities for Video Action Detection.

CoRR, 2019

Perceive, Transform, and Act: Multi-Modal Attention Networks for Vision-and-Language Navigation.

CoRR, 2019

Image-to-Image Translation to Unfold the Reality of Artworks: An Empirical Analysis.

Proceedings of the Image Analysis and Processing - ICIAP 2019, 2019

Artpedia: A New Visual-Semantic Dataset with Visual and Contextual Sentences in the Artistic Domain.

Proceedings of the Image Analysis and Processing - ICIAP 2019, 2019

Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-To-Image Translation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

A Deep-learning-based approach to VM behavior Identification in Cloud Systems.

Proceedings of the 9th International Conference on Cloud Computing and Services Science, 2019

Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters.

Proceedings of the 30th British Machine Vision Conference 2019, 2019

2018

Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention.

ACM Trans. Multim. Comput. Commun. Appl., 2018

Predicting Human Eye Fixations via an LSTM-Based Saliency Attentive Model.

IEEE Trans. Image Process., 2018

Attentive models in vision: Computing saliency maps in the deep learning era.

Intelligenza Artificiale, 2018

Connected Components Labeling on DRAGs: Implementation and Reproducibility Notes.

Proceedings of the Reproducible Research in Pattern Recognition, 2018

Automatic Image Cropping and Selection Using Saliency: An Application to Historical Manuscripts.

Proceedings of the Digital Libraries and Multimedia Archives, 2018

A Hierarchical Quasi-Recurrent approach to Video Captioning.

Proceedings of the IEEE International Conference on Image Processing, 2018

Connected Components Labeling on DRAGs.

Proceedings of the 24th International Conference on Pattern Recognition, 2018

Aligning Text and Document Illustrations: Towards Visually Explainable Digital Humanities.

Proceedings of the 24th International Conference on Pattern Recognition, 2018

What Was Monet Seeing While Painting? Translating Artworks to Photo-Realistic Images.

Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018

Towards Cycle-Consistent Models for Text and Image Retrieval.

Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018

Visual-Semantic Alignment Across Domains Using a Semi-Supervised Approach.

Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018

SAM: Pushing the Limits of Saliency Prediction Models.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018

LAMV: Learning to Align and Match Videos With Kernelized Temporal Layers.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017

Recognizing and Presenting the Storytelling Video Structure With Deep Multimodal Networks.

IEEE Trans. Multim., 2017

A Video Library System Using Scene Detection and Automatic Tagging.

Proceedings of the Digital Libraries and Archives, 2017

Modeling multimodal cues in a deep learning-based framework for emotion recognition in the wild.

Proceedings of the 19th ACM International Conference on Multimodal Interaction, 2017

Visual saliency for image captioning in new multimedia services.

Proceedings of the 2017 IEEE International Conference on Multimedia & Expo Workshops, 2017

Towards Video Captioning with Naming: A Novel Dataset and a Multi-modal Approach.

Proceedings of the Image Analysis and Processing - ICIAP 2017, 2017

Hierarchical Boundary-Aware Neural Encoder for Video Captioning.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

NeuralStory: an Interactive Multimedia System for Video Indexing and Re-use.

Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing, 2017

2016

Shot, Scene and Keyframe Ordering for Interactive Video Re-use.

Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2016), 2016

A Browsing and Retrieval System for Broadcast Videos using Scene Detection and Automatic Annotation.

Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Scene-driven Retrieval in Edited Videos using Aesthetic and Semantic Deep Features.

Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, 2016

Layout Analysis and Content Classification in Digitized Books.

Proceedings of the Digital Libraries and Multimedia Archives, 2016

YACCLAB - Yet Another Connected Components Labeling Benchmark.

Proceedings of the 23rd International Conference on Pattern Recognition, 2016

A deep multi-level network for saliency prediction.

Proceedings of the 23rd International Conference on Pattern Recognition, 2016

Historical document digitization through layout analysis and deep content classification.

Proceedings of the 23rd International Conference on Pattern Recognition, 2016

Context Change Detection for an Ultra-Low Power Low-Resolution Ego-Vision Imager.

Proceedings of the Computer Vision - ECCV 2016 Workshops, 2016

Multi-level Net: A Visual Saliency Prediction Model.

Proceedings of the Computer Vision - ECCV 2016 Workshops, 2016

Optimized Connected Components Labeling with Pixel Prediction.

Federico Bolelli

Proceedings of the Advanced Concepts for Intelligent Vision Systems, 2016

2015

A Deep Siamese Network for Scene Detection in Broadcast Videos.

Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

Analysis and Re-Use of Videos in Educational Digital Libraries with Automatic Scene Detection.

Proceedings of the Digital Libraries on the Move, 2015

Scene segmentation using temporal clustering for accessing and re-using broadcast video.

Proceedings of the 2015 IEEE International Conference on Multimedia and Expo, 2015

Measuring Scene Detection Performance.

Proceedings of the Pattern Recognition and Image Analysis - 7th Iberian Conference, 2015

Shot and Scene Detection via Hierarchical Clustering for Re-using Broadcast Video.
