Alessio Brutti

Orcid: 0000-0003-4146-3071

According to our database¹, Alessio Brutti authored at least 94 papers between 2005 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

EgoAdapt: Enhancing Robustness in Egocentric Interactive Speaker Detection Under Missing Modalities.

[DOI]

,

,

,

CoRR, March, 2026

Distillation-based Layer Dropping (DLD): Effective End-to-end Framework for Dynamic Speech Networks.

[DOI]

,

Daniele Falavigna

,

,

,

,

CoRR, January, 2026

2025

MLMA: Towards Multilingual ASR With Mamba-based Architectures.

[DOI]

Mohamed Nabih Ali

,

Daniele Falavigna

,

CoRR, October, 2025

The Eloquence team submission for task 1 of MLC-SLM challenge.

[DOI]

Lorenzo Concina

,

,

,

Marco Matassoni

,

CoRR, July, 2025

Input Conditioned Layer Dropping in Speech Foundation Models.

[DOI]

,

Daniele Falavigna

,

Proceedings of the 35th IEEE International Workshop on Machine Learning for Signal Processing, 2025

The Warmup Dilemma: How Learning Rate Strategies Impact Speech-to-Text Model Convergence.

[DOI]

,

,

Luisa Bentivogli

,

,

,

Roberto Gretter

,

Marco Matassoni

,

,

Proceedings of the 22nd International Conference on Spoken Language Translation, 2025

Automatic detection of speech sound disorders in German-speaking children: augmenting the data with typically developed speech.

[DOI]

Darline Monika Marx

,

Marco Matassoni

,

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Granary: Speech Recognition and Translation Dataset in 25 European Languages.

[DOI]

Nithin Rao Koluguri

,

,

George Zelenfroynd

,

,

,

Sofia Kostandian

,

,

,

Jagadeesh Balam

,

Vitaly Lavrukhin

,

,

,

,

,

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

An Effective Training Framework for Light-Weight Automatic Speech Recognition Models.

[DOI]

,

,

,

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Speech LLMs in Low-Resource Scenarios: Data Volume Requirements and the Impact of Pretraining on High-Resource Languages.

[DOI]

,

Marco Matassoni

,

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach.

[DOI]

Umberto Cappellazzo

,

,

Stavros Petridis

,

Daniele Falavigna

,

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Large Language Models are Strong Audio-Visual Speech Recognition Learners.

[DOI]

Umberto Cappellazzo

,

,

,

,

Stavros Petridis

,

Daniele Falavigna

,

,

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

EFL-PEFT: A communication Efficient Federated Learning framework using PEFT sparsification for ASR.

[DOI]

Mohamed Nabih Ali

,

Daniele Falavigna

,

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Splitformer: An Improved Early-Exit Architecture for Automatic Speech Recognition on Edge Devices.

[DOI]

Maxence Lasbordes

,

Daniele Falavigna

,

Proceedings of the 33rd European Signal Processing Conference, 2025

FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian.

[DOI]

,

,

Luisa Bentivogli

,

,

,

Roberto Gretter

,

Marco Matassoni

,

,

Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025), 2025

2024

End-to-end integration of speech separation and voice activity detection for low-latency diarization of telephone conversations.

[DOI]

Giovanni Morrone

,

Samuele Cornell

,

,

,

,

Stefano Squartini

Speech Commun., 2024

MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages.

[DOI]

,

,

Luisa Bentivogli

,

,

,

Roberto Gretter

,

Marco Matassoni

,

,

CoRR, 2024

Federating Dynamic Models using Early-Exit Architectures for Automatic Speech Recognition on Heterogeneous Clients.

[DOI]

Mohamed Nabih Ali

,

,

Daniele Falavigna

CoRR, 2024

Detection and Classification of Cardiovascular Diseases Using Neural Networks.

[DOI]

Bastián Estay Zamorano

,

Ali Dehghan Firoozabadi

,

,

,

David Zabala-Blanco

,

Pablo Palacios Játiva

,

Cesar A. Azurdia-Meza

Proceedings of the Signal Processing: Algorithms, 2024

Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers.

[DOI]

Umberto Cappellazzo

,

Daniele Falavigna

,

,

Mirco Ravanelli

Proceedings of the 34th IEEE International Workshop on Machine Learning for Signal Processing, 2024

Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters.

[DOI]

Umberto Cappellazzo

,

Daniele Falavigna

,

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Training Early-Exit Architectures for Automatic Speech Recognition: Fine-Tuning Pre-Trained Models or Training from Scratch.

[DOI]

George August Wright

,

Umberto Cappellazzo

,

,

,

Lucas Ondel Yang

,

Daniele Falavigna

,

Mohamed Nabih Ali

,

Proceedings of the IEEE International Conference on Acoustics, 2024

LDASR: An Experimental Study on Layer Drop Using Conformer-Based Architecture.

[DOI]

,

,

Daniele Falavigna

Proceedings of the 32nd European Signal Processing Conference, 2024

MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages.

[DOI]

,

,

Luisa Bentivogli

,

,

,

Roberto Gretter

,

Marco Matassoni

,

,

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Continual Contrastive Spoken Language Understanding.

[DOI]

Umberto Cappellazzo

,

,

,

Daniele Falavigna

,

,

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

An experimental review of speaker diarization methods with application to two-speaker conversational telephone speech recordings.

[DOI]

,

Samuele Cornell

,

Giovanni Morrone

,

,

,

Stefano Squartini

Comput. Speech Lang., July, 2023

Direct enhancement of pre-trained speech embeddings for speech processing in noisy conditions.

[DOI]

Mohamed Nabih Ali

,

,

Daniele Falavigna

Comput. Speech Lang., June, 2023

Training dynamic models using early exits for automatic speech recognition on resource-constrained devices.

[DOI]

George August Wright

,

Umberto Cappellazzo

,

,

,

Lucas Ondel Yang

,

Daniele Falavigna

,

CoRR, 2023

Improving the Intent Classification accuracy in Noisy Environment.

[DOI]

Mohamed Nabih Ali

,

,

Daniele Falavigna

CoRR, 2023

Scaling strategies for on-device low-complexity source separation with Conv-Tasnet.

[DOI]

Mohamed Nabih Ali

,

Francesco Paissan

,

Daniele Falavigna

,

CoRR, 2023

Towards Speaker-Independent Voice Conversion for Improving Dysarthric Speech Intelligibility.

[DOI]

,

Marco Matassoni

,

Gianluca Esposito

,

Proceedings of the 12th ISCA Speech Synthesis Workshop, 2023

Sequence-Level Knowledge Distillation for Class-Incremental End-to-End Spoken Language Understanding.

[DOI]

Umberto Cappellazzo

,

,

Daniele Falavigna

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

An Investigation of the Combination of Rehearsal and Knowledge Distillation in Continual Learning for Spoken Language Understanding.

[DOI]

Umberto Cappellazzo

,

Daniele Falavigna

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2022

Audio-Visual Tracking of Concurrent Speakers.

[DOI]

,

,

,

Maurizio Omologo

,

Andrea Cavallaro

IEEE Trans. Multim., 2022

Time-Domain Joint Training Strategies of Speech Enhancement and Intent Classification Neural Models.

[DOI]

Mohamed Nabih Ali

,

Daniele Falavigna

,

Sensors, 2022

Exploring the Joint Use of Rehearsal and Knowledge Distillation in Continual Learning for Spoken Language Understanding.

[DOI]

Umberto Cappellazzo

,

Daniele Falavigna

,

CoRR, 2022

Low-Latency Speech Separation Guided Diarization for Telephone Conversations.

[DOI]

Giovanni Morrone

,

Samuele Cornell

,

,

,

,

,

Stefano Squartini

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Using Seq2seq voice conversion with pre-trained representations for audio anonymization: experimental insights.

[DOI]

,

Marco Matassoni

,

Proceedings of the IEEE International Smart Cities Conference, 2022

Enhancing Embeddings for Speech Classification in Noisy Conditions.

[DOI]

Mohamed Nabih Ali

,

,

Daniele Falavigna

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Is Cross-Attention Preferable to Self-Attention for Multi-Modal Emotion Recognition?

[DOI]

,

,

Andrea Cavallaro

Proceedings of the IEEE International Conference on Acoustics, 2022

Scalable Neural Architectures for End-to-End Environmental Sound Classification.

[DOI]

Francesco Paissan

,

Alberto Ancilotto

,

,

Elisabetta Farella

Proceedings of the IEEE International Conference on Acoustics, 2022

End-to-End Low Resource Keyword Spotting Through Character Recognition and Beam-Search Re-Scoring.

[DOI]

Ephrem Tibebe Mekonnen

,

,

Daniele Falavigna

Proceedings of the IEEE International Conference on Acoustics, 2022

Optimizing PhiNet architectures for the detection of urban sounds on low-end devices.

[DOI]

,

Francesco Paissan

,

Alberto Ancilotto

,

Elisabetta Farella

Proceedings of the 30th European Signal Processing Conference, 2022

Low-Complexity Acoustic Scene Classification in DCASE 2022 Challenge.

[DOI]

Irene Martín-Morató

,

Francesco Paissan

,

Alberto Ancilotto

,

,

Annamaria Mesaros

,

Elisabetta Farella

,

,

Tuomas Virtanen

Proceedings of the 7th Workshop on Detection and Classification of Acoustic Scenes and Events 2022, 2022

2021

Learning to Rank Microphones for Distant Speech Recognition.

[DOI]

Samuele Cornell

,

,

Marco Matassoni

,

Stefano Squartini

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Robust Latent Representations Via Cross-Modal Translation and Alignment.

[DOI]

,

,

Andrea Cavallaro

Proceedings of the IEEE International Conference on Acoustics, 2021

A Speech Enhancement Front-End for Intent Classification in Noisy Environments.

[DOI]

Mohamed Nabih Ali

,

Veronica Juliana Schmalz

,

,

Daniele Falavigna

Proceedings of the 29th European Signal Processing Conference, 2021

Automatic Assessment of English CEFR Levels Using BERT Embeddings.

[DOI]

Veronica Juliana Schmalz

,

Proceedings of the Eighth Italian Conference on Computational Linguistics, 2021

2020

Compact Recurrent Neural Networks for Acoustic Event Detection on Low-Energy Low-Complexity Platforms.

[DOI]

Gianmarco Cerutti

,

,

,

Elisabetta Farella

IEEE J. Sel. Top. Signal Process., 2020

Supervised Online Diarization with Sample Mean Loss for Multi-Domain Data.

[DOI]

,

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Speech Enhancement Using Dilated Wave-U-Net: an Experimental Analysis.

[DOI]

Mohamed Nabih Ali

,

,

Daniele Falavigna

Proceedings of the 27th Conference of Open Innovations Association, 2020

2019

Multi-Speaker Tracking From an Audio-Visual Sensing Device.

[DOI]

,

,

,

Maurizio Omologo

,

Andrea Cavallaro

IEEE Trans. Multim., 2019

ConflictNET: End-to-End Learning for Speech-Based Conflict Intensity Estimation.

[DOI]

,

,

Andrea Cavallaro

IEEE Signal Process. Lett., 2019

The Speed Submission to DIHARD II: Contributions & Lessons Learned.

[DOI]

,

,

Samuele Cornell

,

,

Sunit Sivasankaran

,

,

Pavel Korshunov

,

,

,

Emmanuel Vincent

,

Nicholas W. D. Evans

,

Sébastien Marcel

,

Stefano Squartini

,

CoRR, 2019

LOCATA challenge: speaker localization with a planar array.

[DOI]

,

Andrea Cavallaro

,

,

Maurizio Omologo

CoRR, 2019

Neural Network Distillation on IoT Platforms for Sound Event Detection.

[DOI]

Gianmarco Cerutti

,

,

,

Elisabetta Farella

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Accurate Target Annotation in 3D from Multimodal Streams.

[DOI]

,

,

Alessio Xompero

,

,

Maurizio Omologo

,

Andrea Cavallaro

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

3D Mouth Tracking from a Compact Microphone Array Co-Located with a camera.

[DOI]

,

Alessio Xompero

,

Andrea Cavallaro

,

,

,

Maurizio Omologo

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Online Cross-Modal Adaptation for Audio-Visual Person Identification With Wearable Cameras.

[DOI]

,

Andrea Cavallaro

IEEE Trans. Hum. Mach. Syst., 2017

Optimizing DNN Adaptation for Recognition of Enhanced Speech.

[DOI]

Marco Matassoni

,

,

Daniele Falavigna

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Unsupervised Cross-Modal Deep-Model Adaptation for Audio-Visual Re-identification with Wearable Cameras.

[DOI]

,

Andrea Cavallaro

Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, 2017

3D audio-visual speaker tracking with an adaptive particle filter.

[DOI]

,

,

Maurizio Omologo

,

Andrea Cavallaro

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

2016

On the relationship between Early-to-Late Ratio of Room Impulse Responses and ASR performance in reverberant environments.

[DOI]

,

Marco Matassoni

Speech Commun., 2016

Multi-channel i-vector combination for robust speaker verification in multi-room domestic environments.

[DOI]

,

Proceedings of the Odyssey 2016: The Speaker and Language Recognition Workshop, 2016

Increasing the environment-awareness of rake beamforming for directive acoustic sources.

[DOI]

,

Proceedings of the IEEE International Workshop on Acoustic Signal Enhancement, 2016

A Phase-Based Time-Frequency Masking for Multi-Channel Speech Enhancement in Domestic Environments.

[DOI]

,

Antigoni Tsiami

,

Athanasios Katsamanis

,

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

2015

Multi-channel speaker verification based on total variability modelling.

[DOI]

Maria Joana Correia

,

,

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Multi-room speech activity detection using a distributed microphone network in domestic environments.

[DOI]

Panagiotis Giannoulis

,

,

Marco Matassoni

,

,

Athanasios Katsamanis

,

,

Gerasimos Potamianos

,

Proceedings of the 23rd European Signal Processing Conference, 2015

2014

Acoustic modeling based on early-to-late reverberation ratio for robust ASR.

[DOI]

Marco Matassoni

,

,

Piergiorgio Svaizer

Proceedings of the 14th International Workshop on Acoustic Signal Enhancement, 2014

On the use of Early-To-Late Reverberation ratio for ASR in reverberant environments.

[DOI]

,

Marco Matassoni

Proceedings of the IEEE International Conference on Acoustics, 2014

A speech event detection and localization task for multiroom environments.

[DOI]

,

Mirco Ravanelli

,

Piergiorgio Svaizer

,

Maurizio Omologo

Proceedings of the 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays, 2014

2013

An environment aware ML estimation of acoustic radiation pattern with distributed microphone pairs.

[DOI]

,

Maurizio Omologo

,

Piergiorgio Svaizer

Signal Process., 2013

Tracking of multidimensional TDOA for multiple sources with distributed microphone pairs.

[DOI]

,

Francesco Nesta

Comput. Speech Lang., 2013

Geometric contamination for GMM/UBM speaker verification in reverberant environments.

[DOI]

,

Maurizio Omologo

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

2012

Maximum a Posteriori Trajectory Estimation for Acoustic Source Tracking.

[DOI]

,

Maurizio Omologo

,

Piergiorgio Svaizer

Proceedings of the IWAENC 2012 - International Workshop on Acoustic Signal Enhancement, Proceedings, RWTH Aachen University, Germany, September 4th, 2012

Environment aware estimation of the orientation of acoustic sources using a line array.

[DOI]

Piergiorgio Svaizer

,

,

Maurizio Omologo

Proceedings of the 20th European Signal Processing Conference, 2012

2011

The SCENIC Project: Environment-aware Sound Sensing and Rendering.

[DOI]

,

Fabio Antonacci

,

Paolo Bestagini

,

,

Antonio Canclini

,

Luca Cristoforetti

,

Emanuël Anco Peter Habets

,

Walter Kellermann

,

Konrad Kowalczyk

,

Anthony Lombard

,

,

,

Patrick A. Naylor

,

Maurizio Omologo

,

Rudolf Rabenstein

,

,

Piergiorgio Svaizer

,

Mark R. P. Thomas

Proceedings of the 2nd European Future Technologies Conference and Exhibition, 2011

Sub-band spectral variance feature for noise robust ASR.

[DOI]

Hari Krishna Maganti

,

,

Marco Matassoni

,

Proceedings of the 19th European Signal Processing Conference, 2011

Inference of acoustic source directivity using environment awareness.

[DOI]

,

Maurizio Omologo

,

Piergiorgio Svaizer

Proceedings of the 19th European Signal Processing Conference, 2011

Multiple source tracking by sequential posterior kernel density estimation through GSCT.

[DOI]

,

Francesco Nesta

Proceedings of the 19th European Signal Processing Conference, 2011

2010

Multiple Source Localization Based on Acoustic Map De-Emphasis.

[DOI]

,

Maurizio Omologo

,

Piergiorgio Svaizer

EURASIP J. Audio Speech Music. Process., 2010

A joint particle filter to track the position and head orientation of people using audio visual cues.

[DOI]

,

Proceedings of the 18th European Signal Processing Conference, 2010

2009

Person Tracking.

[DOI]

,

Rainer Stiefelhagen

,

Aristodemos Pnevmatikakis

,

,

,

,

Gerasimos Potamianos

Proceedings of the Computers in the Human Interaction Loop, 2009

A sequential Monte Carlo approach for tracking of overlapping acoustic sources.

[DOI]

,

Maurizio Omologo

,

Piergiorgio Svaizer

Proceedings of the 17th European Signal Processing Conference, 2009

Acoustic Based Surveillance System for Intrusion Detection.

[DOI]

Christian Zieger

,

,

Piergiorgio Svaizer

Proceedings of the Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance, 2009

2008

WOZ Acoustic Data Collection for Interactive TV.

[DOI]

,

Luca Cristoforetti

,

Walter Kellermann

,

,

Maurizio Omologo

Proceedings of the International Conference on Language Resources and Evaluation, 2008

Localization of multiple speakers based on a two step acoustic map analysis.

[DOI]

,

Maurizio Omologo

,

Piergiorgio Svaizer

Proceedings of the IEEE International Conference on Acoustics, 2008

2007

Classification of Acoustic Maps to Determine Speaker Position and Orientation from a Distributed Microphone Network.

[DOI]

,

Maurizio Omologo

,

Piergiorgio Svaizer

,

Christian Zieger

Proceedings of the IEEE International Conference on Acoustics, 2007

A Person Tracking System for CHIL Meetings.

[DOI]

Proceedings of the Multimodal Technologies for Perception of Humans, 2007

2006

Speaker localization based on oriented global coherence field.

[DOI]

,

Maurizio Omologo

,

Piergiorgio Svaizer

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

A Generative Approach to Audio-Visual Person Tracking.

[DOI]

Roberto Brunelli

,

,

Paul Chippendale

,

,

Maurizio Omologo

,

Piergiorgio Svaizer

,

Francesco Tobia

Proceedings of the Multimodal Technologies for Perception of Humans, 2006

2005

Speaker Localization in CHIL Lectures: Evaluation Criteria and Results.

[DOI]

Maurizio Omologo

,

Piergiorgio Svaizer

,

,

Luca Cristoforetti

Proceedings of the Machine Learning for Multimodal Interaction, 2005

Oriented global coherence field for the estimation of the head orientation in smart rooms equipped with distributed microphone arrays.

[DOI]

,

Maurizio Omologo

,

Piergiorgio Svaizer

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Automatic Speech Activity Detection, Source Localization, and Speech Recognition on the Chil Seminar Corpus.

[DOI]

,

,

,

,

Javier Hernando

,

John W. McDonough

,

Matthias Wölfel

,

,

Maurizio Omologo

,

,

Piergiorgio Svaizer

,

Gerasimos Potamianos

,

Proceedings of the 2005 IEEE International Conference on Multimedia and Expo, 2005
