Mirco Ravanelli

CoRR, April, 2026

LL-SDR: Low-Latency Speech enhancement through Discrete Representations.

CoRR, March, 2026

Listen First, Then Answer: Timestamp-Grounded Speech Reasoning.

CoRR, March, 2026

WavSLM: Single-Stream Speech Language Modeling via WavLM Distillation.

CoRR, March, 2026

Beyond Fixed Frames: Dynamic Character-Aligned Speech Tokenization.

Étienne de Villers-Sidani

CoRR, January, 2026

Toward Faithful Explanations in Acoustic Anomaly Detection.

CoRR, January, 2026

DASB - Discrete Audio and Speech Benchmark.

Trans. Mach. Learn. Res., 2026

Bayesian Deep Learning for Remaining Useful Life Estimation via Stein Variational Gradient Descent.

IEEE Trans. Autom. Sci. Eng., 2026

2025

Comparison of Speech Tasks in Human Expert and Machine Detection of Parkinson's Disease.

Denise Klein

CoRR, October, 2025

Investigating Faithfulness in Large Audio Language Models.

CoRR, September, 2025

Virtual Consistency for Audio Editing.

CoRR, September, 2025

FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via Causal Distillation.

CoRR, September, 2025

From Black Box to Biomarker: Sparse Autoencoders for Interpreting Speech Models of Parkinson's Disease.

CoRR, July, 2025

Autoregressive Speech Enhancement via Acoustic Tokens.

Giuseppe Alessio D'Inverno

CoRR, July, 2025

ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs.

CoRR, May, 2025

FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks.

CoRR, February, 2025

Discrete Audio Tokens: More Than a Survey!

Trans. Mach. Learn. Res., 2025

Generalization limits of Graph Neural Networks in identity effects learning.

Simone Brugiapaglia

Neural Networks, 2025

A protocol for trustworthy EEG decoding with neural networks.

Davide Borra

Elisa Magosso

Neural Networks, 2025

Speech self-supervised representations benchmarking: A case for larger probing heads.

Comput. Speech Lang., 2025

Does Language Matter for Early Detection of Parkinson's Disease from Speech?

Proceedings of the 35th IEEE International Workshop on Machine Learning for Signal Processing, 2025

Audio Prototypical Network for Controllable Music Recommendation.

Proceedings of the 35th IEEE International Workshop on Machine Learning for Signal Processing, 2025

Calm-Whisper: Reduce Whisper Hallucination On Non-Speech By Calming Crazy Heads Down.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

LiSTEN: Learning Soft Token Embeddings for Neural Audio LLMs.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Investigating the Effectiveness of Explainability Methods in Parkinson's Detection from Speech.

Proceedings of the IEEE International Conference on Acoustics, 2025

LMAC-TD: Producing Time Domain Explanations for Audio Classifiers.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

What Are They Doing? Joint Audio-Speech Co-Reasoning.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024

CL-MASR: A Continual Learning Benchmark for Multilingual ASR.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Open-Source Conversational AI with SpeechBrain 1.0.

J. Mach. Learn. Res., 2024

Open-Source Conversational AI with SpeechBrain 1.0.

CoRR, 2024

DASB - Discrete Audio and Speech Benchmark.

CoRR, 2024

Are LLMs Robust for Spoken Dialogues?

CoRR, 2024

SpeechBrain-MOABB: An open-source Python library for benchmarking deep neural networks applied to EEG signals.

Davide Borra

Francesco Paissan

Comput. Biol. Medicine, 2024

ProGRes: Prompted Generative Rescoring on ASR N-Best.

Ada Defne Tur

Adel Moumen

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Listenable Maps for Zero-Shot Audio Classifiers.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers.

Proceedings of the 34th IEEE International Workshop on Machine Learning for Signal Processing, 2024

Audio Editing with Non-Rigid Text Prompts.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

How Should We Extract Discrete Audio Tokens from Self-Supervised Models?

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Phoneme Discretized Saliency Maps for Explainable Detection of AI-Generated Voice.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Listenable Maps for Audio Classifiers.

Francesco Paissan

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

SKILL: Similarity-Aware Knowledge Distillation for Speech Self-Supervised Learning.

Luca Zampierin

Ghouthi Boukli Hacene

Bac Nguyen

Proceedings of the IEEE International Conference on Acoustics, 2024

Resource-Efficient Separation Transformer.

Proceedings of the IEEE International Conference on Acoustics, 2024

Focal Modulation Networks for Interpretable Sound Classification.

Isaac Neri Gomez-Sarmiento

Proceedings of the IEEE International Conference on Acoustics, 2024

Adaptation Odyssey in LLMs: Why Does Additional Pretraining Sometimes Fail to Improve?

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

TARIC-SLU: A Tunisian Benchmark Dataset for Spoken Language Understanding.

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Dynamic HumTrans: Humming Transcription Using CNNs and Dynamic Programming.

Shubham Gupta

Faez Amjed Mezdari

Proceedings of the Artificial Neural Networks in Pattern Recognition, 2024

Explaining Network Decision Provides Insights on the Causal Interaction Between Brain Regions in a Motor Imagery Task.

Davide Borra

Proceedings of the Artificial Neural Networks in Pattern Recognition, 2024

Multi-modal Decoding of Reach-to-Grasping from EEG and EMG via Neural Networks.

Proceedings of the Artificial Neural Networks in Pattern Recognition, 2024

2023

Exploring Self-Attention Mechanisms for Speech Separation.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch.

CoRR, 2023

Audio Editing with Non-Rigid Text Prompts.

CoRR, 2023

Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets.

CoRR, 2023

Speech Emotion Diarization: Which Emotion Appears When?

CoRR, 2023

Posthoc Interpretation via Quantization.

Francesco Paissan

CoRR, 2023

Speech Self-Supervised Representation Benchmarking: Are We Doing it Right?

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Fine-Tuning Strategies for Faster Inference Using Speech Self-Supervised Models: A Comparative Study.

Proceedings of the IEEE International Conference on Acoustics, 2023

Simulated Annealing in Early Layers Leads to Better Generalization.

AmirMohammad Sarfi

Zahra Karimpour

Muawiz Chaudhary

Nasir Mohammad Khalid

Sudhir P. Mudur

Eugene Belilovsky

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Speech Emotion Diarization: Which Emotion Appears When?

Yingzhi Wang

Alya Yacoubi

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

RescueSpeech: A German Corpus for Speech Recognition in Search and Rescue Domain.

Sangeet Sagar

Bernd Kiefer

Ivana Kruijff-Korbayová

Josef van Genabith

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

TorchAudio 2.1: Advancing Speech Recognition, Self-Supervised Learning, and Audio Processing Components for PyTorch.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Learning Representations for New Sound Classes With Continual Self-Supervised Learning.

IEEE Signal Process. Lett., 2022

Resource-Efficient Separation Transformer.

CoRR, 2022

On Using Transformers for Speech-Separation.

CoRR, 2022

OSSEM: one-shot speaker adaptive speech enhancement using meta learning.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

SoundChoice: Grapheme-to-Phoneme Models with Semantic Disambiguation.

Artem Ploujnikov

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Real-M: Towards Speech Separation on Real Mixtures.

Proceedings of the IEEE International Conference on Acoustics, 2022

MetricGAN-U: Unsupervised Speech Enhancement/ Dereverberation Based Only on Noisy/ Reverberated Speech.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

SpeechBrain: A General-Purpose Speech Toolkit.

CoRR, 2021

Transformers with Competitive Ensembles of Independent Mechanisms.

CoRR, 2021

Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers.

Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

The Energy and Carbon Footprint of Training End-to-End Speech Recognizers.

Titouan Parcollet

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

ECAPA-TDNN Embeddings for Speaker Diarization.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Attention Is All You Need In Speech Separation.

Proceedings of the IEEE International Conference on Acoustics, 2021

Interpretable SincNet-based Deep Learning for Emotion Recognition from EEG brain activity.

Juan Manuel Mayor Torres

Sara E. Medina-DeVilliers

Matthew D. Lerner

Giuseppe Riccardi

Proceedings of the 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2021

2020

BIRD: Big Impulse Response Dataset.

CoRR, 2020

Towards Unsupervised Learning of Speech Representations.

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Quaternion Neural Networks for Multi-Channel Distant Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Multi-Task Self-Supervised Learning for Robust Speech Recognition.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Using Speech Synthesis to Train End-To-End Spoken Language Understanding Models.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Learning Speaker Representations with Mutual Information.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Learning Problem-Agnostic Speech Representations from Multiple Self-Supervised Tasks.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Speech Model Pre-Training for End-to-End Spoken Language Understanding.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Quaternion Recurrent Neural Networks.

Proceedings of the 7th International Conference on Learning Representations, 2019

The PyTorch-Kaldi Speech Recognition Toolkit.

Titouan Parcollet

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

Light Gated Recurrent Units for Speech Recognition.

IEEE Trans. Emerg. Top. Comput. Intell., 2018

Automatic context window composition for distant speech recognition.

Speech Commun., 2018

Speech and Speaker Recognition from Raw Waveform with SincNet.

CoRR, 2018

Interpretable Convolutional Filters with SincNet.

CoRR, 2018

Speech recognition with quaternion neural networks.

CoRR, 2018

Speaker Recognition from Raw Waveform with SincNet.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Twin Regularization for Online Speech Recognition.

Dmitriy Serdyuk

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017

Deep Learning for Distant Speech Recognition.

PhD thesis, 2017

Deep Learning for Distant Speech Recognition.

CoRR, 2017

The DIRHA-English corpus and related tasks for distant-speech recognition in domestic environments.

CoRR, 2017

Improving Speech Recognition by Revising Gated Recurrent Units.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

A network of deep neural networks for Distant Speech Recognition.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

2016

Batch-normalized joint training for DNN-based distant speech recognition.

Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Discussion.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Realistic Multi-Microphone Data Simulation for Distant Speech Recognition.

Piergiorgio Svaizer

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

2015

Insights into Audio-Based Multimedia Event Classification with Neural Networks.

Proceedings of the 2015 Workshop on Community-Organized Multimodal Mining: Opportunities for Novel Solutions, 2015

Contaminated speech training methods for robust DNN-HMM distant speech recognition.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A multi-channel corpus for distant-speech interaction in presence of known interferences.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

The DIRHA-ENGLISH corpus and related tasks for distant-speech recognition in domestic environments.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

The DIRHA simulated corpus.

Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

TANDEM-bottleneck feature combination using hierarchical Deep Neural Networks.

Van Hai Do

Adam Janin

Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

On the selection of the impulse responses for distant-speech recognition based on contaminated speech training.

Ramón Fernandez Astudillo

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

The DIRHA-GRID corpus: baseline and tools for multi-room distant speech recognition using distributed microphones.

Marco Matassoni

Athanasios Katsamanis

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Audio-concept features and hidden Markov models for multimedia event detection.

Proceedings of the 2nd International Workshop on Speech, Language and Audio in Multimedia, 2014

A speech event detection and localization task for multiroom environments.

Proceedings of the 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays, 2014

Audio concept classification with Hierarchical Deep Neural Networks.

Proceedings of the 22nd European Signal Processing Conference, 2014

2013

Embedding speech recognition to control lights.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Audio Concept Ranking for Video Event Detection on User-Generated Content.

Benjamin Elizalde