Milos Cernak

CoRR, March, 2026

Differentiable Time-Varying IIR Filtering for Real-Time Speech Denoising.

CoRR, March, 2026

2025

Shortcut Flow Matching for Speech Enhancement: Step-Invariant flows via single stage training.

Naisong Zhou

Saisamarth Rajesh Phaye

CoRR, September, 2025

DeepFilterGAN: A Full-band Real-time Speech Enhancement System with GAN-based Stochastic Regeneration.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Model as Loss: A Self-Consistent Training Paradigm.

Saisamarth Rajesh Phaye

Andrew Harper

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

OpenACE: An Open Benchmark for Evaluating Audio Coding Performance.

Jozef Coldenhoff

Niclas Granqvist

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Semi-intrusive audio evaluation: Casting non-intrusive assessment as a multi-modal text prediction task.

Jozef Coldenhoff

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024

Diffusion-based Speech Enhancement with Schrödinger Bridge and Symmetric Noise Schedule.

CoRR, 2024

On Real-Time Multi-Stage Speech Enhancement Systems.

Proceedings of the IEEE International Conference on Acoustics, 2024

Multi-Channel Mosra: Mean Opinion Score and Room Acoustics Estimation Using Simulated Data and A Teacher Model.

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Cluster-based pruning techniques for audio data.

CoRR, 2023

Demo Abstract: In-Ear-Voice - Towards Milli-Watt Audio Enhancement With Bone-Conduction Microphones for In-Ear Sensing Platforms.

Proceedings of the 8th ACM/IEEE Conference on Internet of Things Design and Implementation, 2023

In-Ear-Voice: Towards Milli-Watt Audio Enhancement With Bone-Conduction Microphones for In-Ear Sensing Platforms.

Proceedings of the 8th ACM/IEEE Conference on Internet of Things Design and Implementation, 2023

ALO-VC: Any-to-any Low-latency One-shot Voice Conversion.

Bohan Wang

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Speaker Embeddings as Individuality Proxy for Voice Stress Detection.

Zihan Wu

Karl El Hajal

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Personalized Task Load Prediction in Speech Communication.

Proceedings of the IEEE International Conference on Acoustics, 2023

Efficient Speech Quality Assessment Using Self-Supervised Framewise Embeddings.

Karl El Hajal

Zihan Wu

Gasser Elbanna

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

BC-VAD: A Robust Bone Conduction Voice Activity Detection.

Niccolò Polvani

CoRR, 2022

Fast accuracy estimation of deep learning based multi-class musical source separation.

Alexandru Mocanu

Benjamin Ricaud

Proceedings of the 2022 Northern Lights Deep Learning Workshop, 2022

Application for Real-time Personalized Speaker Extraction.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

MOSRA: Joint Mean Opinion Score and Room Acoustics Speech Quality Assessment.

Karl El Hajal

Pablo Mainar

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Hybrid Handcrafted and Learnable Audio Representation for Analysis of Speech Under Cognitive and Physical Load.

Gasser Elbanna

Alice Biryukov

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

PEAF: Learnable Power Efficient Analog Acoustic Features for Audio Recognition.

Boris Bergsma

Minhao Yang

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

SERAB: A Multi-Lingual Benchmark for Speech Emotion Recognition.

Mikolaj Kegler

Pierre Beckmann

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Power efficient analog features for audio recognition.

Boris Bergsma

Minhao Yang

CoRR, 2021

A Universal Deep Room Acoustics Estimator.

Paula Sánchez López

Paul Callens

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2021

Non-Intrusive Speech Quality Assessment with Transfer Learning and Subject-Specific Scaling.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

BYOL-S: Learning Self-supervised Speech Representations by Bootstrapping.

Gasser Elbanna

Proceedings of the HEAR: Holistic Evaluation of Audio Representations, 2021

Word-Level Embeddings for Cross-Task Transfer Learning in Speech Processing.

Pierre Beckmann

Mikolaj Kegler

Proceedings of the 29th European Signal Processing Conference, 2021

AC-VC: Non-Parallel Low Latency Phonetic Posteriorgrams Based Voice Conversion.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Joint Blind Room Acoustic Characterization From Speech And Music Signals Using Convolutional Recurrent Neural Networks.

Paul Callens

CoRR, 2020

Deep Speech Inpainting of Time-Frequency Masks.

Mikolaj Kegler

Pierre Beckmann

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Spiking Neural Networks Trained With Backpropagation for Low Power Neuromorphic Implementation of Voice Activity Detection.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

A Bin Encoding Training of a Spiking Neural Network Based Voice Activity Detection.

Giorgia Dellaferrera

Flavio Martinelli

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

FastVC: Fast Voice Conversion with non-parallel data.

Oriol Barbany Mayor

Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

2019

Voice Presentation Attack Detection Using Convolutional Neural Networks.

Proceedings of the Handbook of Biometric Anti-Spoofing, 2019

Speech-VGG: A deep feature extractor for speech processing.

CoRR, 2019

End-to-End Accented Speech Recognition.

Thibault Viglino

Petr Motlícek

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Open-Vocabulary Keyword Spotting with Audio and Text Embeddings.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Evaluating Audiovisual Source Separation in the Context of Video Conferencing.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Phone-Attribute Posteriors to Evaluate the Speech of Cochlear Implant Users.

Tomás Arias-Vergara

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2018

Cognitive Speech Coding: Examining the Impact of Cognitive Speech Processing on Speech Compression.

Alexandre Hyafil

IEEE Signal Process. Mag., 2018

NeuroSpeech.

Jesús Francisco Vargas-Bonilla

SoftwareX, 2018

NeuroSpeech: An open-source software for Parkinson's speech analysis.

Jesús Francisco Vargas-Bonilla

Digit. Signal Process., 2018

Phonological Posteriors and GRU Recurrent Units to Assess Speech Impairments of Patients with Parkinson's Disease.

Nicanor García-Ospina

Elmar Nöth

Proceedings of the Text, Speech, and Dialogue - 21st International Conference, 2018

Phonological i-Vectors to Detect Parkinson's Disease.

Nicanor García-Ospina

Tomás Arias-Vergara

Elmar Nöth

Proceedings of the Text, Speech, and Dialogue - 21st International Conference, 2018

Nasal Speech Sounds Detection Using Connectionist Temporal Classification.

Sibo Tong

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Perceptual Information Loss due to Impaired Speech Production.

IEEE ACM Trans. Audio Speech Lang. Process., 2017

Characterisation of voice quality of Parkinson's disease using differential phonological posterior features.

Frank Rudzicz

Heidi Christensen

Elmar Nöth

Comput. Speech Lang., 2017

Speech vocoding for laboratory phonology.

Stefan Benus

Alexandros Lazaridis

Comput. Speech Lang., 2017

Bob Speaks Kaldi.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Multi-view representation learning via GCCA for multimodal analysis of Parkinson's disease.

Phani Sankar Nidadavolu

Maria Yancheva

Alyssa Vann

Nikolai Vogler

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

On the impact of non-modal phonation on phonological features.

Phani Sankar Nidadavolu

Maria Yancheva

Alyssa Vann

Nikolai Vogler

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

2016

Composition of Deep and Spiking Neural Networks for Very Low Bit Rate Speech Coding.

IEEE ACM Trans. Audio Speech Lang. Process., 2016

On structured sparsity of phonological posteriors for linguistic parsing.

Speech Commun., 2016

An Analysis of Rhythmic Staccato-Vocalization Based on Frequency Demodulation for Laughter Detection in Conversational Meetings.

CoRR, 2016

Investigating Spectral Amplitude Modulation Phase Hierarchy Features in Speech Synthesis.

Alexandros Lazaridis

Pierre-Edouard Honnet

Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

HMM-Based Non-Native Accent Assessment Using Posterior Features.

Ramya Rasipuram

Mathew Magimai-Doss

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Probabilistic Amplitude Demodulation Features in Speech Synthesis for Improving Prosody.

Alexandros Lazaridis

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

PhonVoc: A Phonetic and Phonological Vocoding Toolkit.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Sound Pattern Matching for Automatic Prosodic Event Detection.

Pierre-Edouard Honnet

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Phonetic and Phonological Posterior Search Space Hashing Exploiting Class-Specific Sparsity Structures.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Modeling unvoiced sounds in statistical parametric speech synthesis with a continuous vocoder.

Proceedings of the 24th European Signal Processing Conference, 2016

2015

Incremental Syllable-Context Phonetic Vocoding.

IEEE ACM Trans. Audio Speech Lang. Process., 2015

Residual-Based Excitation with Continuous F0 Modeling in HMM-Based Speech Synthesis.

Tamás Gábor Csapó

Géza Németh

Proceedings of the Statistical Language and Speech Processing, 2015

Automatic accentedness evaluation of non-native speech using phonetic and sub-phonetic posterior probabilities.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Neuromorphic based oscillatory device for incremental syllable boundary detection.

Alexandre Hyafil

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

An empirical model of emphatic word detection.

Pierre-Edouard Honnet

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

On compressibility of neural network phonological features for low bit rate speech coding.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Phonological vocoding using artificial neural networks.

Blaise Potard

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014

Development of bilingual ASR system for MediaParl corpus.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Stress and accent transmission in HMM-based syllable-context very low bit rate speech coding.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

2013

A Simple Continuous Pitch Estimation Algorithm.

Petr Motlícek

IEEE Signal Process. Lett., 2013

Syllable-based pitch encoding for low bit rate speech coding with recognition/synthesis architecture.

Xingyu Na

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

On the (UN)importance of the contextual factors in HMM-based speech synthesis and coding.

Petr Motlícek

Proceedings of the IEEE International Conference on Acoustics, 2013

Automatic Staging of Audio with Emotions.

Lakshmi Babu Saheer

Proceedings of the 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, 2013

2012

Reading companion: the technical and social design of an automated reading tutor.

Proceedings of the Third Workshop on Child, Computer and Interaction, 2012

Robust triphone mapping for acoustic modeling.

David Imseng

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

2011

Rule-Based Triphone Mapping for Acoustic Modeling in Automatic Speech Recognition.

Proceedings of the Text, Speech and Dialogue - 14th International Conference, 2011

Effective Triphone Mapping for Acoustic Modeling in Speech Recognition.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

2010

A Comparison of Decision Tree Classifiers for Automatic Diagnosis of Speech Recognition Errors.

Comput. Informatics, 2010

Diagnostics for Debugging Speech Recognition Systems.

Proceedings of the Text, Speech and Dialogue, 13th International Conference, 2010

2006

Unit Selection Speech Synthesis in Noise.

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Diagnostics of speech recognition using classification phoneme diagnostic trees.

Christian Wellekens

Proceedings of the Second IASTED International Conference on Computational Intelligence, 2006

2005

TTSBOX: a MATLAB toolbox for teaching text-to-speech synthesis.

Thierry Dutoit