Mathew Magimai-Doss

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Children's Voice Privacy: First Steps and Emerging Challenges.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Unveiling Audio Deepfake Origins: A Deep Metric learning And Conformer Network Approach With Ensemble Fusion.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Speech power spectra: a window into neural oscillations in Parkinson's disease.

Sevada Hovsepyan

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Unsupervised Rhythm and Voice Conversion to Improve ASR on Dysarthric Speech.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Towards Dynamic Skeleton-based Handshape Subunits for Sign Language Assessment.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Comparing Self-Supervised Learning Models Pre-Trained on Human Speech and Animal Vocalizations for Bioacoustics Processing.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Exploring the Complexity of Parkinson's Patient Speech for Depression Detection task: A Qualitative Analysis.

Barbara Ruvolo

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Automatic Parkinson's disease detection from speech: Layer selection vs adaptation of foundation models.

Barbara Ruvolo

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Emotion information recovery potential of wav2vec2 network fine-tuned for speech recognition task.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Unsupervised Rhythm and Voice Conversion of Dysarthric to Healthy Speech for ASR.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

2024

On the Quantization of Neural Models for Speaker Verification.

Vishal Kumar

Vinayak Abrol

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Feature Representations for Automatic Meerkat Vocalization Classification.

CoRR, 2024

SSL-TTS: Leveraging Self-Supervised Embeddings and kNN Retrieval for Zero-Shot Multi-speaker TTS.

CoRR, 2024

On the Utility of Speech and Audio Foundation Models for Marmoset Call Analysis.

Luis Felipe Parra-Gallego

CoRR, 2024

Cross-transfer Knowledge between Speech and Text Encoders to Evaluate Customer Satisfaction.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Towards interfacing large language models with ASR systems using confidence measures and prompting.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Neurocomputational model of speech recognition for pathological speech detection: a case study on Parkinson's disease speech detection.

Sevada Hovsepyan

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Predicting Heart Activity from Speech using Data-driven and Knowledge-based features.

Gasser Elbanna

Zohreh Mostaani

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Comparing data-Driven and Handcrafted Features for Dimensional Emotion Recognition.

Sargam Vyas

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Content-Based Objective Evaluation of Artificially Generated Sign Language Videos.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Syllable Level Features for Parkinson's Disease Detection from Speech.

Sevada Hovsepyan

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023

Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Implicit phonetic information modeling for speech emotion recognition.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Using Commercial ASR Solutions to Assess Reading Skills in Children: A Case Report.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Few-shot Dysarthric Speech Recognition with Text-to-Speech Data Augmentation.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Towards Learning Emotion Information from Short Segments of Speech.

Sarthak Yadav

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

2022

Adjustable deterministic pseudonymization of speech.

Rob J. J. H. van Son

Comput. Speech Lang., 2022

Comparing supervised and self-supervised embedding for ExVo Multi-Task learning track.

CoRR, 2022

Comparing Biosignal and Acoustic feature Representation for Continuous Emotion Recognition.

Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge (MuSe@MM 2022), 2022

Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering.

RaviShankar Prasad

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

On Breathing Pattern Information in Synthetic Speech.

Zohreh Mostaani

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Towards Accessible Sign Language Assessment and Learning.

Proceedings of the International Conference on Multimodal Interaction, 2022

Towards Automatic Prediction of Non-Expert Perceived Speech Fluency Ratings.

Edoardo Moneta

Eleni Theocharopoulos

José Andrés González López

Proceedings of the International Conference on Multimodal Interaction, 2022

Modeling of Pre-Trained Neural Network Embeddings Learned From Raw Waveform for COVID-19 Infection Detection.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

On Joint Optimization of Automatic Speaker Verification and Anti-Spoofing in the Embedding Space.

Alejandro Gómez Alanís

Antonio M. Peinado

IEEE Trans. Inf. Forensics Secur., 2021

Utterance Verification-Based Dysarthric Speech Intelligibility Assessment Using Phonetic Posterior Features.

IEEE Signal Process. Lett., 2021

Signal-to-signal neural networks for improved spike estimation from calcium imaging data.

PLoS Comput. Biol., 2021

Deep learning architectures for estimating breathing signal and respiratory parameters from speech recordings.

Neural Networks, 2021

Fusion of Acoustic and Linguistic Information using Supervised Autoencoder for Improved Emotion Recognition.

RaviShankar Prasad

Proceedings of the 2nd on Multimodal Sentiment Analysis Challenge (MuSe '21), 2021

Late Fusion of the Available Lexicon and Raw Waveform-Based Acoustic Modeling for Depression and Dementia Recognition.

Esaú Villatoro-Tello

Gabriela Ramírez-de-la-Rosa

Petr Motlícek

Juan Camilo Vásquez-Correa

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

On Modeling Glottal Source Information for Phonation Assessment in Parkinson's Disease.

Elmar Nöth

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Identification of F1 and F2 in Speech Using Modified Zero Frequency Filtering.

RaviShankar Prasad

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Handling Acoustic Variation in Dysarthric Speech Recognition Systems Through Model Combination.

Gabriela Ramírez-de-la-Rosa

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Approximating the Mental Lexicon from Clinical Interviews as a Support Tool for Depression Detection.

Esaú Villatoro-Tello

Daniel Gática-Pérez

Héctor Jiménez-Salazar

Proceedings of the 2021 International Conference on Multimodal Interaction (ICMI '21), 2021

On The Relationship Between Speech-Based Breathing Signal Prediction Evaluation Measures and Breathing Parameters Estimation.

Zohreh Mostaani

Aki Härmä

Helmer Strik

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Phoneme Based Respiratory Analysis of Read Speech.

Aki Härmä

Helmer Strik

Proceedings of the 29th European Signal Processing Conference, 2021

An Objective Evaluation Framework for Pathological Speech Synthesis.

Proceedings of the 14th ITG Conference on Speech Communication, online, September 29, 2021

2020

idiap/torgo_asr: Torgo ASR 1.0.0.

Dataset, October, 2020

An HMM Approach with Inherent Model Selection for Sign Language and Gesture Recognition.

Oya Aran

Proceedings of The 12th Language Resources and Evaluation Conference, 2020

A Comparison of Acoustic and Linguistics Methodologies for Alzheimer's Dementia Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

A Phonology-based Approach for Isolated Sign Production Assessment in Sign Language.

Proceedings of the Companion Publication of the 2020 International Conference on Multimodal Interaction, 2020

Towards Multilingual Sign Language Recognition.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Detection Of S1 And S2 Locations In Phonocardiogram Signals Using Zero Frequency Filter.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Dysarthric Speech Recognition with Lattice-Free MMI.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Estimating the Degree of Sleepiness by Integrating Articulatory Feature Knowledge in Raw Waveform Based CNNs.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

End-to-end acoustic modeling using convolutional neural networks for HMM-based automatic speech recognition.

Speech Commun., 2019

Subunits Inference and Lexicon Development Based on Pairwise Comparison of Utterances and Signs.

Inf., 2019

Understanding and Visualizing Raw Waveform-Based CNNs.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Using Speech Production Knowledge for Raw Waveform Modelling Based Styrian Dialect Identification.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

HMM-based Approaches to Model Multichannel Information in Sign Language Inspired from Articulatory Features-based Speech Processing.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Learning Voice Source Related Information for Depression Detection.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Segment-level Training of ANNs Based on Acoustic Confidence Measures for Hybrid HMM/ANN Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Improving Children Speech Recognition through Feature Learning from Raw Speech Signal.

Selen Hande Kabil

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

2018

Towards weakly supervised acoustic subword unit discovery and lexicon development using hidden Markov models.

Speech Commun., 2018

SMILE Swiss German Sign Language Dataset.

Sandra Sidler-Miserez

Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Implementing Fusion Techniques for the Classification of Paralinguistic Information.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Denoising and Raw-waveform Networks for Weakly-Supervised Gender Identification on Noisy Speech.

Shrikanth S. Narayanan

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

On Learning Vocal Tract System Related Speaker Discriminative Information from Raw Signal Using CNNs.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

On Learning to Identify Genders from Raw Speech Signal Using CNNs.

Selen Hande Kabil

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Towards Directly Modeling Raw Speech Signal for Speaker Verification Using CNNs.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

2017

Long-Term Spectral Statistics for Voice Presentation Attack Detection.

IEEE ACM Trans. Audio Speech Lang. Process., 2017

A Posterior-Based Multistream Formulation for G2P Conversion.

IEEE Signal Process. Lett., 2017

End-to-End convolutional neural network-based voice presentation attack detection.

Proceedings of the 2017 IEEE International Joint Conference on Biometrics, 2017

2016

Acoustic data-driven grapheme-to-phoneme conversion in the probabilistic lexical modeling framework.

Speech Commun., 2016

Articulatory feature based continuous speech recognition using probabilistic lexical modeling.

Comput. Speech Lang., 2016

Improving Under-Resourced Language ASR Through Latent Subword Unit Space Discovery.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

HMM-Based Non-Native Accent Assessment Using Posterior Features.

Milos Cernak

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Presentation Attack Detection Using Long-Term Spectral Statistics for Trustworthy Speaker Verification.

Proceedings of the 2016 International Conference of the Biometrics Special Interest Group, 2016

2015

Acoustic and lexical resource constrained ASR using language-independent acoustic model and language-dependent probabilistic lexical model.

Speech Commun., 2015

Learning linearly separable features for speech recognition using convolutional neural networks.

Proceedings of the 3rd International Conference on Learning Representations, 2015

Objective intelligibility assessment of text-to-speech systems through utterance verification.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Automatic accentedness evaluation of non-native speech using phonetic and sub-phonetic posterior probabilities.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Analysis of CNN-based speech recognition system using raw speech as input.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Objective speech intelligibility assessment through comparison of phoneme class conditional probability sequences.

Raphael Ullmann

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

An HMM-based formalism for automatic subword unit derivation and pronunciation generation.

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Integrated pronunciation learning for automatic speech recognition using probabilistic lexical modeling.

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Convolutional Neural Networks-based continuous speech recognition using raw speech signal.

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

2014

Feature mapping of multiple beamformed sources for robust overlapping speech recognition using a microphone array.

IEEE ACM Trans. Audio Speech Lang. Process., 2014

On recognition of non-native speech using probabilistic lexical model.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

On modeling context-dependent clustered states: Comparing HMM/GMM, hybrid HMM/ANN and KL-HMM approaches.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Joint phoneme segmentation inference and classification using CRFs.

Proceedings of the 2014 IEEE Global Conference on Signal and Information Processing, 2014

2013

Applying Multi- and Cross-Lingual Stochastic Phone Space Transformations to Non-Native Speech Recognition.

IEEE Trans. Speech Audio Process., 2013

A Savitzky-Golay Filtering Perspective of Dynamic Feature Computation.

Sunder Ram Krishnan

Chandra Sekhar Seelamantula

IEEE Signal Process. Lett., 2013

End-to-end Phoneme Sequence Recognition using Convolutional Neural Networks.

CoRR, 2013

Improving grapheme-based ASR by probabilistic lexical modeling approach.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Grapheme and multilingual posterior features for under-resourced speech recognition: A study on Scottish Gaelic.

Peter Bell

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

A probabilistic framework for multiple speaker localization.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Probabilistic lexical modeling and unsupervised training for zero-resourced ASR.

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012

A Fast Parts-Based Approach to Speaker Verification Using Boosted Slice Classifiers.

IEEE Trans. Inf. Forensics Secur., 2012

Phase AutoCorrelation (PAC) features for noise robust speech recognition.

Speech Commun., 2012

Boosting localized binary features for speech recognition.

Proceedings of the 2012 Symposium on Machine Learning in Speech and Language Processing, 2012

Combination of Sparse Classification and Multilayer Perceptron for Noise-robust ASR.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Using Sparse Classification Outputs as Feature Observations for Noise-robust ASR.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Template-based ASR using posterior features and synthetic references: comparing different TTS systems.

Serena Soldo

Proceedings of the ISCA Workshop on Statistical And Perceptual Audition, 2012

Synthetic References for Template-based ASR using posterior features.

Serena Soldo

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Combining Acoustic Data Driven G2P and Letter-to-Sound Rules for Under Resource Lexicon Generation.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Joint detection and localization of multiple speakers using a probabilistic interpretation of the steered response power.

Proceedings of the ISCA Workshop on Statistical And Perceptual Audition, 2012

Acoustic data-driven grapheme-to-phoneme conversion using KL-HMM.

Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

A TDOA Gaussian mixture model for improving acoustic source tracking.

Proceedings of the 20th European Signal Processing Conference, 2012

2011

Transcribing Mandarin Broadcast Speech Using Multi-Layer Perceptron Acoustic Features.

IEEE ACM Trans. Audio Speech Lang. Process., 2011

Analysis of MLP-Based Hierarchical Phoneme Posterior Probability Estimator.

Garimella S. V. S. Sivaram

Hynek Hermansky

IEEE Trans. Speech Audio Process., 2011

Privacy-Sensitive Audio Features for Speech/Nonspeech Detection.

IEEE ACM Trans. Audio Speech Lang. Process., 2011

Analysis and Comparison of Recent MLP Features for LVCSR Systems.

Fabio Valente

Wen Wang

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Hierarchical Tandem Features for ASR in Mandarin.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Grapheme-Based Automatic Speech Recognition Using KL-HMM.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Improving Non-Native ASR Through Stochastic Multilingual Phoneme Space Transformations.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Fast speaker verification on mobile phone data using boosted slice classifiers.

Proceedings of the 2011 IEEE International Joint Conference on Biometrics, 2011

Posterior features for template-based ASR.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Phoneme recognition using Boosted Binary Features.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Integrating articulatory features using Kullback-Leibler divergence based acoustic model for phoneme recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Language dependent universal phoneme posterior estimation for mixed language speech recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Improving Articulatory Feature and Phoneme Recognition Using Multitask Learning.

Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2011, 2011

Fast and flexible Kullback-Leibler divergence based acoustic modeling for non-native speech recognition.

David Imseng

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010

A comparative large scale study of MLP features for Mandarin ASR.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Hierarchical multilayer perceptron based language identification.

David Imseng

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Towards mixed language speech recognition systems.

David Imseng

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Boosted binary features for noise-robust speaker verification.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

Evaluating the robustness of privacy-sensitive audio features for speech detection in personal audio log scenarios.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

2009

Hierarchical processing of the modulation spectrum for GALE Mandarin LVCSR system.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Investigating privacy-sensitive features for speech detection in multiparty conversations.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Speaker change detection with privacy-preserving audio cues.

Proceedings of the 11th International Conference on Multimodal Interfaces, 2009

Volterra series for analyzing MLP based phoneme posterior estimator.

Garimella S. V. S. Sivaram

Hynek Hermansky

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2009

Non-linear mapping for multi-channel speech separation and robust overlapping speech recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2009

Posterior features applied to speech recognition tasks with user-defined vocabulary.

Guillermo Aradilla

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2009

MLP based hierarchical system for task adaptation in ASR.

Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008

A Neural Network Based Regression Approach for Recognizing Simultaneous Speech.

Proceedings of the Machine Learning for Multimodal Interaction, 5th International Workshop, 2008

Neural network based regression for robust overlapping speech recognition using microphone arrays.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Using KL-based acoustic models in a large vocabulary recognition task.

Guillermo Aradilla

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Exploiting contextual information for improved phoneme recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2008

Using comparison of parallel phoneme probability streams for OOV word detection.

Tamara Tosic

Hynek Hermansky

Proceedings of the 2008 16th European Signal Processing Conference, 2008

MLP-based log spectral energy mapping for robust overlapping speech recognition.

Proceedings of the 2008 16th European Signal Processing Conference, 2008

2007

A Study of Phoneme and Grapheme Based Context-Dependent ASR Systems.

John Dines

Proceedings of the Machine Learning for Multimodal Interaction, 2007

Improving speech translation with automatic boundary prediction.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Cross-linguistic analysis of prosodic features for sentence segmentation.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Articulatory feature classifiers trained on 2000 hours of telephone speech.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Entropy Based Classifier Combination for Sentence Segmentation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2007

Articulatory Feature-Based Methods for Acoustic and Audio-Visual Speech Recognition: Summary from the 2006 JHU Summer workshop.

Karen Livescu

Özgür Çetin

Mark Hasegawa-Johnson

Stephen Dawson-Haggerty

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2007

Manual Transcription of Conversational Speech at the Articulatory Feature Level.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2007

A Generalized Dynamic Composition Algorithm of Weighted Finite State Transducers for Large Vocabulary Speech Recognition.

Octavian Cheng

John Dines

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2007

An Articulatory Feature-Based Tandem Approach and Factored Observation Modeling.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2007

The SRI-ICSI Spring 2007 Meeting and Lecture Recognition System.

Proceedings of the Multimodal Technologies for Perception of Humans, 2007

Monolingual and crosslingual comparison of tandem features derived from articulatory and phone MLPs.

Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

2006

Juicer: A Weighted Finite-State Transducer Speech Decoder.

Proceedings of the Machine Learning for Multimodal Interaction, 2006

Threshold Selection for Unsupervised Detection, With an Application to Microphone Arrays.

Guillaume Lathoud

Proceedings of the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, 2006

2005

Using auxiliary sources of knowledge for automatic speech recognition.

PhD thesis, 2005

A spectrogram model for enhanced source localization and noise-robust ASR.

Guillaume Lathoud

Bertrand Mesot

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

A sector-based, frequency-domain approach to detection and localization of multiple speakers.

Guillaume Lathoud

Proceedings of the 2005 IEEE International Conference on Acoustics, Speech and Signal Processing, 2005

HMM/ANN Based Spectral Peak Location Estimation for Noise Robust Speech Recognition.

Shajith Ikbal

Proceedings of the 2005 IEEE International Conference on Acoustics, Speech and Signal Processing, 2005

2004

Speech recognition with auxiliary information.

IEEE Trans. Speech Audio Process., 2004

On the Adequacy of Baseform Pronunciations and Pronunciation Variants.

Proceedings of the Machine Learning for Multimodal Interaction, 2004

Modeling auxiliary features in tandem systems.

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Spectro-temporal activity pattern (STAP) features for noise robust ASR.

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Joint decoding for phoneme-grapheme continuous speech recognition.

Samy Bengio

Proceedings of the 2004 IEEE International Conference on Acoustics, Speech and Signal Processing, 2004

2003

Enhancement of speech in multispeaker environment.

B. Yegnanarayana

S. R. Mahadeva Prasanna

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Using pitch frequency information in speech recognition.

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Speech recognition of spontaneous, noisy speech using auxiliary information in Bayesian networks.

Proceedings of the 2003 IEEE International Conference on Acoustics, Speech and Signal Processing, 2003

2002

Dynamic Bayesian network based speech recognition with pitch and energy as auxiliary variables.

Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing, 2002

Auxiliary variables in conditional Gaussian mixtures for automatic speech recognition.

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Mixed Bayesian Networks with Auxiliary Variables for Automatic Speech Recognition.

Proceedings of the 16th International Conference on Pattern Recognition, 2002

2001

Modeling auxiliary information in Bayesian network based ASR.
