Vikramjit Mitra

Jingping Nie

Erdrin Azemi

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Pre-Trained Model Representations and Their Robustness Against Noise for Speech Emotion Analysis.

Vasudha Kowtha

Hsiang-Yun Sherry Chien

Erdrin Azemi

Carlos Avendaño

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

Speech Emotion: Investigating Model Representations, Multi-Task Learning and Knowledge Distillation.

Hsiang-Yun Sherry Chien

Vasudha Kowtha

Joseph Yitan Cheng

Erdrin Azemi

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Analysis and Tuning of a Voice Assistant System for Dysfluent Speech.

Panayiotis G. Georgiou

Sachin Kajarekar

Jeffrey P. Bigham

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

SEP-28k: A Dataset for Stuttering Event Detection from Podcasts with People Who Stutter.

Proceedings of the IEEE International Conference on Acoustics, 2021

Estimating Respiratory Rate From Breath Audio Obtained Through Wearable Microphones.

Proceedings of the 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2021

2020

Investigation and analysis of hyper and hypo neuron pruning to selectively update neurons during unsupervised adaptation.

Digit. Signal Process., 2020

Detecting Emotion Primitives from Speech and Their Use in Discerning Categorical Emotions.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech.

Comput. Speech Lang., 2019

Multi-Modal Learning for Speech Emotion Recognition: An Analysis and Comparison of ASR Outputs with Ground Truth Transcription.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Leveraging Acoustic Cues and Paralinguistic Embeddings to Detect Expression from Voice.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2018

Articulatory Features for ASR of Pathological Speech.

CoRR, 2018

Articulatory Features for ASR of Pathological Speech.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Noise Robust Acoustic to Articulatory Speech Inversion.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Interpreting DNN Output Layer Activations: A Strategy to Cope with Unseen Data in Speech Recognition.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Articulatory Information and Multiview Features for Large Vocabulary Continuous Speech Recognition.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Hybrid convolutional neural networks for articulatory and acoustic information based speech recognition.

Speech Commun., 2017

Multi-microphone speech recognition integrating beamforming, robust feature extraction, and advanced DNN/RNN backend.

Comput. Speech Lang., 2017

Leveraging Deep Neural Network Activation Entropy to cope with Unseen Data in Speech Recognition.

CoRR, 2017

Joint modeling of articulatory and acoustic spaces for continuous speech recognition tasks.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Speech recognition in unseen and noisy channel conditions.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Tackling unseen acoustic conditions in query-by-example search using time and frequency convolution for multilingual deep bottleneck features.

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Robust Features in Deep-Learning-Based Speech Recognition.

Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016

Toward human-assisted lexical unit discovery without text resources.

Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Vocal Tract Length Normalization for Speaker Independent Acoustic-to-Articulatory Speech Inversion.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Unsupervised Learning of Acoustic Units Using Autoencoders and Kohonen Nets.

Dimitra Vergyri

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Automatic Speech Transcription for Low-Resource Languages - The Case of Yoloxóchitl Mixtec (Mexico).

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Fusion Strategies for Robust Speech Recognition and Keyword Spotting for Channel- and Noise-Degraded Speech.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Coping with Unseen Data Conditions: Investigating Neural Net Architectures, Robust Features, and Information Fusion for Robust Speech Recognition.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

The SRI System for the NIST OpenSAD 2015 Speech Activity Detection Evaluation.

Martin Graciarena

Luciana Ferrer

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Noise and reverberation effects on depression detection from speech.

Andreas Tsiartas

Elizabeth Shriberg

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

A phonetically aware system for speech activity detection.

Luciana Ferrer

Martin Graciarena

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

Speech-based assessment of PTSD in a military population using diverse feature classes.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Analysis of coarticulated speech using estimated articulatory trajectories.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Combating reverberation in large vocabulary continuous speech recognition.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Cross-corpus depression prediction from speech.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Effects of feature type, learning algorithm and speaking style for depression detection from speech.

Elizabeth Shriberg

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Improving robustness against reverberation for automatic speech recognition.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Time-frequency convolutional networks for robust speech recognition.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

The MERL/SRI system for the 3RD CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

Deep convolutional nets and robust features for reverberation-robust speech recognition.

Wen Wang

Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

The SRI AVEC-2014 Evaluation System.

Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge, 2014

Evaluating robust features on deep neural networks for speech recognition in noisy and channel mismatched conditions.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Recent improvements in SRI's keyword detection system for noisy audio.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Highly accurate phonetic segmentation using boundary correction models and system fusion.

Proceedings of the IEEE International Conference on Acoustics, 2014

Articulatory features from deep neural networks and their role in speech recognition.

Proceedings of the IEEE International Conference on Acoustics, 2014

Feature fusion for high-accuracy keyword spotting.

Proceedings of the IEEE International Conference on Acoustics, 2014

Medium-duration modulation cepstral feature for robust speech recognition.

Proceedings of the IEEE International Conference on Acoustics, 2014

Calibration and multiple system fusion for spoken term detection using linear logistic regression.

Proceedings of the IEEE International Conference on Acoustics, 2014

2013

Automatic phonetic segmentation using boundary models.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Modulation features for noise robust speaker identification.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Damped oscillator cepstral coefficients for robust speech recognition.

Martin Graciarena

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Strategies for high accuracy keyword detection in noisy channels.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Improving language identification robustness to highly channel-degraded speech through multiple system fusion.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

All for one: feature combination for highly channel-degraded speech activity detection.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

A noise-robust system for NIST 2012 speaker recognition evaluation.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Articulatory trajectories for large-vocabulary speech recognition.

Proceedings of the IEEE International Conference on Acoustics, 2013

Using multiple versions of speech input in phone recognition.

Proceedings of the IEEE International Conference on Acoustics, 2013

2012

Normalized amplitude modulation features for large vocabulary noise-robust speech recognition.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

Articulatory Information for Noise Robust Speech Recognition.

IEEE Trans. Speech Audio Process., 2011

Speech inversion: Benefits of tract variables over pellet trajectories.

Proceedings of the IEEE International Conference on Acoustics, 2011

Gesture-based Dynamic Bayesian Network for noise robust speech recognition.

Proceedings of the IEEE International Conference on Acoustics, 2011

Robust speech recognition using articulatory gestures in a Dynamic Bayesian Network framework.

Hosung Nam

Carol Y. Espy-Wilson

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010

Retrieving Tract Variables From Acoustics: A Comparison of Different Machine Learning Strategies.

IEEE J. Sel. Top. Signal Process., 2010

A procedure for estimating gestural scores from natural speech.

Mark Hasegawa-Johnson

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Robust word recognition using articulatory trajectories and gestures.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

2009

Noise robustness of tract variables and their application to speech recognition.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

A noise-type and level-dependent MPO-based speech enhancement architecture with variable frame analysis for noise-robust speech recognition.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

From acoustics to Vocal Tract time functions.

Proceedings of the IEEE International Conference on Acoustics, 2009

2008

Content based audio classification: a neural network approach.

Soft Comput., 2008

Language and genre detection in audio content analysis.

Daniel Garcia-Romero

Carol Y. Espy-Wilson

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Language detection in audio content analysis.

Daniel Garcia-Romero

Carol Y. Espy-Wilson

Proceedings of the IEEE International Conference on Acoustics, 2008

2007

Text classification: A least square support vector machine approach.

Satarupa Banerjee

Appl. Soft Comput., 2007

A Neural Network based Audio Content Classification.

Proceedings of the International Joint Conference on Neural Networks, 2007

2006

Lidar detection of underwater objects using a neuro-SVM-based architecture.

Satarupa Banerjee

IEEE Trans. Neural Networks, 2006

Prior-shape-based segmentation of various objects in ultrasound images after speckle-reduction using level-set based curvature evolution.

Proceedings of the Medical Imaging 2006: Image Processing, 2006

2005

Lidar Signal Processing for Under-Water Object Detection.
