Michael L. Seltzer

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

End-to-End Speech Recognition Contextualization with Large Language Models.

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Augmenting text for spoken language understanding with Large Language Models.

CoRR, 2023

Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities.

Proceedings of the IEEE International Conference on Acoustics, 2023

Improving fast-slow Encoder based Transducer with Streaming Deliberation.

Proceedings of the IEEE International Conference on Acoustics, 2023

Factorized Blank Thresholding for Improved Runtime Efficiency of Neural Transducers.

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

Streaming parallel transducer beam search with fast slow cascaded encoders.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Deliberation Model for On-Device Spoken Language Understanding.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Evaluating User Perception of Speech Recognition System Quality with Semantic Distance Metric.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Neural-FST Class Language Model for End-to-End Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Flexi-Transducer: Optimizing Latency, Accuracy and Compute for Multi-Domain On-Device Scenarios.

CoRR, 2021

Streaming Attention-Based Models with Augmented Memory for End-To-End Speech Recognition.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Alignment Restricted Streaming Recurrent Neural Network Transducer.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Deep Shallow Fusion for RNN-T Personalization.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Dynamic Encoder Transducer: A Flexible Solution for Trading Off Accuracy for Latency.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Dissecting User-Perceived Latency of On-Device E2E Speech Recognition.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Collaborative Training of Acoustic Encoders for Speech Recognition.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Flexi-Transducer: Optimizing Latency, Accuracy and Compute for Multi-Domain On-Device Scenarios.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Semantic Distance: A New Metric for ASR Performance Analysis Towards Spoken Language Understanding.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Memory-Efficient Speech Recognition on Smart Devices.

Proceedings of the IEEE International Conference on Acoustics, 2021

Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer.

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition.

CoRR, 2020

Weak-Attention Suppression for Transformer Based Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Transformer-Based Acoustic Modeling for Hybrid Speech Recognition.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

G2G: TTS-Driven Pronunciation Learning for Graphemic Hybrid ASR.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

AIPNet: Generative Adversarial Pre-Training of Accent-Invariant Networks for End-To-End Speech Recognition.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Speech Processing for Digital Home Assistants: Combining signal processing with deep-learning techniques.

IEEE Signal Process. Mag., 2019

Introduction to the Issue on Far-Field Speech Processing in the Era of Deep Learning: Speech Enhancement, Separation, and Recognition.

IEEE J. Sel. Top. Signal Process., 2019

RNN-T For Latency Controlled ASR With Improved Beam Search.

CoRR, 2019

Transformer-Transducer: End-to-End Speech Recognition with Self-Attention.

CoRR, 2019

Joint Grapheme and Phoneme Embeddings for Contextual End-to-End ASR.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

End-to-end Contextual Speech Recognition Using Class Language Models and a Token Passing Decoder.

Proceedings of the IEEE International Conference on Acoustics, 2019

From Senones to Chenones: Tied Context-Dependent Graphemes for Hybrid Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Improved Training for Online End-to-end Speech Recognition Systems.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Towards Language-Universal End-to-End Speech Recognition.

Suyoun Kim

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Efficient Integration of Fixed Beamformers and Speech Separation Networks for Multi-Channel Far-Field Speech Separation.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Toward Human Parity in Conversational Speech Recognition.

IEEE ACM Trans. Audio Speech Lang. Process., 2017

Large-Scale Domain Adaptation via Teacher-Student Learning.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

A study on data augmentation of reverberant speech for robust speech recognition.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

May I take your order? A Neural Model for Extracting Structured Information from Conversations.

Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, 2017

Discriminative Beamforming with Phase-Aware Neural Networks for Speech Enhancement and Recognition.

Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016

On the Role of Nonlinear Transformations in Deep Neural Network Acoustic Models.

Tasha Nagamine

Nima Mesgarani

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Deep beamforming networks for multi-channel speech recognition.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Linearly augmented deep neural network.

Pegah Ghahremani

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

Deep Neural Networks for Single-Channel Multi-Talker Speech Recognition.

IEEE ACM Trans. Audio Speech Lang. Process., 2015

Exploring how deep neural networks form phonemic categories.

Tasha Nagamine

Nima Mesgarani

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Speech recognition with prediction-adaptation-correction recurrent neural networks.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Improving speech recognition in reverberation using a room-aware deep neural network and multi-task learning.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014

An introduction to computational networks and the computational network toolkit (invited talk).

Christopher J. Rossbach

Jon Currey

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

The influence of pitch and noise on the discriminability of filterbank features.

Malcolm Slaney

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Towards better performance with heterogeneous training data in acoustic modeling using deep neural networks.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Single-channel mixed speech recognition using deep neural networks.

Proceedings of the IEEE International Conference on Acoustics, 2014

Factored adaptation of speaker and environment using orthogonal subspace transforms.

Hyunson Seo

Hong-Goo Kang

Proceedings of the IEEE International Conference on Acoustics, 2014

2013

Feature Learning in Deep Neural Networks - A Study on Speech Recognition Tasks.

Proceedings of the 1st International Conference on Learning Representations, 2013

Deep neural network features and semi-supervised training for low resource speech recognition.

Proceedings of the IEEE International Conference on Acoustics, 2013

An investigation of deep neural networks for noise robust speech recognition.

Dong Yu

Yongqiang Wang

Proceedings of the IEEE International Conference on Acoustics, 2013

Multi-task learning in deep neural networks for improved phoneme recognition.

Proceedings of the IEEE International Conference on Acoustics, 2013

Recent advances in deep learning for speech research at Microsoft.

Proceedings of the IEEE International Conference on Acoustics, 2013

2012

Factored adaptation using a combination of feature-space and model-space transforms.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Efficient VTS Adaptation Using Jacobian Approximation.

Jinyu Li

Yifan Gong

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Improvements to VTS feature enhancement.

Jinyu Li

Yifan Gong

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Acoustic Model Training for Robust Speech Recognition.

Proceedings of the Techniques for Noise Robustness in Automatic Speech Recognition, 2012

2011

In-Car Media Search.

IEEE Signal Process. Mag., 2011

Improved Bottleneck Features Using Pretrained Deep Neural Networks.

Dong Yu

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Separating Speaker and Environmental Variability Using Factored Transforms.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

CROWDMOS: An approach for crowdsourcing mean opinion score studies.

Flavio P. Ribeiro

Dinei A. F. Florêncio

Cha Zhang

Proceedings of the IEEE International Conference on Acoustics, 2011

Joint encoding of the waveform and speech recognition features using a transform codec.

Proceedings of the IEEE International Conference on Acoustics, 2011

Factored adaptation for separable compensation of speaker and environmental variability.

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010

Noise Adaptive Training for Robust Automatic Speech Recognition.

IEEE Trans. Speech Audio Process., 2010

HMM adaptation using linear spline interpolation with integrated spline parameter training for robust speech recognition.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Binary coding of speech spectrograms using a deep auto-encoder.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Acoustic model adaptation via Linear Spline Interpolation for robust speech recognition.

Kaustubh Kalgaonkar

Proceedings of the IEEE International Conference on Acoustics, 2010

2009

Improving perceived accuracy for in-car media search.

Yun-Cheng Ju

Ivan Tashev

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Voice search of structured media data.

Proceedings of the IEEE International Conference on Acoustics, 2009

The data deluge: Challenges and opportunities of unlimited data in statistical signal processing.

Lei Zhang

Proceedings of the IEEE International Conference on Acoustics, 2009

Noise adaptive training using a vector taylor series approach for noise robust automatic speech recognition.

Ozlem Kalinli

Proceedings of the IEEE International Conference on Acoustics, 2009

Noise robust model adaptation using linear spline interpolation.

Kaustubh Kalgaonkar

Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008

Towards a non-parametric acoustic model: an acoustic decision tree for observation probability calculation.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Maximum a posteriori ICA: Applying prior knowledge to the separation of acoustic sources.

Graham W. Taylor

Proceedings of the IEEE International Conference on Acoustics, 2008

Robust design of wideband loudspeaker arrays.

Proceedings of the IEEE International Conference on Acoustics, 2008

2007

Training Wideband Acoustic Models Using Mixed-Bandwidth Training Data for Speech Recognition.

IEEE Trans. Speech Audio Process., 2007

Commute UX: Telephone Dialog System for Location-based Services.

Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue, 2007

Robust location understanding in spoken dialog systems using intersections.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Microphone Array Post-Filter using Incremental Bayes Learning to Track the Spatial Distributions of Speech and Noise.

Ivan Tashev

Proceedings of the IEEE International Conference on Acoustics, 2007

2006

Subband Likelihood-Maximizing Beamforming for Speech Recognition in Reverberant Environments.

IEEE Trans. Speech Audio Process., 2006

Automatic removal of typed keystrokes from speech signals.

Amarnag Subramanya

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

2005

Robust bandwidth extension of noise-corrupted narrowband speech.

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Training Wideband Acoustic Models using Mixed-Bandwidth Training Data via Feature Bandwidth Extension.

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Speech Recognizer Based Maximum Likelihood Beamforming.

Manuel Jesus Reyes-Gomez

Proceedings of the Speech Separation by Humans and Machines, 2005

2004

Likelihood-maximizing beamforming for robust hands-free speech recognition.

IEEE Trans. Speech Audio Process., 2004

A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition.

Speech Commun., 2004

Reconstruction of missing features for robust speech recognition.

Speech Commun., 2004

Parameter sharing in subband likelihood-maximizing beamforming for speech recognition using microphone arrays.

Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2003

Speech-recognizer-based filter optimization for microphone array processing.

IEEE Signal Process. Lett., 2003

A harmonic-model-based front end for robust speech recognition.

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Subband parameter optimization of microphone arrays for speech recognition in reverberant environments.

Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2002

Speech recognizer-based microphone array processing for robust hands-free speech recognition.

Proceedings of the IEEE International Conference on Acoustics, 2002

2001

Calibration of microphone arrays for improved speech recognition.

Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Speech in Noisy Environments: robust automatic segmentation, feature extraction, and hypothesis combination.

Proceedings of the IEEE International Conference on Acoustics, 2001

2000

Classifier-based mask estimation for missing feature methods of robust speech recognition.

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Reconstruction of damaged spectrographic features for robust speech recognition.
