Daniel Povey

Matthew Wiesner

Nicholas Andrews

CoRR, March, 2026

2025

ZipVoice-Dialog: Non-Autoregressive Spoken Dialogue Generation with Flow Matching.

CoRR, July, 2025

ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching.

CoRR, June, 2025

HENT-SRT: Hierarchical Efficient Neural Transducer with Self-Distillation for Joint Speech Recognition and Translation.

Proceedings of the 22nd International Conference on Spoken Language Translation, 2025

k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

CR-CTC: Consistency regularization on CTC for improved speech recognition.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

WST: Weakly Supervised Transducer for Automatic Speech Recognition.

Jian Wu

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

2024

LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization.

CoRR, 2024

On Speaker Attribution with SURT.

Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

Multi-Channel Multi-Speaker ASR Using Target Speaker's Solo Segment.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Enhancing Neural Transducer for Multilingual ASR with Synchronized Language Diarization.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Improving Neural Biasing for Contextual Speech Recognition by Early Context Injection and Text Perturbation.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Zipformer: A faster and better encoder for automatic speech recognition.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS.

Proceedings of the IEEE International Conference on Acoustics, 2024

PromptASR for Contextualized ASR with Controllable Style.

Proceedings of the IEEE International Conference on Acoustics, 2024

Libriheavy: A 50,000 Hours ASR Corpus with Punctuation Casing and Context.

Proceedings of the IEEE International Conference on Acoustics, 2024

Less Peaky and More Accurate CTC Forced Alignment by Label Priors.

Proceedings of the IEEE International Conference on Acoustics, 2024

SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM.

Proceedings of the ECAI 2024 - 27th European Conference on Artificial Intelligence, 19-24 October 2024, Santiago de Compostela, Spain, 2024

ConEC: Earnings Call Dataset with Real-world Contexts for Benchmarking Contextual Speech Recognition.

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

2023

Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

SURT 2.0: Advances in Transducer-Based Multi-Talker Speech Recognition.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Delay-penalized CTC Implemented Based on Finite State Transducer.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Blank-regularized CTC for Frame Skipping in Neural Transducer.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

GPU-accelerated Guided Source Separation for Meeting Transcription.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Delay-Penalized Transducer for Low-Latency Streaming ASR.

Proceedings of the IEEE International Conference on Acoustics, 2023

Fast and Parallel Decoding for Transducer.

Proceedings of the IEEE International Conference on Acoustics, 2023

Building Keyword Search System from End-to-End ASR Systems.

Ruizhe Huang

Matthew Wiesner

Jan Trmal

Proceedings of the IEEE International Conference on Acoustics, 2023

Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation.

Proceedings of the IEEE International Conference on Acoustics, 2023

Learning From Flawed Data: Weakly Supervised Automatic Speech Recognition.

Dongji Gao

Hainan Xu

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Pruned RNN-T for fast, memory-efficient ASR training.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

LET-Decoder: A WFST-Based Lazy-Evaluation Token-Group Decoder With Exact Lattice Generation.

IEEE Signal Process. Lett., 2021

Lhotse: a speech data representation library for the modern deep learning ecosystem.

CoRR, 2021

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio.

CoRR, 2021

DOVER-Lap: A Method for Combining Overlap-Aware Diarization Outputs.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

speechocean762: An Open-Source Non-Native English Speech Corpus for Pronunciation Assessment.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

GigaSpeech: An Evolving, Multi-Domain ASR Corpus with 10,000 Hours of Transcribed Audio.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Wake Word Detection with Streaming Transformers.

Proceedings of the IEEE International Conference on Acoustics, 2021

A Parallelizable Lattice Rescoring Strategy with Neural Language Models.

Ke Li

Venkata Krishna Naveen Tadala

Proceedings of the IEEE International Conference on Acoustics, 2021

Multistream CNN for Robust Acoustic Modeling.

Kyu Jeong Han

Jing Pan

Tao Ma

Dan Povey

Proceedings of the IEEE International Conference on Acoustics, 2021

An Asynchronous WFST-Based Decoder for Automatic Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Frustratingly Easy Noise-aware Training of Acoustic Models.

CoRR, 2020

Wake Word Detection with Alignment-Free Lattice-Free MMI.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition Systems.

Srikanth R. Madikeri

Banriskhem K. Khonglah

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Neural Language Modeling with Implicit Cache Pointers.

Ke Li

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Efficient MDI Adaptation for n-Gram Language Models.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

An Alternative to MFCCs for ASR.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

OOV Recovery with Efficient 2nd Pass Decoding and Open-vocabulary Word-level RNNLM Rescoring for Hybrid ASR.

Xiaohui Zhang

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

An Empirical Study of Transformer-Based Neural Language Model Adaptation.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Speaker Diarization with Region Proposal Network.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

GPU-Accelerated Viterbi Exact Lattice Decoder for Batched Online and Offline Speech Recognition.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Robust Document Representations for Cross-Lingual Information Retrieval in Low-Resource Settings.

Proceedings of Machine Translation Summit XVII Volume 1: Research Track, 2019

Multi-PLDA Diarization on Children's Speech.

Jiamin Xie

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Advances in Automatic Speech Recognition for Child Speech Using Factored Time Delay Neural Network.

Fei Wu

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

The JHU ASR System for VOiCES from a Distance Challenge 2019.

Phani Sankar Nidadavolu

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

State-of-the-Art Speaker Recognition for Telephone and Video Speech: The JHU-MIT Submission for NIST SRE18.

Pedro A. Torres-Carrasquillo

Najim Dehak

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

The JHU Speaker Recognition System for the VOiCES 2019 Challenge.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Improving Emotion Identification Using Phone Posteriors in Raw Speech Waveform Based DNN.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Speaker Recognition Benchmark Using the CHiME-5 Corpus.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

x-Vector DNN Refinement with Full-Length Recordings for Speaker Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Optical Character Recognition with Chinese and Korean Character Decomposition.

Chun-Chieh Chang

Ashish Arora

David Etter

Proceedings of the Second International Workshop on Machine Learning, 2019

Using ASR Methods for OCR.

Proceedings of the 2019 International Conference on Document Analysis and Recognition, 2019

Speaker Recognition for Multi-speaker Conversations Using X-vectors.

Proceedings of the IEEE International Conference on Acoustics, 2019

Probing the Information Encoded in X-Vectors.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Incremental Lattice Determinization for WFST Decoders.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Flat-Start Single-Stage Discriminatively Trained HMM-Based Models for ASR.

IEEE ACM Trans. Audio Speech Lang. Process., 2018

Low Latency Acoustic Modeling Using Temporal Convolution and LSTMs.

IEEE Signal Process. Lett., 2018

A Teacher-Student Learning Approach for Unsupervised Domain Adaptation of Sequence-Trained ASR Models.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Improving LF-MMI Using Unconstrained Supervisions for ASR.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Spoken Language Recognition using X-vectors.

Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Emotion Identification from Raw Speech Signals Using DNNs.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

End-to-end Speech Recognition Using Lattice-free MMI.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

End-to-end Deep Neural Network Age Estimation.

Pegah Ghahremani

Phani Sankar Nidadavolu

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Acoustic Modeling from Frequency Domain Representations of Speech.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Output-Gate Projected Gated Recurrent Unit for Speech Recognition.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

A GPU-based WFST Decoder with Exact Lattice Generation.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Neural Network Language Modeling with Letter-Based Features and Importance Sampling.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

A Pruned RNNLM Lattice-Rescoring Algorithm for Automatic Speech Recognition.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

X-Vectors: Robust DNN Embeddings for Speaker Recognition.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

A Time-Restricted Self-Attention Layer for ASR.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Semi-Supervised Training of Acoustic Models Using Lattice-Free MMI.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

JHU Diarization System Description.

Zili Huang

L. Paola García-Perera

Jesús Villalba

Najim Dehak

Proceedings of the Fourth International Conference, 2018

2017

Acoustic Data-Driven Lexicon Learning Based on a Greedy Pronunciation Selection Framework.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Backstitch: Counteracting Finite-Sample Bias via Negative Steps.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

The Kaldi OpenKWS System: Improving Low Resource Keyword Search.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Deep Neural Network Embeddings for Text-Independent Speaker Verification.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Phone Duration Modeling for LVCSR Using Neural Networks.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

An Exploration of Dropout with LSTMs.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

A study on data augmentation of reverberant speech for robust speech recognition.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Speaker diarization using deep neural network embeddings.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

JHU Kaldi system for Arabic MGB-3 ASR challenge using diarization, audio-transcript alignment and transfer learning.

Vimal Manohar

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Investigation of transfer learning for ASR using LF-MMI trained neural networks.

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

2016

Deep neural network-based speaker embeddings for end-to-end speaker verification.

Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Far-Field ASR Without Parallel Data.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Acoustic Modelling from the Signal Domain Using CNNs.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Acoustic data-driven pronunciation lexicon generation for logographic languages.

Guoguo Chen

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

MUSAN: A Music, Speech, and Noise Corpus.

David Snyder

Guoguo Chen

CoRR, 2015

Parallel training of Deep Neural Networks with Natural Gradient and Parameter Averaging.

Xiaohui Zhang

Proceedings of the 3rd International Conference on Learning Representations, 2015

A diversity-penalizing ensemble training method for deep learning.

Xiaohui Zhang

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Modeling phonetic context with non-random forests for speech recognition.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A time delay neural network architecture for efficient modeling of long temporal contexts.

Vijayaditya Peddinti

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Reverberation robust acoustic modeling using i-vectors with time delay neural networks.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Semi-supervised maximum mutual information training of deep neural network acoustic models.

Vimal Manohar

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Audio augmentation for speech recognition.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Pronunciation and silence probability modeling for ASR.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Librispeech: An ASR corpus based on public domain audio books.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

A Coarse-Grained Model for Optimal Coupling of ASR and SMT Systems for Speech Translation.

Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

Time delay deep neural network-based universal background models for speaker recognition.

David Snyder

Daniel Garcia-Romero

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

JHU ASpIRE system: Robust LVCSR with TDNNs, iVector adaptation and RNN-LMs.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

A keyword search system using open source software.

Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Improving speaker recognition performance in the domain adaptation challenge using deep neural networks.

Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Translations of the Callhome Egyptian Arabic corpus for conversational speech translation.

Proceedings of the 11th International Workshop on Spoken Language Translation: Papers, 2014

Removing redundancy from lattices.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Combination of FST and CN search in spoken term detection.

Alexander I. Rudnicky

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Improving deep neural network acoustic models using generalized maxout networks.

Proceedings of the IEEE International Conference on Acoustics, 2014

Multilingual deep neural network based acoustic modeling for rapid language adaptation.

Proceedings of the IEEE International Conference on Acoustics, 2014

Some insights from translating conversational telephone speech.

Proceedings of the IEEE International Conference on Acoustics, 2014

A pitch extraction algorithm tuned for automatic speech recognition.

Proceedings of the IEEE International Conference on Acoustics, 2014

2013

Sequence-discriminative training of deep neural networks.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Improved feature processing for deep neural networks.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Feature and score level combination of subspace Gaussians in LVCSR task.

Petr Motlícek

Martin Karafiát

Proceedings of the IEEE International Conference on Acoustics, 2013

Combining forward and backward search in decoding.

Mirko Hannemann

Geoffrey Zweig

Proceedings of the IEEE International Conference on Acoustics, 2013

Quantifying the value of pronunciation lexicons for keyword search in low resource languages.

Proceedings of the IEEE International Conference on Acoustics, 2013

Using proxies for OOV keywords in the keyword search task.

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012

Krylov Subspace Descent for Deep Learning.

Oriol Vinyals

Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012

A basis representation of constrained MLLR transforms for robust adaptation.

Kaisheng Yao

Comput. Speech Lang., 2012

Discriminative Training Using Non-uniform Criteria for Keyword Spotting on Spontaneous Speech.

Chao Weng

Biing-Hwang Juang

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Modeling gender dependency in the Subspace GMM framework.

Ngoc Thang Vu

Tanja Schultz

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Revisiting Recurrent Neural Networks for robust ASR.

Oriol Vinyals

Suman V. Ravuri

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Revisiting semi-continuous hidden Markov models.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Generating exact lattices in the WFST framework.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

Minimum Bayes Risk decoding and system combination based on a recursion for edit distance.

Comput. Speech Lang., 2011

The subspace Gaussian mixture model - A structured model for speech recognition.

Comput. Speech Lang., 2011

State-Level Data Borrowing for Low-Resource Speech Recognition Based on Subspace GMMs.

Yanmin Qian

Jia Liu

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

A basis method for robust estimation of constrained MLLR.

Kaisheng Yao

Proceedings of the IEEE International Conference on Acoustics, 2011

A symmetrization of the Subspace Gaussian Mixture Model.

Proceedings of the IEEE International Conference on Acoustics, 2011

Strategies for using MLP based features with limited target-language training data.

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

Speaker adaptation with an Exponential Transform.

Geoffrey Zweig

Alex Acero

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

Strategies for training large scale neural network language models.

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010

An improved consensus-like method for Minimum Bayes Risk decoding and lattice combination.

Proceedings of the IEEE International Conference on Acoustics, 2010

The IBM 2008 GALE Arabic speech transcription system.

Proceedings of the IEEE International Conference on Acoustics, 2010

Subspace Gaussian Mixture Models for speech recognition.

Proceedings of the IEEE International Conference on Acoustics, 2010

Approaches to automatic lexicon learning with limited training examples.

Proceedings of the IEEE International Conference on Acoustics, 2010

A novel estimation of feature-space MLLR for full-covariance models.

Proceedings of the IEEE International Conference on Acoustics, 2010

The 2009 IBM GALE Mandarin broadcast transcription system.

Proceedings of the IEEE International Conference on Acoustics, 2010

Speaking rate adaptation using continuous frame rate normalization.

Stephen M. Chu

Proceedings of the IEEE International Conference on Acoustics, 2010

Multilingual acoustic modeling for speech recognition based on subspace Gaussian Mixture Models.

Proceedings of the IEEE International Conference on Acoustics, 2010

2009

Advances in Arabic Speech Transcription at IBM Under the DARPA GALE Program.

IEEE Trans. Speech Audio Process., 2009

Minimum hypothesis phone error as a decoding method for speech recognition.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Large margin semi-tied covariance transforms for discriminative training.

Hagen Soltau

Proceedings of the IEEE International Conference on Acoustics, 2009

2008

Penalty function maximization for large margin HMM training.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Fast speaker adaptive training for speech recognition.

Hong-Kwang Jeff Kuo

Hagen Soltau

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

XMLLR for improved speaker adaptation in speech recognition.

Hong-Kwang Jeff Kuo

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Monte Carlo model-space noise adaptation for speech recognition.

Brian Kingsbury

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Quick fMLLR for speaker adaptation in speech recognition.

Balakrishnan Varadarajan

Selina M. Chu

Proceedings of the IEEE International Conference on Acoustics, 2008

Boosted MMI for model and feature-space discriminative training.

Proceedings of the IEEE International Conference on Acoustics, 2008

Universal background model based speech recognition.

Selina M. Chu

Balakrishnan Varadarajan

Proceedings of the IEEE International Conference on Acoustics, 2008

2007

The IBM 2006 GALE Arabic ASR System.

Proceedings of the IEEE International Conference on Acoustics, 2007

The Impact of ASR on Speech-to-Speech Translation Performance.

Proceedings of the IEEE International Conference on Acoustics, 2007

Evaluation of Proposed Modifications to MPE for Large Scale Discriminative Training.

Brian Kingsbury

Proceedings of the IEEE International Conference on Acoustics, 2007

2006

Corrections to "Automatic Transcription of Conversational Telephone Speech".

IEEE Trans. Speech Audio Process., 2006

Advances in speech transcription at IBM under the DARPA EARS program.

IEEE Trans. Speech Audio Process., 2006

Automated Quality Monitoring for Call Centers using Speech and NLP Technologies.

Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 2006

The IBM Rich Transcription Spring 2006 Speech-to-Text System for Lecture Meetings.

Proceedings of the Machine Learning for Multimodal Interaction, 2006

Feature and model space speaker adaptation with full covariance Gaussians.

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

SPAM and full covariance for speech recognition.

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Automated Quality Monitoring in the Call Center with ASR and Maximum Entropy.

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Secondary Classification for GMM Based Speaker Recognition.

Jason W. Pelecanos

Ganesh N. Ramaswamy

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Morpheme-Based Language Modeling for Arabic LVCSR.

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005

Automatic transcription of conversational telephone speech.

IEEE Trans. Speech Audio Process., 2005

Anatomy of an extremely fast LVCSR decoder.

Geoffrey Zweig

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Improvements to fMPE for discriminative training of features.

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Discriminatively trained features using fMPE for multi-stream audio-visual speech recognition.

Jing Huang

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

The IBM 2004 Conversational Telephony System for Rich Transcription.

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

fMPE: Discriminatively Trained Features for Speech Recognition.

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2004

Feature space Gaussianization.

Satya Dharanipragada

Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Phone duration modeling for LVCSR.

Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2003

MMI-MAP and MPE-MAP for acoustic model adaptation.

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Discriminative Training for HMM-Based Offline Handwritten Character Recognition.

Roongroj Nopsuwanchai

Proceedings of the 7th International Conference on Document Analysis and Recognition (ICDAR 2003), 2003

Discriminative MAP for acoustic model adaptation.

Mark J. F. Gales

Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Porting: SwitchBoard to the VoiceMail task.

Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2002

Large scale discriminative training of hidden Markov models for speech recognition.

Comput. Speech Lang., 2002

Minimum Phone Error and I-smoothing for improved discriminative training.

Proceedings of the IEEE International Conference on Acoustics, 2002

2001

Improved discriminative training techniques for large vocabulary continuous speech recognition.

Proceedings of the IEEE International Conference on Acoustics, 2001

New features in the CU-HTK system for transcription of conversational telephone speech.

Proceedings of the IEEE International Conference on Acoustics, 2001

1999

Frame discrimination training for HMMs for large vocabulary speech recognition.
