Daniel Povey

According to our database, Daniel Povey authored at least 123 papers between 1999 and 2018.

Bibliography

2018
Flat-Start Single-Stage Discriminatively Trained HMM-Based Models for ASR.
IEEE/ACM Trans. Audio, Speech & Language Processing, 2018

Low Latency Acoustic Modeling Using Temporal Convolution and LSTMs.
IEEE Signal Process. Lett., 2018

A GPU-based WFST Decoder with Exact Lattice Generation.
CoRR, 2018

Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification.
Proceedings of the Interspeech 2018, 2018

Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge.
Proceedings of the Interspeech 2018, 2018

Emotion Identification from Raw Speech Signals Using DNNs.
Proceedings of the Interspeech 2018, 2018

Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks.
Proceedings of the Interspeech 2018, 2018

Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition.
Proceedings of the Interspeech 2018, 2018

End-to-end Speech Recognition Using Lattice-free MMI.
Proceedings of the Interspeech 2018, 2018

End-to-end Deep Neural Network Age Estimation.
Proceedings of the Interspeech 2018, 2018

Acoustic Modeling from Frequency Domain Representations of Speech.
Proceedings of the Interspeech 2018, 2018

Output-Gate Projected Gated Recurrent Unit for Speech Recognition.
Proceedings of the Interspeech 2018, 2018

A GPU-based WFST Decoder with Exact Lattice Generation.
Proceedings of the Interspeech 2018, 2018

Neural Network Language Modeling with Letter-Based Features and Importance Sampling.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

A Pruned RNNLM Lattice-Rescoring Algorithm for Automatic Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

X-Vectors: Robust DNN Embeddings for Speaker Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

A Time-Restricted Self-Attention Layer for ASR.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Semi-Supervised Training of Acoustic Models Using Lattice-Free MMI.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Acoustic data-driven lexicon learning based on a greedy pronunciation selection framework.
CoRR, 2017

Acoustic Data-Driven Lexicon Learning Based on a Greedy Pronunciation Selection Framework.
Proceedings of the Interspeech 2017, 2017

Backstitch: Counteracting Finite-Sample Bias via Negative Steps.
Proceedings of the Interspeech 2017, 2017

The Kaldi OpenKWS System: Improving Low Resource Keyword Search.
Proceedings of the Interspeech 2017, 2017

Deep Neural Network Embeddings for Text-Independent Speaker Verification.
Proceedings of the Interspeech 2017, 2017

Phone Duration Modeling for LVCSR Using Neural Networks.
Proceedings of the Interspeech 2017, 2017

An Exploration of Dropout with LSTMs.
Proceedings of the Interspeech 2017, 2017

A study on data augmentation of reverberant speech for robust speech recognition.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Speaker diarization using deep neural network embeddings.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

JHU Kaldi system for Arabic MGB-3 ASR challenge using diarization, audio-transcript alignment and transfer learning.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Investigation of transfer learning for ASR using LF-MMI trained neural networks.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

2016
Deep neural network-based speaker embeddings for end-to-end speaker verification.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI.
Proceedings of the Interspeech 2016, 2016

Far-Field ASR Without Parallel Data.
Proceedings of the Interspeech 2016, 2016

Acoustic Modelling from the Signal Domain Using CNNs.
Proceedings of the Interspeech 2016, 2016

Acoustic data-driven pronunciation lexicon generation for logographic languages.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
MUSAN: A Music, Speech, and Noise Corpus.
CoRR, 2015

A diversity-penalizing ensemble training method for deep learning.
Proceedings of the INTERSPEECH 2015, 2015

Modeling phonetic context with non-random forests for speech recognition.
Proceedings of the INTERSPEECH 2015, 2015

A time delay neural network architecture for efficient modeling of long temporal contexts.
Proceedings of the INTERSPEECH 2015, 2015

Reverberation robust acoustic modeling using i-vectors with time delay neural networks.
Proceedings of the INTERSPEECH 2015, 2015

Semi-supervised maximum mutual information training of deep neural network acoustic models.
Proceedings of the INTERSPEECH 2015, 2015

Audio augmentation for speech recognition.
Proceedings of the INTERSPEECH 2015, 2015

Pronunciation and silence probability modeling for ASR.
Proceedings of the INTERSPEECH 2015, 2015

Librispeech: An ASR corpus based on public domain audio books.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

A Coarse-Grained Model for Optimal Coupling of ASR and SMT Systems for Speech Translation.
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

Time delay deep neural network-based universal background models for speaker recognition.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

JHU ASpIRE system: Robust LVCSR with TDNNs, iVector adaptation and RNN-LMs.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014
Parallel training of Deep Neural Networks with Natural Gradient and Parameter Averaging.
CoRR, 2014

A keyword search system using open source software.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Improving speaker recognition performance in the domain adaptation challenge using deep neural networks.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Removing redundancy from lattices.
Proceedings of the INTERSPEECH 2014, 2014

Combination of FST and CN search in spoken term detection.
Proceedings of the INTERSPEECH 2014, 2014

Improving deep neural network acoustic models using generalized maxout networks.
Proceedings of the IEEE International Conference on Acoustics, 2014

Multilingual deep neural network based acoustic modeling for rapid language adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2014

Some insights from translating conversational telephone speech.
Proceedings of the IEEE International Conference on Acoustics, 2014

A pitch extraction algorithm tuned for automatic speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

2013
Sequence-discriminative training of deep neural networks.
Proceedings of the INTERSPEECH 2013, 2013

Improved feature processing for deep neural networks.
Proceedings of the INTERSPEECH 2013, 2013

Feature and score level combination of subspace Gaussians in LVCSR task.
Proceedings of the IEEE International Conference on Acoustics, 2013

Combining forward and backward search in decoding.
Proceedings of the IEEE International Conference on Acoustics, 2013

Quantifying the value of pronunciation lexicons for keyword search in low-resource languages.
Proceedings of the IEEE International Conference on Acoustics, 2013

Using proxies for OOV keywords in the keyword search task.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012
Krylov Subspace Descent for Deep Learning.
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012

A basis representation of constrained MLLR transforms for robust adaptation.
Computer Speech & Language, 2012

Discriminative Training Using Non-uniform Criteria for Keyword Spotting on Spontaneous Speech.
Proceedings of the INTERSPEECH 2012, 2012

Modeling gender dependency in the Subspace GMM framework.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Revisiting Recurrent Neural Networks for robust ASR.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Revisiting semi-continuous hidden Markov models.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Generating exact lattices in the WFST framework.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011
Minimum Bayes Risk decoding and system combination based on a recursion for edit distance.
Computer Speech & Language, 2011

The subspace Gaussian mixture model - A structured model for speech recognition.
Computer Speech & Language, 2011

State-Level Data Borrowing for Low-Resource Speech Recognition Based on Subspace GMMs.
Proceedings of the INTERSPEECH 2011, 2011

A basis method for robust estimation of constrained MLLR.
Proceedings of the IEEE International Conference on Acoustics, 2011

A symmetrization of the Subspace Gaussian Mixture Model.
Proceedings of the IEEE International Conference on Acoustics, 2011

Strategies for using MLP based features with limited target-language training data.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

Speaker adaptation with an Exponential Transform.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

Strategies for training large scale neural network language models.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010
An improved consensus-like method for Minimum Bayes Risk decoding and lattice combination.
Proceedings of the IEEE International Conference on Acoustics, 2010

The IBM 2008 GALE Arabic speech transcription system.
Proceedings of the IEEE International Conference on Acoustics, 2010

Subspace Gaussian Mixture Models for speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2010

Approaches to automatic lexicon learning with limited training examples.
Proceedings of the IEEE International Conference on Acoustics, 2010

A novel estimation of feature-space MLLR for full-covariance models.
Proceedings of the IEEE International Conference on Acoustics, 2010

The 2009 IBM GALE Mandarin broadcast transcription system.
Proceedings of the IEEE International Conference on Acoustics, 2010

Speaking rate adaptation using continuous frame rate normalization.
Proceedings of the IEEE International Conference on Acoustics, 2010

Multilingual acoustic modeling for speech recognition based on subspace Gaussian Mixture Models.
Proceedings of the IEEE International Conference on Acoustics, 2010

2009
Advances in Arabic Speech Transcription at IBM Under the DARPA GALE Program.
IEEE Trans. Audio, Speech & Language Processing, 2009

Minimum hypothesis phone error as a decoding method for speech recognition.
Proceedings of the INTERSPEECH 2009, 2009

Large margin semi-tied covariance transforms for discriminative training.
Proceedings of the IEEE International Conference on Acoustics, 2009

2008
Penalty function maximization for large margin HMM training.
Proceedings of the INTERSPEECH 2008, 2008

Fast speaker adaptive training for speech recognition.
Proceedings of the INTERSPEECH 2008, 2008

XMLLR for improved speaker adaptation in speech recognition.
Proceedings of the INTERSPEECH 2008, 2008

Monte Carlo model-space noise adaptation for speech recognition.
Proceedings of the INTERSPEECH 2008, 2008

Quick fMLLR for speaker adaptation in speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008

Boosted MMI for model and feature-space discriminative training.
Proceedings of the IEEE International Conference on Acoustics, 2008

Universal background model based speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008

2007
The IBM 2006 GALE Arabic ASR System.
Proceedings of the IEEE International Conference on Acoustics, 2007

The Impact of ASR on Speech-to-Speech Translation Performance.
Proceedings of the IEEE International Conference on Acoustics, 2007

Evaluation of Proposed Modifications to MPE for Large Scale Discriminative Training.
Proceedings of the IEEE International Conference on Acoustics, 2007

2006
Corrections to "Automatic Transcription of Conversational Telephone Speech".
IEEE Trans. Audio, Speech & Language Processing, 2006

Advances in speech transcription at IBM under the DARPA EARS program.
IEEE Trans. Audio, Speech & Language Processing, 2006

Automated Quality Monitoring for Call Centers using Speech and NLP Technologies.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 2006

The IBM Rich Transcription Spring 2006 Speech-to-Text System for Lecture Meetings.
Proceedings of the Machine Learning for Multimodal Interaction, 2006

Feature and model space speaker adaptation with full covariance Gaussians.
Proceedings of the INTERSPEECH 2006, 2006

SPAM and full covariance for speech recognition.
Proceedings of the INTERSPEECH 2006, 2006

Automated Quality Monitoring in the Call Center with ASR and Maximum Entropy.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Secondary Classification for GMM Based Speaker Recognition.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Morpheme-Based Language Modeling for Arabic LVCSR.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005
Automatic transcription of conversational telephone speech.
IEEE Trans. Speech and Audio Processing, 2005

Anatomy of an extremely fast LVCSR decoder.
Proceedings of the INTERSPEECH 2005, 2005

Improvements to fMPE for discriminative training of features.
Proceedings of the INTERSPEECH 2005, 2005

Discriminatively trained features using fMPE for multi-stream audio-visual speech recognition.
Proceedings of the INTERSPEECH 2005, 2005

The IBM 2004 Conversational Telephony System for Rich Transcription.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

fMPE: Discriminatively Trained Features for Speech Recognition.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2004
Feature space Gaussianization.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Phone duration modeling for LVCSR.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2003
MMI-MAP and MPE-MAP for acoustic model adaptation.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Discriminative Training for HMM-Based Offline Handwritten Character Recognition.
Proceedings of the 7th International Conference on Document Analysis and Recognition (ICDAR 2003), 2003

Discriminative map for acoustic model adaptation.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Porting: SwitchBoard to the VoiceMail task.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2002
Large scale discriminative training of hidden Markov models for speech recognition.
Computer Speech & Language, 2002

Minimum Phone Error and I-smoothing for improved discriminative training.
Proceedings of the IEEE International Conference on Acoustics, 2002

2001
Improved discriminative training techniques for large vocabulary continuous speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2001

New features in the CU-HTK system for transcription of conversational telephone speech.
Proceedings of the IEEE International Conference on Acoustics, 2001

1999
Frame discrimination training for HMMs for large vocabulary speech recognition.
Proceedings of the 1999 IEEE International Conference on Acoustics, 1999

