Daniel Povey

ORCID: 0000-0002-0611-3634

Affiliations:
  • Xiaomi Inc., Beijing, China
  • Johns Hopkins University, USA (former)


According to our database, Daniel Povey authored at least 177 papers between 1999 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
On Speaker Attribution with SURT.
CoRR, 2024

2023
Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

SURT 2.0: Advances in Transducer-Based Multi-Talker Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Zipformer: A faster and better encoder for automatic speech recognition.
CoRR, 2023

Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context.
CoRR, 2023

PromptASR for contextualized ASR with controllable style.
CoRR, 2023

Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS.
CoRR, 2023

Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts.
CoRR, 2023

Blank-regularized CTC for Frame Skipping in Neural Transducer.
CoRR, 2023

Delay-Penalized Transducer for Low-Latency Streaming ASR.
Proceedings of the IEEE International Conference on Acoustics, 2023

Fast and Parallel Decoding for Transducer.
Proceedings of the IEEE International Conference on Acoustics, 2023

Building Keyword Search System from End-To-End ASR Systems.
Proceedings of the IEEE International Conference on Acoustics, 2023

Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Learning From Flawed Data: Weakly Supervised Automatic Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
GPU-accelerated Guided Source Separation for Meeting Transcription.
CoRR, 2022

Pruned RNN-T for fast, memory-efficient ASR training.
Proceedings of the Interspeech 2022, 2022

2021
LET-Decoder: A WFST-Based Lazy-Evaluation Token-Group Decoder With Exact Lattice Generation.
IEEE Signal Process. Lett., 2021

Lhotse: a speech data representation library for the modern deep learning ecosystem.
CoRR, 2021

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio.
CoRR, 2021

DOVER-Lap: A Method for Combining Overlap-Aware Diarization Outputs.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

speechocean762: An Open-Source Non-Native English Speech Corpus for Pronunciation Assessment.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

GigaSpeech: An Evolving, Multi-Domain ASR Corpus with 10,000 Hours of Transcribed Audio.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Wake Word Detection with Streaming Transformers.
Proceedings of the IEEE International Conference on Acoustics, 2021

A Parallelizable Lattice Rescoring Strategy with Neural Language Models.
Proceedings of the IEEE International Conference on Acoustics, 2021

Multistream CNN for Robust Acoustic Modeling.
Proceedings of the IEEE International Conference on Acoustics, 2021

An Asynchronous WFST-Based Decoder for Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
Frustratingly Easy Noise-aware Training of Acoustic Models.
CoRR, 2020

Wake Word Detection with Alignment-Free Lattice-Free MMI.
Proceedings of the Interspeech 2020, 2020

PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR.
Proceedings of the Interspeech 2020, 2020

Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition Systems.
Proceedings of the Interspeech 2020, 2020

Neural Language Modeling with Implicit Cache Pointers.
Proceedings of the Interspeech 2020, 2020

Efficient MDI Adaptation for n-Gram Language Models.
Proceedings of the Interspeech 2020, 2020

An Alternative to MFCCs for ASR.
Proceedings of the Interspeech 2020, 2020

OOV Recovery with Efficient 2nd Pass Decoding and Open-vocabulary Word-level RNNLM Rescoring for Hybrid ASR.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

An Empirical Study of Transformer-Based Neural Language Model Adaptation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Speaker Diarization with Region Proposal Network.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

GPU-Accelerated Viterbi Exact Lattice Decoder for Batched Online and Offline Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Robust Document Representations for Cross-Lingual Information Retrieval in Low-Resource Settings.
Proceedings of Machine Translation Summit XVII Volume 1: Research Track, 2019

Multi-PLDA Diarization on Children's Speech.
Proceedings of the Interspeech 2019, 2019

Advances in Automatic Speech Recognition for Child Speech Using Factored Time Delay Neural Network.
Proceedings of the Interspeech 2019, 2019

The JHU ASR System for VOiCES from a Distance Challenge 2019.
Proceedings of the Interspeech 2019, 2019

State-of-the-Art Speaker Recognition for Telephone and Video Speech: The JHU-MIT Submission for NIST SRE18.
Proceedings of the Interspeech 2019, 2019

The JHU Speaker Recognition System for the VOiCES 2019 Challenge.
Proceedings of the Interspeech 2019, 2019

Improving Emotion Identification Using Phone Posteriors in Raw Speech Waveform Based DNN.
Proceedings of the Interspeech 2019, 2019

Speaker Recognition Benchmark Using the CHiME-5 Corpus.
Proceedings of the Interspeech 2019, 2019

x-Vector DNN Refinement with Full-Length Recordings for Speaker Recognition.
Proceedings of the Interspeech 2019, 2019

Optical Character Recognition with Chinese and Korean Character Decomposition.
Proceedings of the Second International Workshop on Machine Learning, 2019

Using ASR Methods for OCR.
Proceedings of the 2019 International Conference on Document Analysis and Recognition, 2019

Speaker Recognition for Multi-speaker Conversations Using X-vectors.
Proceedings of the IEEE International Conference on Acoustics, 2019

Probing the Information Encoded in X-Vectors.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Incremental Lattice Determinization for WFST Decoders.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
Flat-Start Single-Stage Discriminatively Trained HMM-Based Models for ASR.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Low Latency Acoustic Modeling Using Temporal Convolution and LSTMs.
IEEE Signal Process. Lett., 2018

A Teacher-Student Learning Approach for Unsupervised Domain Adaptation of Sequence-Trained ASR Models.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Improving LF-MMI Using Unconstrained Supervisions for ASR.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Spoken Language Recognition using X-vectors.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification.
Proceedings of the Interspeech 2018, 2018

Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge.
Proceedings of the Interspeech 2018, 2018

Emotion Identification from Raw Speech Signals Using DNNs.
Proceedings of the Interspeech 2018, 2018

Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks.
Proceedings of the Interspeech 2018, 2018

Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition.
Proceedings of the Interspeech 2018, 2018

End-to-end Speech Recognition Using Lattice-free MMI.
Proceedings of the Interspeech 2018, 2018

End-to-end Deep Neural Network Age Estimation.
Proceedings of the Interspeech 2018, 2018

Acoustic Modeling from Frequency Domain Representations of Speech.
Proceedings of the Interspeech 2018, 2018

Output-Gate Projected Gated Recurrent Unit for Speech Recognition.
Proceedings of the Interspeech 2018, 2018

A GPU-based WFST Decoder with Exact Lattice Generation.
Proceedings of the Interspeech 2018, 2018

Neural Network Language Modeling with Letter-Based Features and Importance Sampling.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

A Pruned RNNLM Lattice-Rescoring Algorithm for Automatic Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

X-Vectors: Robust DNN Embeddings for Speaker Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

A Time-Restricted Self-Attention Layer for ASR.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Semi-Supervised Training of Acoustic Models Using Lattice-Free MMI.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

JHU Diarization System Description.
Proceedings of the Fourth International Conference, 2018

2017
Acoustic Data-Driven Lexicon Learning Based on a Greedy Pronunciation Selection Framework.
Proceedings of the Interspeech 2017, 2017

Backstitch: Counteracting Finite-Sample Bias via Negative Steps.
Proceedings of the Interspeech 2017, 2017

The Kaldi OpenKWS System: Improving Low Resource Keyword Search.
Proceedings of the Interspeech 2017, 2017

Deep Neural Network Embeddings for Text-Independent Speaker Verification.
Proceedings of the Interspeech 2017, 2017

Phone Duration Modeling for LVCSR Using Neural Networks.
Proceedings of the Interspeech 2017, 2017

An Exploration of Dropout with LSTMs.
Proceedings of the Interspeech 2017, 2017

A study on data augmentation of reverberant speech for robust speech recognition.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Speaker diarization using deep neural network embeddings.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

JHU Kaldi system for Arabic MGB-3 ASR challenge using diarization, audio-transcript alignment and transfer learning.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Investigation of transfer learning for ASR using LF-MMI trained neural networks.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

2016
Deep neural network-based speaker embeddings for end-to-end speaker verification.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI.
Proceedings of the Interspeech 2016, 2016

Far-Field ASR Without Parallel Data.
Proceedings of the Interspeech 2016, 2016

Acoustic Modelling from the Signal Domain Using CNNs.
Proceedings of the Interspeech 2016, 2016

Acoustic data-driven pronunciation lexicon generation for logographic languages.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
MUSAN: A Music, Speech, and Noise Corpus.
CoRR, 2015

Parallel training of Deep Neural Networks with Natural Gradient and Parameter Averaging.
Proceedings of the 3rd International Conference on Learning Representations, 2015

A diversity-penalizing ensemble training method for deep learning.
Proceedings of the INTERSPEECH 2015, 2015

Modeling phonetic context with non-random forests for speech recognition.
Proceedings of the INTERSPEECH 2015, 2015

A time delay neural network architecture for efficient modeling of long temporal contexts.
Proceedings of the INTERSPEECH 2015, 2015

Reverberation robust acoustic modeling using i-vectors with time delay neural networks.
Proceedings of the INTERSPEECH 2015, 2015

Semi-supervised maximum mutual information training of deep neural network acoustic models.
Proceedings of the INTERSPEECH 2015, 2015

Audio augmentation for speech recognition.
Proceedings of the INTERSPEECH 2015, 2015

Pronunciation and silence probability modeling for ASR.
Proceedings of the INTERSPEECH 2015, 2015

Librispeech: An ASR corpus based on public domain audio books.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

A Coarse-Grained Model for Optimal Coupling of ASR and SMT Systems for Speech Translation.
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

Time delay deep neural network-based universal background models for speaker recognition.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

JHU ASpIRE system: Robust LVCSR with TDNNs, iVector adaptation and RNN-LMs.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014
A keyword search system using open source software.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Improving speaker recognition performance in the domain adaptation challenge using deep neural networks.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Translations of the Callhome Egyptian Arabic corpus for conversational speech translation.
Proceedings of the 11th International Workshop on Spoken Language Translation: Papers, 2014

Removing redundancy from lattices.
Proceedings of the INTERSPEECH 2014, 2014

Combination of FST and CN search in spoken term detection.
Proceedings of the INTERSPEECH 2014, 2014

Improving deep neural network acoustic models using generalized maxout networks.
Proceedings of the IEEE International Conference on Acoustics, 2014

Multilingual deep neural network based acoustic modeling for rapid language adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2014

Some insights from translating conversational telephone speech.
Proceedings of the IEEE International Conference on Acoustics, 2014

A pitch extraction algorithm tuned for automatic speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

2013
Sequence-discriminative training of deep neural networks.
Proceedings of the INTERSPEECH 2013, 2013

Improved feature processing for deep neural networks.
Proceedings of the INTERSPEECH 2013, 2013

Feature and score level combination of subspace Gaussians in LVCSR task.
Proceedings of the IEEE International Conference on Acoustics, 2013

Combining forward and backward search in decoding.
Proceedings of the IEEE International Conference on Acoustics, 2013

Quantifying the value of pronunciation lexicons for keyword search in low-resource languages.
Proceedings of the IEEE International Conference on Acoustics, 2013

Using proxies for OOV keywords in the keyword search task.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012
Krylov Subspace Descent for Deep Learning.
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012

A basis representation of constrained MLLR transforms for robust adaptation.
Comput. Speech Lang., 2012

Discriminative Training Using Non-uniform Criteria for Keyword Spotting on Spontaneous Speech.
Proceedings of the INTERSPEECH 2012, 2012

Modeling gender dependency in the Subspace GMM framework.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Revisiting Recurrent Neural Networks for robust ASR.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Revisiting semi-continuous hidden Markov models.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Generating exact lattices in the WFST framework.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011
Minimum Bayes Risk decoding and system combination based on a recursion for edit distance.
Comput. Speech Lang., 2011

The subspace Gaussian mixture model - A structured model for speech recognition.
Comput. Speech Lang., 2011

State-Level Data Borrowing for Low-Resource Speech Recognition Based on Subspace GMMs.
Proceedings of the INTERSPEECH 2011, 2011

A basis method for robust estimation of constrained MLLR.
Proceedings of the IEEE International Conference on Acoustics, 2011

A symmetrization of the Subspace Gaussian Mixture Model.
Proceedings of the IEEE International Conference on Acoustics, 2011

Strategies for using MLP based features with limited target-language training data.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

Speaker adaptation with an Exponential Transform.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

Strategies for training large scale neural network language models.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010
An improved consensus-like method for Minimum Bayes Risk decoding and lattice combination.
Proceedings of the IEEE International Conference on Acoustics, 2010

The IBM 2008 GALE Arabic speech transcription system.
Proceedings of the IEEE International Conference on Acoustics, 2010

Subspace Gaussian Mixture Models for speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2010

Approaches to automatic lexicon learning with limited training examples.
Proceedings of the IEEE International Conference on Acoustics, 2010

A novel estimation of feature-space MLLR for full-covariance models.
Proceedings of the IEEE International Conference on Acoustics, 2010

The 2009 IBM GALE Mandarin broadcast transcription system.
Proceedings of the IEEE International Conference on Acoustics, 2010

Speaking rate adaptation using continuous frame rate normalization.
Proceedings of the IEEE International Conference on Acoustics, 2010

Multilingual acoustic modeling for speech recognition based on subspace Gaussian Mixture Models.
Proceedings of the IEEE International Conference on Acoustics, 2010

2009
Advances in Arabic Speech Transcription at IBM Under the DARPA GALE Program.
IEEE Trans. Speech Audio Process., 2009

Minimum hypothesis phone error as a decoding method for speech recognition.
Proceedings of the INTERSPEECH 2009, 2009

Large margin semi-tied covariance transforms for discriminative training.
Proceedings of the IEEE International Conference on Acoustics, 2009

2008
Penalty function maximization for large margin HMM training.
Proceedings of the INTERSPEECH 2008, 2008

Fast speaker adaptive training for speech recognition.
Proceedings of the INTERSPEECH 2008, 2008

XMLLR for improved speaker adaptation in speech recognition.
Proceedings of the INTERSPEECH 2008, 2008

Monte Carlo model-space noise adaptation for speech recognition.
Proceedings of the INTERSPEECH 2008, 2008

Quick fMLLR for speaker adaptation in speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008

Boosted MMI for model and feature-space discriminative training.
Proceedings of the IEEE International Conference on Acoustics, 2008

Universal background model based speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008

2007
The IBM 2006 GALE Arabic ASR System.
Proceedings of the IEEE International Conference on Acoustics, 2007

The Impact of ASR on Speech-to-Speech Translation Performance.
Proceedings of the IEEE International Conference on Acoustics, 2007

Evaluation of Proposed Modifications to MPE for Large Scale Discriminative Training.
Proceedings of the IEEE International Conference on Acoustics, 2007

2006
Corrections to "Automatic Transcription of Conversational Telephone Speech".
IEEE Trans. Speech Audio Process., 2006

Advances in speech transcription at IBM under the DARPA EARS program.
IEEE Trans. Speech Audio Process., 2006

Automated Quality Monitoring for Call Centers using Speech and NLP Technologies.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 2006

The IBM Rich Transcription Spring 2006 Speech-to-Text System for Lecture Meetings.
Proceedings of the Machine Learning for Multimodal Interaction, 2006

Feature and model space speaker adaptation with full covariance Gaussians.
Proceedings of the INTERSPEECH 2006, 2006

SPAM and full covariance for speech recognition.
Proceedings of the INTERSPEECH 2006, 2006

Automated Quality Monitoring in the Call Center with ASR and Maximum Entropy.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Secondary Classification for GMM Based Speaker Recognition.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Morpheme-Based Language Modeling for Arabic LVCSR.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005
Automatic transcription of conversational telephone speech.
IEEE Trans. Speech Audio Process., 2005

Anatomy of an extremely fast LVCSR decoder.
Proceedings of the INTERSPEECH 2005, 2005

Improvements to fMPE for discriminative training of features.
Proceedings of the INTERSPEECH 2005, 2005

Discriminatively trained features using fMPE for multi-stream audio-visual speech recognition.
Proceedings of the INTERSPEECH 2005, 2005

The IBM 2004 Conversational Telephony System for Rich Transcription.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

fMPE: Discriminatively Trained Features for Speech Recognition.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2004
Feature space Gaussianization.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Phone duration modeling for LVCSR.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2003
MMI-MAP and MPE-MAP for acoustic model adaptation.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Discriminative Training for HMM-Based Offline Handwritten Character Recognition.
Proceedings of the 7th International Conference on Document Analysis and Recognition (ICDAR 2003), 2003

Discriminative map for acoustic model adaptation.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Porting: SwitchBoard to the VoiceMail task.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2002
Large scale discriminative training of hidden Markov models for speech recognition.
Comput. Speech Lang., 2002

Minimum Phone Error and I-smoothing for improved discriminative training.
Proceedings of the IEEE International Conference on Acoustics, 2002

2001
Improved discriminative training techniques for large vocabulary continuous speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2001

New features in the CU-HTK system for transcription of conversational telephone speech.
Proceedings of the IEEE International Conference on Acoustics, 2001

1999
Frame discrimination training for HMMs for large vocabulary speech recognition.
Proceedings of the 1999 IEEE International Conference on Acoustics, 1999

