George Saon

Orcid: 0009-0004-6837-5009

According to our database, George Saon authored at least 141 papers between 1994 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

In-Sync: Adaptation of Speech Aware Large Language Models for ASR with Word Level Timestamp Predictions.

,

,

,

Mark Hasegawa-Johnson

,

Brian Kingsbury

,

CoRR, April, 2026

2025

Exploring the Limits of Conformer CTC-Encoder for Speech Emotion Recognition using Large Language Models.

Edmilson da Silva Morais

,

Hagai Aronowitz

,

,

,

,

Brian Kingsbury

,

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

A Non-autoregressive Model for Joint STT and TTS.

,

Brian Kingsbury

,

,

,

Slava Shechtman

,

Hagai Aronowitz

,

Eric Fosler-Lussier

,

Luis A. Lastras

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

LLM based Text Generation for Improved Low-resource Speech Recognition Models.

,

,

,

Hong-Kwang Jeff Kuo

,

Daniel Bolaños

,

,

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Knowledge Distillation Based Training of Unified Conformer CTC Models for Multi-form ASR.

,

,

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

2024

Bilevel Joint Unsupervised and Supervised Training for Automatic Speech Recognition.

,

,

,

,

,

Brian Kingsbury

,

CoRR, 2024

Exploring the limits of decoder-only models trained on public speech recognition corpora.

,

,

Brian Kingsbury

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Multiple Representation Transfer from Large Language Models to End-to-End ASR Systems.

,

Masayuki Suzuki

,

,

Masayasu Muraoka

,

Proceedings of the IEEE International Conference on Acoustics, 2024

Semi-Autoregressive Streaming ASR with Label Context.

,

,

Shinji Watanabe

,

Brian Kingsbury

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Soft Random Sampling: A Theoretical and Empirical Analysis.

,

Ashish R. Mittal

,

,

,

,

Brian Kingsbury

CoRR, 2023

Improving RNN Transducer Acoustic Models for English Conversational Speech Recognition.

,

,

Brian Kingsbury

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Multi-Speaker Data Augmentation for Improved end-to-end Automatic Speech Recognition.

,

Hong-Kwang Jeff Kuo

,

,

Brian Kingsbury

Proceedings of the IEEE International Conference on Acoustics, 2023

Diagonal State Space Augmented Transformers for Speech Recognition.

,

,

Proceedings of the IEEE International Conference on Acoustics, 2023

Speech-enriched Memory for Inference-time Adaptation of ASR Models to Word Dictionaries.

Ashish R. Mittal

,

Sunita Sarawagi

,

,

,

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

2022

Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems.

,

Masayuki Suzuki

,

,

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

VQ-T: RNN Transducers using Vector-Quantized Prediction Network States.

,

,

,

Shinji Watanabe

,

Brian Kingsbury

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Extending RNN-T-based speech recognition systems with emotion and language classification.

,

Hagai Aronowitz

,

Edmilson da Silva Morais

,

Matheus Damasceno

,

,

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Global RNN Transducer Models For Multi-dialect Speech Recognition.

,

,

Masayuki Suzuki

,

,

,

Brian Kingsbury

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization.

,

,

Mauricio J. Serrano

,

Swagath Venkataramani

,

,

,

Brian Kingsbury

,

Kailash Gopalakrishnan

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing.

,

,

,

Masayuki Suzuki

,

,

Brian Kingsbury

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Integrating Text Inputs for Training and Adapting RNN Transducer ASR Models.

,

Brian Kingsbury

,

,

Hong-Kwang Jeff Kuo

Proceedings of the IEEE International Conference on Acoustics, 2022

Towards Reducing the Need for Speech Training Data to Build Spoken Language Understanding Systems.

,

Hong-Kwang Jeff Kuo

,

Brian Kingsbury

,

Proceedings of the IEEE International Conference on Acoustics, 2022

Improving End-to-end Models for Set Prediction in Spoken Language Understanding.

Hong-Kwang Jeff Kuo

,

,

,

Brian Kingsbury

,

Proceedings of the IEEE International Conference on Acoustics, 2022

Speech Recognition Using Biologically-Inspired Neural Networks.

Thomas Bohnstingl

,

,

Stanislaw Wozniak

,

,

Evangelos Eleftheriou

,

Angeliki Pantazi

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Asynchronous Decentralized Distributed Training of Acoustic Models.

,

,

,

,

,

Brian Kingsbury

,

,

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Towards efficient end-to-end speech recognition with biologically-inspired neural networks.

Thomas Bohnstingl

,

,

Stanislaw Wozniak

,

,

Evangelos Eleftheriou

,

Angeliki Pantazi

CoRR, 2021

On the Limit of English Conversational Speech Recognition.

,

,

Brian Kingsbury

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Improving Customization of Neural Transducers by Mitigating Acoustic Mismatch of Synthesized Audio.

,

,

Brian Kingsbury

,

,

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Integrating Dialog History into End-to-End Spoken Language Understanding Systems.

,

,

Hong-Kwang Jeff Kuo

,

Sachindra Joshi

,

,

,

Brian Kingsbury

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

4-Bit Quantization of LSTM-Based Speech Recognition Models.

,

,

Mauricio J. Serrano

,

,

,

Swagath Venkataramani

,

,

,

Brian Kingsbury

,

,

,

Kailash Gopalakrishnan

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Reducing Exposure Bias in Training Recurrent Neural Network Transducers.

,

Brian Kingsbury

,

,

,

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Advancing RNN Transducer Technology for Speech Recognition.

,

,

Daniel Bolaños

,

Brian Kingsbury

Proceedings of the IEEE International Conference on Acoustics, 2021

RNN Transducer Models for Spoken Language Understanding.

,

Hong-Kwang Jeff Kuo

,

,

,

Brian Kingsbury

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Distributed Training of Deep Neural Network Acoustic Models for Automatic Speech Recognition: A comparison of current training strategies.

,

,

,

,

Michael Picheny

,

IEEE Signal Process. Mag., 2020

Distributed Training of Deep Neural Network Acoustic Models for Automatic Speech Recognition.

,

,

,

,

Michael Picheny

,

CoRR, 2020

Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard-300.

,

,

Kartik Audhkhasi

,

Brian Kingsbury

CoRR, 2020

Single Headed Attention Based Sequence-to-Sequence Model for State-of-the-Art Results on Switchboard.

,

,

Kartik Audhkhasi

,

Brian Kingsbury

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Knowledge Distillation from Offline to Streaming RNN Transducer for End-to-End Speech Recognition.

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Improving Efficiency in Large-Scale Decentralized Distributed Training.

,

,

,

,

,

Brian Kingsbury

,

,

,

Alper Buyuktosunoglu

,

,

,

Michael Picheny

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Alignment-Length Synchronous Decoding for RNN Transducer.

,

,

Kartik Audhkhasi

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

A Highly Efficient Distributed Deep Learning System for Automatic Speech Recognition.

,

,

,

,

,

Alper Buyuktosunoglu

,

Brian Kingsbury

,

,

Michael Picheny

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Advancing Sequence-to-Sequence Based Speech Recognition.

,

Kartik Audhkhasi

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Challenging the Boundaries of Speech Recognition: The MALACH Corpus.

Michael Picheny

,

,

Brian Kingsbury

,

Kartik Audhkhasi

,

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Forget a Bit to Learn Better: Soft Forgetting for CTC-Based Automatic Speech Recognition.

Kartik Audhkhasi

,

,

,

Brian Kingsbury

,

Michael Picheny

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Distributed Deep Learning Strategies for Automatic Speech Recognition.

,

,

,

Brian Kingsbury

,

,

,

Michael Picheny

Proceedings of the IEEE International Conference on Acoustics, 2019

English Broadcast News Speech Recognition by Humans and Machines.

,

Masayuki Suzuki

,

,

,

,

,

Brian Kingsbury

,

Michael Picheny

,

,

Alice Kaiser-Schatzlein

,

Proceedings of the IEEE International Conference on Acoustics, 2019

Sequence Noise Injected Training for End-to-end Speech Recognition.

,

,

Kartik Audhkhasi

,

Brian Kingsbury

Proceedings of the IEEE International Conference on Acoustics, 2019

Simplified LSTMs for Speech Recognition.

,

,

Kartik Audhkhasi

,

Brian Kingsbury

,

Michael Picheny

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Building Competitive Direct Acoustics-to-Word Models for English Conversational Speech Recognition.

Kartik Audhkhasi

,

Brian Kingsbury

,

Bhuvana Ramabhadran

,

,

Michael Picheny

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Recent advances in conversational speech recognition using convolutional and recurrent neural networks.

,

Michael Picheny

IBM J. Res. Dev., 2017

Recent progress in deep end-to-end models for spoken language processing.

Kartik Audhkhasi

,

Andrew Rosenberg

,

,

,

Bhuvana Ramabhadran

,

Stanley F. Chen

,

Michael Picheny

IBM J. Res. Dev., 2017

Accelerating deep neural network learning for speech recognition on a cluster of GPUs.

,

Brian Kingsbury

,

,

,

Proceedings of the Machine Learning on HPC Environments, 2017

English Conversational Telephone Speech Recognition by Humans and Machines.

,

,

,

Kartik Audhkhasi

,

,

Dimitrios Dimitriadis

,

,

Bhuvana Ramabhadran

,

Michael Picheny

,

,

,

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Empirical Exploration of Novel Architectures and Objectives for Language Models.

,

,

Bhuvana Ramabhadran

,

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Embedding-Based Speaker Adaptive Training of Deep Neural Networks.

,

,

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Direct Acoustics-to-Word Models for English Conversational Speech Recognition.

Kartik Audhkhasi

,

Bhuvana Ramabhadran

,

,

Michael Picheny

,

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Network architectures for multilingual speech representation learning.

,

,

,

,

Bhuvana Ramabhadran

,

Brian Kingsbury

,

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Knowledge distillation across ensembles of multilingual models for low-resource languages.

,

Brian Kingsbury

,

Bhuvana Ramabhadran

,

,

,

Kartik Audhkhasi

,

,

Markus Nußbaum-Thom

,

Andrew Rosenberg

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Language modeling with highway LSTM.

,

Bhuvana Ramabhadran

,

,

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

2016

Domain Adaptation of CNN Based Acoustic Models Under Limited Resource Settings.

Masayuki Suzuki

,

Ryuki Tachibana

,

,

Bhuvana Ramabhadran

,

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

The IBM 2016 English Conversational Telephone Speech Recognition System.

,

,

Steven J. Rennie

,

Hong-Kwang Jeff Kuo

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

On the importance of event detection for ASR.

,

Dimitrios Dimitriadis

,

,

,

Michael Picheny

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

Deep Convolutional Neural Networks for Large-scale Speech Tasks.

Tara N. Sainath

,

Brian Kingsbury

,

,

,

Abdel-rahman Mohamed

,

,

Bhuvana Ramabhadran

Neural Networks, 2015

The IBM BOLT speech transcription system.

,

,

Hong-Kwang Jeff Kuo

,

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

The IBM 2015 English conversational telephone speech recognition system.

,

Hong-Kwang Jeff Kuo

,

Steven J. Rennie

,

Michael Picheny

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A multi-region deep neural network model in speech recognition.

,

,

Bhuvana Ramabhadran

,

Brian Kingsbury

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Improvements to the IBM speech activity detection system for the DARPA RATS program.

,

,

Maarten Van Segbroeck

,

Shrikanth S. Narayanan

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Order-free spoken term detection.

,

,

Michael Picheny

,

Brian Kingsbury

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

A nonmonotone learning rate strategy for SGD training of deep neural networks.

Nitish Shirish Keskar

,

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014

Automatic Speech Recognition.

,

,

,

,

Brian Kingsbury

,

,

Proceedings of the Natural Language Processing of Semitic Languages, 2014

A distributed architecture for fast SGD sequence discriminative training of DNN acoustic models.

Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Unfolded recurrent neural networks for speech recognition.

,

,

,

Michael Picheny

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Parallel deep neural network training for LVCSR tasks using Blue Gene/Q.

Tara N. Sainath

,

,

Bhuvana Ramabhadran

,

Michael Picheny

,

John A. Gunnels

,

Brian Kingsbury

,

,

,

Upendra V. Chaudhari

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Analyzing convolutional neural networks for speech activity detection in mismatched acoustic conditions.

,

Sriram Ganapathy

,

,

Proceedings of the IEEE International Conference on Acoustics, 2014

Joint training of convolutional and non-convolutional neural networks.

,

,

Tara N. Sainath

Proceedings of the IEEE International Conference on Acoustics, 2014

A comparison of two optimization techniques for sequence discriminative training of deep neural networks.

,

Proceedings of the IEEE International Conference on Acoustics, 2014

Improvements to filterbank and delta learning within a deep neural network framework.

Tara N. Sainath

,

Brian Kingsbury

,

Abdel-rahman Mohamed

,

,

Bhuvana Ramabhadran

Proceedings of the IEEE International Conference on Acoustics, 2014

2013

Neural network acoustic models for the DARPA RATS program.

,

,

,

,

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

The IBM speech activity detection system for the DARPA RATS program.

,

,

,

Sriram Ganapathy

,

Brian Kingsbury

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Exploiting diversity for spoken term detection.

,

,

,

Brian Kingsbury

,

Proceedings of the IEEE International Conference on Acoustics, 2013

Speaker adaptation of neural network acoustic models using i-vectors.

,

,

,

Michael Picheny

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Improvements to Deep Convolutional Neural Networks for LVCSR.

Tara N. Sainath

,

Brian Kingsbury

,

Abdel-rahman Mohamed

,

,

,

,

,

Aleksandr Y. Aravkin

,

Bhuvana Ramabhadran

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

The IBM keyword search system for the DARPA RATS program.

,

,

,

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012

Bayesian Sensing Hidden Markov Models.

,

Jen-Tzung Chien

IEEE Trans. Speech Audio Process., 2012

Large-Vocabulary Continuous Speech Recognition Systems: A Look at Some Recent Advances.

,

Jen-Tzung Chien

IEEE Signal Process. Mag., 2012

Boosting systems for large vocabulary continuous speech recognition.

,

Speech Commun., 2012

Discriminative feature-space transforms using deep neural networks.

,

Brian Kingsbury

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Sparse Bayesian Factor Analysis for Stereo-based Stochastic Mapping.

,

,

,

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Recent developments in large vocabulary continuous speech recognition.

,

Jen-Tzung Chien

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

2011

Trends and advances in speech recognition.

Michael Picheny

,

,

,

Brian Kingsbury

,

Bhuvana Ramabhadran

,

Steven J. Rennie

,

IBM J. Res. Dev., 2011

Discriminative training for Bayesian sensing hidden Markov models.

,

Jen-Tzung Chien

Proceedings of the IEEE International Conference on Acoustics, 2011

Bayesian sensing hidden Markov models for speech recognition.

,

Jen-Tzung Chien

Proceedings of the IEEE International Conference on Acoustics, 2011

The IBM 2009 GALE Arabic speech transcription system.

Brian Kingsbury

,

,

,

,

,

,

Suman V. Ravuri

,

,

Proceedings of the IEEE International Conference on Acoustics, 2011

Some properties of Bayesian sensing hidden Markov models.

,

Jen-Tzung Chien

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

The IBM 2011 GALE Arabic speech transcription system.

,

,

,

Brian Kingsbury

,

,

,

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

Minimum Bayes risk discriminative language models for Arabic speech recognition.

Hong-Kwang Jeff Kuo

,

,

,

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010

The IBM Attila speech recognition toolkit.

,

,

Brian Kingsbury

Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Boosting systems for LVCSR.

,

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

The IBM 2008 GALE Arabic speech transcription system.

,

,

Upendra V. Chaudhari

,

,

Brian Kingsbury

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2010

2009

Advances in Arabic Speech Transcription at IBM Under the DARPA GALE Program.

,

,

Brian Kingsbury

,

Hong-Kwang Jeff Kuo

,

,

,

IEEE Trans. Speech Audio Process., 2009

Large margin semi-tied covariance transforms for discriminative training.

,

,

Proceedings of the IEEE International Conference on Acoustics, 2009

Dynamic network decoding revisited.

,

Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008

Penalty function maximization for large margin HMM training.

,

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Boosted MMI for model and feature-space discriminative training.

,

Dimitri Kanevsky

,

Brian Kingsbury

,

Bhuvana Ramabhadran

,

,

Karthik Visweswariah

Proceedings of the IEEE International Conference on Acoustics, 2008

2007

The IBM 2006 GALE Arabic ASR System.

,

,

Brian Kingsbury

,

Hong-Kwang Jeff Kuo

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2007

Lattice-based Viterbi decoding techniques for speech translation.

,

Michael Picheny

Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

2006

Advances in speech transcription at IBM under the DARPA EARS program.

Stanley F. Chen

,

Brian Kingsbury

,

,

,

,

,

IEEE Trans. Speech Audio Process., 2006

On the Effect of Word Error Rate on Automated Quality Monitoring.

,

Bhuvana Ramabhadran

,

Proceedings of the 2006 IEEE ACL Spoken Language Technology Workshop, 2006

Automated Quality Monitoring for Call Centers using Speech and NLP Technologies.

,

,

,

Bhuvana Ramabhadran

,

,

,

Brian Kingsbury

Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 2006

Feature and model space speaker adaptation with full covariance Gaussians.

,

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Automated Quality Monitoring in the Call Center with ASR and Maximum Entropy.

,

,

,

Bhuvana Ramabhadran

,

,

,

Brian Kingsbury

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

A Non-Linear Speaker Adaptation Technique using Kernel Ridge Regression.

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005

Anatomy of an extremely fast LVCSR decoder.

,

,

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

The IBM 2004 Conversational Telephony System for Rich Transcription.

,

Brian Kingsbury

,

,

,

,

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

fMPE: Discriminatively Trained Features for Speech Recognition.

,

Brian Kingsbury

,

,

,

,

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2004

Fractional Fourier transform features for speech recognition.

,

,

Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Feature space Gaussianization.

,

Satya Dharanipragada

,

Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2003

An architecture for rapid decoding of large vocabulary conversational speech.

,

,

Brian Kingsbury

,

,

Upendra V. Chaudhari

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Toward domain-independent conversational speech recognition.

Brian Kingsbury

,

,

,

,

,

,

Karthik Visweswariah

,

Michael Picheny

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

2002

Automatic speech recognition performance on a voicemail transcription task.

Mukund Padmanabhan

,

,

,

Brian Kingsbury

,

IEEE Trans. Speech Audio Process., 2002

Arc minimization in finite state decoding graphs with cross-word acoustic context.

,

,

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Improvements to the IBM Aurora 2 multi-condition system.

,

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Robust speech recognition in noisy environments: The 2001 IBM SPINE evaluation system.

Brian Kingsbury

,

,

,

Mukund Padmanabhan

,

Proceedings of the IEEE International Conference on Acoustics, 2002

Digit recognition in noisy environments via a sequential GMM/SVM system.

,

,

Ramesh A. Gopinath

Proceedings of the IEEE International Conference on Acoustics, 2002

2001

Data-driven approach to designing compound words for continuous speech recognition.

,

Mukund Padmanabhan

IEEE Trans. Speech Audio Process., 2001

Robust digit recognition in noisy environments: the IBM Aurora 2 system.

,

,

Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Linear feature space projections for speaker adaptation.

,

,

Mukund Padmanabhan

Proceedings of the IEEE International Conference on Acoustics, 2001

Speech recognition for DARPA Communicator.

Proceedings of the IEEE International Conference on Acoustics, 2001

2000

Minimum Bayes Error Feature Selection for Continuous Speech Recognition.

,

Mukund Padmanabhan

Proceedings of the Advances in Neural Information Processing Systems 13, 2000

Minimum Bayes error feature selection.

,

Mukund Padmanabhan

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Real-time multilingual HMM training robust to channel variations.

,

Jaime Botella Ordinas

,

,

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Recent improvements in speech recognition performance on large vocabulary conversational speech (voicemail and switchboard).

,

Brian Kingsbury

,

,

Mukund Padmanabhan

,

,

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Maximum likelihood discriminant feature spaces.

,

Mukund Padmanabhan

,

Ramesh A. Gopinath

,

Scott Saobing Chen

Proceedings of the IEEE International Conference on Acoustics, 2000

1999

Cursive word recognition using a random field based hidden Markov model.

Int. J. Document Anal. Recognit., 1999

Recent improvements in voicemail transcription.

Mukund Padmanabhan

,

,

,

,

Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

1997

Modèles markoviens uni- et bidimensionnels pour la reconnaissance de l'écriture manuscrite hors-ligne. (One and two-dimensional Markov models for off-line handwriting recognition).

PhD thesis, 1997

High Performance Unconstrained Word Recognition System Combining HMMs and Markov Random Fields.

,

Int. J. Pattern Recognit. Artif. Intell., 1997

Binary pattern recognition using Markov random fields and HMMs.

,

Proceedings of the 1997 IEEE International Conference on Acoustics, 1997

1995

Stochastic trajectory modeling for recognition of unconstrained handwritten words.

,

,

Proceedings of the Third International Conference on Document Analysis and Recognition, 1995

1994

Off-line Handwriting Recognition by Statistical Correlation.

,

,

Proceedings of IAPR Workshop on Machine Vision Applications, 1994
