Andreas Stolcke

CoRR, March, 2026

Text-only adaptation in LLM-based ASR through text denoising.

CoRR, January, 2026

Reducing Prompt Sensitivity in LLM-based Speech Recognition Through Learnable Projection.

CoRR, January, 2026

2025

Slot Filling as a Reasoning Task for SpeechLLMs.

Kadri Hacioglu

Manjunath K. E

CoRR, October, 2025

Unifying Streaming and Non-streaming Zipformer-based ASR.

CoRR, June, 2025

Improving endpoint detection in end-to-end streaming ASR for conversational speech.

S. Pavankumar Dubagunta

Shankar Venkatesan

Aravind Ganapathiraju

CoRR, May, 2025

Unifying Global and Near-Context Biasing in a Single Trie Pass.

Iuliia Thorbecke

Esaú Villatoro-Tello

Juan Pablo Zuluaga-Gomez

Proceedings of the Text, Speech, and Dialogue - 28th International Conference, 2025

Efficient Data Selection for Domain Adaptation of ASR Using Pseudo-Labels and Multi-Stage Filtering.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Better Pseudo-labeling with Multi-ASR Fusion and Error Correction by SpeechLLM.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Better Semi-supervised Learning for Multi-domain ASR Through Incremental Retraining and Data Filtering.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Performance Evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward.

Aravind Ganapathiraju

Proceedings of the IEEE International Conference on Acoustics, 2025

Spoken Conversational Agents with Large Language Models.

Huck Yang

Larry P. Heck

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

SpeechLLMs for Large-scale Contextualized Zero-shot Slot Filling.

Kadri Hacioglu

Manjunath K. E

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Lightweight Safety Guardrails Using Fine-tuned BERT Embeddings.

Aaron Zheng

Mansi Rana

Proceedings of the 31st International Conference on Computational Linguistics, 2025

TokenVerse++: Towards Flexible Multitask Learning with Dynamic Task Activation.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

Unifying Streaming and Non-streaming Zipformer-based ASR.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), 2025

2024

Improving speaker verification robustness with synthetic emotional utterances.

Nikhil Kumar Koditala

CoRR, 2024

Investigating Training Strategies and Model Robustness of Low-Rank Adaptation for Language Modeling in Speech Recognition.

CoRR, 2024

Large Language Model Based Generative Error Correction: A Challenge and Baselines For Speech Recognition, Speaker Tagging, and Emotion Recognition.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Turn-Taking and Backchannel Prediction with Acoustic and Large Language Model Fusion.

Proceedings of the IEEE International Conference on Acoustics, 2024

Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue.

Guan-Ting Lin

Proceedings of the IEEE International Conference on Acoustics, 2024

Post-Training Embedding Alignment for Decoupling Enrollment and Runtime Speaker Recognition Models.

Proceedings of the IEEE International Conference on Acoustics, 2024

Towards ASR Robust Spoken Language Understanding Through in-Context Learning with Word Confusion Networks.

Kevin Everson

Yile Gu

Chao-Han Huck Yang

Proceedings of the IEEE International Conference on Acoustics, 2024

Provenance: A Light-weight Fact-checker for Retrieval Augmented LLM Generation Output.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: EMNLP 2024, 2024

REFINE on Scarce Data: Retrieval Enhancement Through Fine-Tuning via Model Fusion of Embedding Models.

Proceedings of the AI 2024: Advances in Artificial Intelligence, 2024

2023

Streaming Speech-to-Confusion Network Speech Recognition.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Learning When to Trust Which Teacher for Weakly Supervised ASR.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Cross-Utterance ASR Rescoring with Graph-Based Label Propagation.

Proceedings of the IEEE International Conference on Acoustics, 2023

Procter: Pronunciation-Aware Contextual Adapter For Personalized Speech Recognition In Neural Transducers.

Proceedings of the IEEE International Conference on Acoustics, 2023

Adaptive Endpointing with Deep Contextual Multi-Armed Bandits.

Viet Anh Trinh

Proceedings of the IEEE International Conference on Acoustics, 2023

Low-Rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition.

Yu Yu

Chao-Han Huck Yang

Jari Kolehmainen

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Generative Speech Recognition Error Correction With Large Language Models and Task-Activating Prompting.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech.

Roberto Barra-Chicote

CoRR, 2022

An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition.

Chao-Han Huck Yang

I-Fan Chen

Sabato Marco Siniscalchi

Chin-Hui Lee

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Reducing Geographic Disparities in Automatic Speech Recognition via Elastic Weight Consolidation.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Adversarial Reweighting for Speaker Verification Fairness.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Toward Fairness in Speech Recognition: Discovery and mitigation of performance disparities.

Pranav Dheram

Murugesan Ramakrishnan

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Graph-based Multi-View Fusion and Local Adaptation: Mitigating Within-Household Confusability for Speaker Identification.

Long Chen

Yixiong Meng

Leibny Paola García-Perera

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Contrastive-mixup Learning for Improved Speaker Verification.

Proceedings of the IEEE International Conference on Acoustics, 2022

Mitigating Closed-Model Adversarial Examples with Bayesian Neural Modeling for Enhanced End-to-End Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2022

RescoreBERT: Discriminative Speech Recognition Rescoring With BERT.

Proceedings of the IEEE International Conference on Acoustics, 2022

Improving Fairness in Speaker Verification via Group-Adapted Fusion Network.

Proceedings of the IEEE International Conference on Acoustics, 2022

OpenFEAT: Improving Speaker Identification by Open-Set Few-Shot Embedding Adaptation with Transformer.

Proceedings of the IEEE International Conference on Acoustics, 2022

ASR-Aware End-to-End Neural Diarization.

Proceedings of the IEEE International Conference on Acoustics, 2022

Self-Supervised Speaker Recognition Training using Human-Machine Dialogues.

Proceedings of the IEEE International Conference on Acoustics, 2022

CUE Vectors: Modular Training of Language Models Conditioned on Diverse Contextual Signals.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

2021

DOVER-Lap: A Method for Combining Overlap-Aware Diarization Outputs.

Desh Raj

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

wav2vec-C: A Self-Supervised Model for Speech Representation Learning.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Listen with Intent: Improving Speech Recognition with Audio-to-Intent Front-End.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

End-to-End Neural Diarization: From Transformer to Conformer.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Fusion of Embeddings Networks for Robust Combination of Text Dependent and Independent Speaker Recognition.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Graph-Based Label Propagation for Semi-Supervised Speaker Identification.

Long Chen

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Do as I Mean, Not as I Say: Sequence Loss Training for Spoken Language Understanding.

Proceedings of the IEEE International Conference on Acoustics, 2021

Joint ASR and Language Identification Using RNN-T: An Efficient Approach to Dynamic Language Switching.

Athanasios Mouchtaris

Siegfried Kunzmann

Proceedings of the IEEE International Conference on Acoustics, 2021

Contrastive Unsupervised Learning for Speech Emotion Recognition.

Constantinos Papayiannis

Daniel Bone

Chao Wang

Proceedings of the IEEE International Conference on Acoustics, 2021

REDAT: Accent-Invariant Representation for End-To-End ASR by Domain Adversarial Training with Relabeling.

Proceedings of the IEEE International Conference on Acoustics, 2021

BW-EDA-EEND: Streaming End-to-End Neural Speaker Diarization for a Variable Number of Speakers.

Eunjung Han

Chul Lee

Proceedings of the IEEE International Conference on Acoustics, 2021

Personalization Strategies for End-to-End Speech Recognition Systems.

Proceedings of the IEEE International Conference on Acoustics, 2021

Improving Speaker Identification for Shared Devices by Adapting Embeddings to Speaker Subsets.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Attention-based Contextual Language Model Adaptation for Speech Recognition.

Richard Diehl Martinez

Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

2020

Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm.

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Speaker Identification for Household Scenarios with Self-Attention and Adversarial Training.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Efficient Minimum Word Error Rate Training of RNN-Transducer for End-to-End Speech Recognition.

Jinxi Guo

Gautam Tiwari

Jasha Droppo

Maarten Van Segbroeck

Che-Wei Huang

Roland Maas

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Combining Acoustics, Content and Interaction Features to Find Hot Spots in Meetings.

Dave Makhervaks

William Hinthorn

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Meeting Transcription Using Virtual Microphone Arrays.

Takuya Yoshioka

Zhuo Chen

CoRR, 2019

Meeting Transcription Using Asynchronous Distant Microphones.

Takuya Yoshioka

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Acoustic and Lexical Sentiment Analysis for Customer Service Calls.

Bryan Li

Proceedings of the IEEE International Conference on Acoustics, 2019

DOVER: A Method for Combining Diarization Outputs.

Takuya Yoshioka

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Mispronunciation Detection in Children's Reading of Sentences.

IEEE ACM Trans. Audio Speech Lang. Process., 2018

The Microsoft 2017 Conversational Speech Recognition System.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Session-level Language Modeling for Conversational Speech.

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

2017

Toward Human Parity in Conversational Speech Recognition.

IEEE ACM Trans. Audio Speech Lang. Process., 2017

Automatic evaluation of reading aloud performance in children.

Speech Commun., 2017

Comparing Human and Machine Errors in Conversational Speech Transcription.

Jasha Droppo

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Automatic Evaluation of Children Reading Aloud on Sentences and Pseudowords.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Detection of Mispronunciations and Disfluencies in Children Reading Aloud.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Advances in all-neural speech recognition.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

The Microsoft 2016 conversational speech recognition system.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

2016

Robust and Efficient Multiple Alignment of Unsynchronized Meeting Recordings.

IEEE ACM Trans. Audio Speech Lang. Process., 2016

Achieving Human Parity in Conversational Speech Recognition.

CoRR, 2016

Design and Analysis of a Database to Evaluate Children's Reading Aloud Performance.

Proceedings of the Computational Processing of the Portuguese Language, 2016

Deep Convolutional Neural Networks with Layer-Wise Context Expansion and Attention.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

A comparative study of recurrent neural network models for lexical domain classification.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

A Study of Multimodal Addressee Detection in Human-Human-Computer Interaction.

IEEE Trans. Multim., 2015

A comparison of neural network feature transforms for speaker diarization.

Sree Harsha Yella

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Aligning meeting recordings via adaptive fingerprinting.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Recurrent neural network and LSTM models for lexical utterance classification.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Personalization of word-phrase-entity language models.

Michael Levit

Sarangarajan Parthasarathy

R. Subba

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Multimodal addressee detection in multiparty dialogue systems.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Token-level interpolation for class-based language models.

Michael Levit

Sarangarajan Parthasarathy

Shuangyu Chang

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

A comparative study of neural network models for lexical intent classification.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Deep bi-directional recurrent networks over spectral windows.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

Artificial neural network features for speaker diarization.

Sree Harsha Yella

Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Neural network models for lexical addressee detection.

Sarangarajan Parthasarathy

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Word-phrase-entity language models: getting more mileage out of n-grams.

Michael Levit

Shuangyu Chang

Benoît Dumoulin

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

The Relation of Eye Gaze and Face Pose: Potential Impact on Speech Recognition.

Dilek Hakkani-Tür

Proceedings of the 16th International Conference on Multimodal Interaction, 2014

Highly accurate phonetic segmentation using boundary correction models and system fusion.

Proceedings of the IEEE International Conference on Acoustics, 2014

Gaze-enhanced speech recognition.

Proceedings of the IEEE International Conference on Acoustics, 2014

2013

A Cross-language Study on Automatic Speech Disfluency Detection.

Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2013

Using Out-of-Domain Data for Lexical Addressee Detection in Human-Human-Computer Dialog.

Heeyoung Lee

Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2013

Automatic phonetic segmentation using boundary models.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Addressee detection for dialog systems using temporal and spectral dimensions of speaking style.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Articulatory trajectories for large-vocabulary speech recognition.

Proceedings of the IEEE International Conference on Acoustics, 2013

Using multiple versions of speech input in phone recognition.

Proceedings of the IEEE International Conference on Acoustics, 2013

2012

Language Modeling of Nonverbal Vocalizations in Spontaneous Speech.

Proceedings of the Text, Speech and Dialogue - 15th International Conference, 2012

Effects of audio and ASR quality on cepstral and high-level speaker verification systems.

Martin Graciarena

Proceedings of the Odyssey 2012: The Speaker and Language Recognition Workshop, 2012

Learning When to Listen: Detecting System-Addressed Speech in Human-Human-Computer Dialog.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

ProTK: An Improved Prosody Toolkit.

Jacob Okamoto

Serguei V. S. Pakhomov

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Speaker recognition with region-constrained MLLR transforms.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

Constrained Cepstral Speaker Recognition Using Matched UBM and JFA Training.

Michelle Hewlett Sanchez

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Effective Arabic Dialect Classification Using Diverse Phonotactic Models.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Making the most from multiple microphones in meeting recognition.

Proceedings of the IEEE International Conference on Acoustics, 2011

Language-independent constrained cepstral features for speaker recognition.

Proceedings of the IEEE International Conference on Acoustics, 2011

The SRI NIST 2010 speaker recognition evaluation system.

Proceedings of the IEEE International Conference on Acoustics, 2011

Bird species recognition combining acoustic and sequence modeling.

Proceedings of the IEEE International Conference on Acoustics, 2011

2010

The CALO Meeting Assistant System.

IEEE Trans. Speech Audio Process., 2010

Unsupervised domain adaptation with multiple acoustic models.

Xin Lei

Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Improving Language Recognition with Multilingual Phone Recognition and Speaker Adaptation Transforms.

Proceedings of the Odyssey 2010: The Speaker and Language Recognition Workshop, Brno, Czech Republic, June 28, 2010

Leveraging speaker diarization for meeting recognition from distant microphones.

Gerald Friedland

David Imseng

Proceedings of the IEEE International Conference on Acoustics, 2010

Acoustic front-end optimization for bird species recognition.

Proceedings of the IEEE International Conference on Acoustics, 2010

2009

Improving robustness of MLLR adaptation with speaker-clustered regression class trees.

Comput. Speech Lang., 2009

Multifactor adaptation for Mandarin broadcast news and conversation speech recognition.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Development of the 2008 SRI Mandarin speech-to-text system for broadcast news and conversation.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Feature-based and channel-based analyses of intrinsic variability in speaker verification.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Exploiting user feedback for language model adaptation in meeting recognition.

Gökhan Tür

Proceedings of the IEEE International Conference on Acoustics, 2009

Data-driven lexicon expansion for Mandarin broadcast news and conversation speech recognition.

Xin Lei

Proceedings of the IEEE International Conference on Acoustics, 2009

The SRI NIST 2008 speaker recognition evaluation system.

Proceedings of the IEEE International Conference on Acoustics, 2009

2008

The CALO meeting speech recognition and understanding system.

Proceedings of the 2008 IEEE Spoken Language Technology Workshop, 2008

Efficient data selection for machine translation.

Proceedings of the 2008 IEEE Spoken Language Technology Workshop, 2008

Recognizing Arabic speakers with English phones.

Proceedings of the Odyssey 2008: The Speaker and Language Recognition Workshop, 2008

Detecting nonnative speech using speaker recognition approaches.

Proceedings of the Odyssey 2008: The Speaker and Language Recognition Workshop, 2008

Development of the SRI/Nightingale Arabic ASR system.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

The case for automatic higher-level features in forensic speaker recognition.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Nonparametric feature normalization for SVM-based speaker verification.

Proceedings of the IEEE International Conference on Acoustics, 2008

Open-vocabulary spoken term detection using graphone-based hybrid recognition systems.

Murat Akbacak

Proceedings of the IEEE International Conference on Acoustics, 2008

2007

Morph-based speech recognition and modeling of out-of-vocabulary words across languages.

ACM Trans. Speech Lang. Process., 2007

Web resources for language modeling in conversational speech recognition.

ACM Trans. Speech Lang. Process., 2007

Speaker Recognition With Session Variability Normalization Based on MLLR Adaptation Transforms.

IEEE Trans. Speech Audio Process., 2007

Analysis of Morph-Based Speech Recognition and the Modeling of Out-of-Vocabulary Words Across Languages.

Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 2007

fMPE-MAP: improved discriminative adaptation for modeling new domains.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Integrating MAP, marginals, and unsupervised language model adaptation.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

The SRI/OGI 2006 spoken term detection system.

Izhak Shafran

Murat Akbacak

Brian Roark

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Duration and pronunciation conditioned lexical modeling for speaker verification.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Detecting deception using critical segments.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Combining Discriminative Feature, Transform, and Model Training for Large Vocabulary Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2007

Unsupervised Language Model Adaptation for Meeting Recognition.

Gökhan Tür

Proceedings of the IEEE International Conference on Acoustics, 2007

NAP and WCCN: Comparison of Approaches using MLLR-SVM Speaker Verification System.

Proceedings of the IEEE International Conference on Acoustics, 2007

Noise Robust Speaker Identification for Spontaneous Arabic Speech.

Proceedings of the IEEE International Conference on Acoustics, 2007

The SRI-ICSI Spring 2007 Meeting and Lecture Recognition System.

Proceedings of the Multimodal Technologies for Perception of Humans, 2007

Reranking machine translation hypotheses with structured and web-based language models.

Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

2006

SmartKom-English: From Robust Recognition to Felicitous Interaction.

Proceedings of the SmartKom: Foundations of Multimodal Dialogue Systems, 2006

Recent innovations in speech-to-text transcription at SRI-ICSI-UW.

Barry Y. Chen

Horacio Franco

IEEE Trans. Speech Audio Process., 2006

Enriching speech recognition with automatic detection of sentence boundaries and disfluencies.

IEEE Trans. Speech Audio Process., 2006

Editorial for computer speech and language.

Comput. Speech Lang., 2006

A study in machine learning from imbalanced data for sentence boundary detection in speech.

Comput. Speech Lang., 2006

Morphology-based language modeling for conversational Arabic speech recognition.

Comput. Speech Lang., 2006

Detecting Categories in News Video Using Acoustic, Speech, and Image Features.

Proceedings of the 2006 TREC Video Retrieval Evaluation, 2006

Improvements in MLLR-Transform-based Speaker Recognition.

Proceedings of the Odyssey 2006: The Speaker and Language Recognition Workshop, 2006

Text Based Dialog Act Classification for Multiparty Meetings.

Proceedings of the Machine Learning for Multimodal Interaction, 2006

The ICSI-SRI Spring 2006 Meeting Recognition System.

Proceedings of the Machine Learning for Multimodal Interaction, 2006

Speaker clustered regression-class trees for MLLR adaptation.

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Within-class covariance normalization for SVM-based speaker recognition.

Andrew O. Hatch

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Improved speech activity detection using cross-channel features for recognition of multiparty meetings.

Kofi Boakye

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Joint Segmentation and Classification of Dialog Acts in Multiparty Meetings.

Matthias Zimmermann

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Cross-Domain and Cross-Language Portability of Acoustic Features Estimated by Multilayer Perceptrons.

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Generalized Linear Kernels for One-Versus-All Classification: Application to Speaker Recognition.

Andrew O. Hatch

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Combining Prosodic Lexical and Cepstral Systems for Deceptive Speech Detection.

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

The Contribution of Cepstral and Stylistic Features to SRI's 2005 NIST Speaker Recognition Evaluation System.

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005

Pushing the envelope - aside [speech recognition].

IEEE Signal Process. Mag., 2005

Modeling prosodic feature sequences for speaker recognition.

Speech Commun., 2005

Toward Joint Segmentation and Classification of Dialog Acts in Multiparty Meetings.

Proceedings of the Machine Learning for Multimodal Interaction, 2005

Further Progress in Meeting Recognition: The ICSI-SRI Spring 2005 Speech-to-Text Evaluation System.

Proceedings of the Machine Learning for Multimodal Interaction, 2005

Using MLP features in SRI's conversational speech recognition system.

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Improved discriminative training using phone lattices.

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Development of a conversational telephone speech recognizer for Levantine Arabic.

Katrin Kirchhoff

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Does active learning help automatic dialog act tagging in meeting data?

[BibT_eX]

[DOI]

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

MLLR transforms as features in speaker recognition.

[BibT_eX]

[DOI]

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Leveraging speaker-dependent variation of adaptation.

[BibT_eX]

[DOI]

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Comparing HMM, maximum entropy, and conditional random fields for disfluency detection.

[BibT_eX]

[DOI]

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Two experiments comparing reading with listening for human processing of conversational telephone speech.

[BibT_eX]

[DOI]

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Distinguishing deceptive from non-deceptive speech.

[BibT_eX]

[DOI]

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Human language technology: opportunities and challenges.

[BibT_eX]

[DOI]

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Structural metadata research in the EARS program.

[BibT_eX]

[DOI]

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

SRI's 2004 NIST Speaker Recognition Evaluation System.

[BibT_eX]

[DOI]

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Improved Phonetic Speaker Recognition Using Lattice Decoding.

[BibT_eX]

[DOI]

Andrew O. Hatch

Barbara Peskin

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Using Conditional Random Fields for Sentence Boundary Detection in Speech.

[BibT_eX]

[DOI]

Proceedings of the ACL 2005, 2005

2004

Modeling NERFs for speaker recognition.

[BibT_eX]

[DOI]

Proceedings of the Odyssey 2004: The Speaker and Language Recognition Workshop, Toledo, Spain, May 31, 2004

Improving Automatic Sentence Boundary Detection with Confusion Networks.

[BibT_eX]

[DOI]

Proceedings of HLT-NAACL 2004: Short Papers, Boston, Massachusetts, USA, May 2-7, 2004

Tandem Connectionist Feature Extraction for Conversational Speech Recognition.

[BibT_eX]

[DOI]

Proceedings of the Machine Learning for Multimodal Interaction, 2004

The 2004 ICSI-SRI-UW Meeting Recognition System.

[BibT_eX]

[DOI]

Proceedings of the Machine Learning for Multimodal Interaction, 2004

Progress on Mandarin conversational telephone speech recognition.

[BibT_eX]

[DOI]

Martin Graciarena

Man-Hung Siu

Yan Huang

Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

On using MLP features in LVCSR.

[BibT_eX]

[DOI]

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Effective acoustic modeling for rate-of-speech variation in large vocabulary conversational speech recognition.

[BibT_eX]

[DOI]

Horacio Franco

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Morphology-based language modeling for Arabic speech recognition.

[BibT_eX]

[DOI]

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

An efficient repair procedure for quick transcriptions.

[BibT_eX]

[DOI]

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

From Switchboard to meetings: development of the 2004 ICSI-SRI-UW meeting recognition system.

[BibT_eX]

[DOI]

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

The ICSI-SRI-UW metadata extraction system.

[BibT_eX]

[DOI]

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Using machine learning to cope with imbalanced classes in natural speech: evidence from sentence boundary and disfluency detection.

[BibT_eX]

[DOI]

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

The use of a linguistically motivated language model in conversational speech recognition.

[BibT_eX]

[DOI]

Mary P. Harper

Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Trapping conversational speech: extending TRAP/tandem approaches to conversational telephone speech recognition.

[BibT_eX]

[DOI]

Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Voicing feature integration in SRI's DECIPHER LVCSR system.

[BibT_eX]

[DOI]

Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Comparing and Combining Generative and Posterior Probability Models: Some Advances in Sentence Boundary Detection in Speech.

[BibT_eX]

[DOI]

Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, 2004

2003

Modeling word-level rate-of-speech variation in large vocabulary conversational speech recognition.

[BibT_eX]

[DOI]

Horacio Franco

Speech Commun., 2003

Getting More Mileage from Web Text Sources for Conversational Speech Language Modeling using Class-Dependent Mixtures.

[BibT_eX]

[DOI]

Ivan Bulyko

Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, 2003

"TalkPrinting": Improving Speaker Recognition by Modeling Stylistic Features.

[BibT_eX]

[DOI]

M. Kemal Sönmez

Proceedings of the Intelligence and Security Informatics, First NSF/NIJ Symposium, 2003

Automatic disfluency identification in conversational speech using multiple knowledge sources.

[BibT_eX]

[DOI]

Yang Liu

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Modeling duration patterns for speaker recognition.

[BibT_eX]

[DOI]

Harry Bratt

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

The robustness of an almost-parsing language model given errorful training data.

[BibT_eX]

[DOI]

Mary P. Harper

Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Prosodic knowledge sources for automatic speech recognition.

[BibT_eX]

[DOI]

Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Training a prosody-based dialog act tagger from unlabeled data.

[BibT_eX]

[DOI]

Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Meetings about meetings: research at ICSI on speech in multiparty conversations.

[BibT_eX]

[DOI]

Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

The ICSI Meeting Corpus.

[BibT_eX]

[DOI]

Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

A prosody-based approach to end-of-utterance detection that does not require speech recognition.

[BibT_eX]

[DOI]

Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Iterative Statistical Language Model Generation for Use with an Agent-Oriented Natural Language Interface.

[BibT_eX]

Proceedings of the Human-Computer Interaction: Universal Access in HCI: Inclusive Design in the Information Society, 2003

2002

Improved modeling and efficiency for automatic transcription of Broadcast News.

[BibT_eX]

[DOI]

Ananth Sankar

Fuliang Weng

Speech Commun., 2002

SRILM - an extensible language modeling toolkit.

[BibT_eX]

[DOI]

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Building an ASR system for noisy environments: SRI's 2001 SPINE evaluation system.

[BibT_eX]

[DOI]

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Is the speaker done yet? Faster and more accurate end-of-utterance detection using prosody.

[BibT_eX]

[DOI]

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Automatic punctuation and disfluency detection in multi-party meetings using prosodic and lexical cues.

[BibT_eX]

[DOI]

Don Baron

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Prosody-based automatic detection of annoyance and frustration in human-computer dialog.

[BibT_eX]

[DOI]

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

2001

Integrating Prosodic and Lexical Cues for Automatic Topic Segmentation.

[BibT_eX]

[DOI]

Comput. Linguistics, 2001

The Meeting Project at ICSI.

[BibT_eX]

[DOI]

Proceedings of the First International Conference on Human Language Technology Research, 2001

Improved maximum mutual information estimation training of continuous density HMMs.

[BibT_eX]

[DOI]

Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Observations on overlap: findings and implications for automatic processing of multi-party conversation.

[BibT_eX]

[DOI]

Don Baron

Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

2000

Prosody-based automatic segmentation of speech into sentences and topics.

[BibT_eX]

[DOI]

Speech Commun., 2000

Finding consensus in speech recognition: word error minimization and other applications of confusion networks.

[BibT_eX]

[DOI]

Lidia Mangu

Eric Brill

Comput. Speech Lang., 2000

Entropy-based Pruning of Backoff Language Models

[BibT_eX]

[DOI]

CoRR, 2000

Can Prosody Aid the Automatic Classification of Dialog Acts in Conversational Speech?

[BibT_eX]

[DOI]

CoRR, 2000

Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech

[BibT_eX]

[DOI]

CoRR, 2000

Dialog Act Modeling for Automatic Tagging and Recognition of Conversational Speech.

[BibT_eX]

[DOI]

Comput. Linguistics, 2000

1999

Modeling the prosody of hidden events for improved word recognition.

[BibT_eX]

[DOI]

Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

Finding consensus among words: lattice-based word error minimization.

[BibT_eX]

[DOI]

Lidia Mangu

Eric Brill

Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

Combining words and prosody for information extraction from speech.

[BibT_eX]

[DOI]

Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

1998

Efficient lattice representation and generation.

[BibT_eX]

[DOI]

Fuliang Weng

Ananth Sankar

Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Automatic detection of sentence boundaries and disfluencies based on recognized words.

[BibT_eX]

[DOI]

How far do speakers back up in repairs? A quantitative model.

[BibT_eX]

[DOI]

1997

Linguistic Knowledge and Empirical Methods in Speech Recognition.

[BibT_eX]

[DOI]

AI Mag., 1997

A study of multilingual speech recognition.

[BibT_eX]

[DOI]

Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Explicit word error minimization in n-best list rescoring.

[BibT_eX]

[DOI]

Yochai Konig

Mitchel Weintraub

Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Modeling linguistic segment and turn boundaries for n-best rescoring of spontaneous speech.

[BibT_eX]

[DOI]

Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

A prosody only decision-tree model for disfluency detection.

[BibT_eX]

[DOI]

Rebecca A. Bates

Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Structure and performance of a dependency language model.

[BibT_eX]

[DOI]

Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Neural-network based measures of confidence for word recognition.

[BibT_eX]

[DOI]

Proceedings of the 1997 IEEE International Conference on Acoustics, 1997

1996

L0 - The First Five Years of an Automated Language Acquisition Project.

[BibT_eX]

[DOI]

Artif. Intell. Rev., 1996

Automatic linguistic segmentation of conversational speech.

[BibT_eX]

[DOI]

Proceedings of the 4th International Conference on Spoken Language Processing, 1996

Word predictability after hesitations: a corpus-based study.

[BibT_eX]

[DOI]

Proceedings of the 4th International Conference on Spoken Language Processing, 1996

Statistical language modeling for speech disfluencies.

[BibT_eX]

[DOI]

Proceedings of the 1996 IEEE International Conference on Acoustics, 1996

1995

An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities.

[BibT_eX]

Comput. Linguistics, 1995

Partitioning Grammars and Composing Parsers.

[BibT_eX]

[DOI]

Fuliang Weng

Proceedings of the Fourth International Workshop on Parsing Technologies, 1995

Using a stochastic context-free grammar as a language model for speech recognition.

[BibT_eX]

[DOI]

Proceedings of the 1995 International Conference on Acoustics, 1995

1994

Best-first Model Merging for Hidden Markov Model Induction.

[BibT_eX]

[DOI]

Stephen M. Omohundro

CoRR, 1994

Multiple-pronunciation lexical modeling in a speaker independent speech understanding system.

[BibT_eX]

[DOI]

Chuck Wooters

Proceedings of the 3rd International Conference on Spoken Language Processing, 1994

The Berkeley Restaurant Project.

[BibT_eX]

[DOI]

Proceedings of the 3rd International Conference on Spoken Language Processing, 1994

Inducing Probabilistic Grammars by Bayesian Model Merging.

[BibT_eX]

[DOI]

Stephen M. Omohundro

Proceedings of the Grammatical Inference and Applications, Second International Colloquium, 1994

Precise N-Gram Probabilities from Stochastic Context-Free Grammars.

[BibT_eX]

[DOI]

Jonathan Segal

Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, 1994

1992

Hidden Markov Model Induction by Bayesian Model Merging.

[BibT_eX]

[DOI]

Stephen M. Omohundro

Proceedings of the Advances in Neural Information Processing Systems 5, NIPS Conference, Denver, Colorado, USA, November 30, 1992

1990

Gapping and Frame Semantics: A fresh look from a cognitive perspective.

[BibT_eX]

[DOI]

Proceedings of the 13th International Conference on Computational Linguistics, 1990

1989

Unification as Constraint Satisfaction in Structured Connectionist Networks.

[BibT_eX]

[DOI]