Rohit Prabhavalkar

ORCID: 0000-0001-5331-6058

According to our database, Rohit Prabhavalkar authored at least 80 papers between 2009 and 2024.

Bibliography

2024
End-to-End Speech Recognition: A Survey.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models.
CoRR, 2024

2023
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models.
CoRR, 2023

Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm.
CoRR, 2023

Massive End-to-end Models for Short Search Queries.
CoRR, 2023

Improving Joint Speech-Text Representations Without Alignment.
CoRR, 2023

How to Estimate Model Transferability of Pre-Trained Speech Models?
CoRR, 2023

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages.
CoRR, 2023

From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Contextual Biasing with Text Injection.
Proceedings of the IEEE International Conference on Acoustics, 2023

A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale.
Proceedings of the IEEE International Conference on Acoustics, 2023

JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Cross-Training: A Semi-Supervised Training Scheme for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model.
Proceedings of the IEEE International Conference on Acoustics, 2023

Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models.
Proceedings of the IEEE International Conference on Acoustics, 2023

Lego-Features: Exporting Modular Encoder Features for Streaming and Deliberation ASR.
Proceedings of the IEEE International Conference on Acoustics, 2023

The Gift of Feedback: Improving ASR Model Quality by Learning from User Corrections Through Federated Learning.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Efficient Cascaded Streaming ASR System Via Frame Rate Reduction.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Improved Long-Form Speech Recognition By Jointly Modeling The Primary And Non-Primary Speakers.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
JOIST: A Joint Speech and Text Streaming Model for ASR.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Dual Learning for Large Vocabulary On-Device ASR.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Modular Hybrid Autoregressive Transducer.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Improving Rare Word Recognition with LM-aware MWER Training.
Proceedings of the Interspeech 2022, 2022

E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR.
Proceedings of the Interspeech 2022, 2022

Improving Deliberation by Text-Only and Semi-Supervised Training.
Proceedings of the Interspeech 2022, 2022

A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes.
Proceedings of the Interspeech 2022, 2022


Neural-FST Class Language Model for End-to-End Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Input Length Matters: An Empirical Study Of RNN-T And MWER Training For Long-form Telephony Speech Recognition.
CoRR, 2021

RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Dynamic Encoder Transducer: A Flexible Solution for Trading Off Accuracy for Latency.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Dissecting User-Perceived Latency of On-Device E2E Speech Recognition.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Replacing Human Audio with Synthetic Audio for on-Device Unspoken Punctuation Prediction.
Proceedings of the IEEE International Conference on Acoustics, 2021

Learning Word-Level Confidence for Subword End-To-End ASR.
Proceedings of the IEEE International Conference on Acoustics, 2021

Less is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging.
Proceedings of the IEEE International Conference on Acoustics, 2021

Cascaded Encoders for Unifying Streaming and Non-Streaming ASR.
Proceedings of the IEEE International Conference on Acoustics, 2021

A Neural Acoustic Echo Canceller Optimized Using An Automatic Speech Recognizer and Large Scale Synthetic Data.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
A Streaming On-Device End-to-End Model Surpassing Server-Side Conventional Model Quality and Latency.
CoRR, 2020

Anti-Aliasing Regularization in Stacking Layers.
Proceedings of the Interspeech 2020, 2020


Deliberation Model Based Two-Pass End-To-End Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling.
CoRR, 2019

Model Unit Exploration for Sequence-to-Sequence Speech Recognition.
CoRR, 2019


On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition.
Proceedings of the Interspeech 2019, 2019

Phoneme-Based Contextualization for Cross-Lingual Speech Recognition in End-to-End Models.
Proceedings of the Interspeech 2019, 2019


Joint Endpointing and Decoding with End-to-end Models.
Proceedings of the IEEE International Conference on Acoustics, 2019

Phoebe: Pronunciation-aware Contextualization for End-to-end Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

Recognizing Long-Form Speech Using Streaming End-to-End Models.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

A Comparison of End-to-End Models for Long-Form Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
Deep Context: End-to-end Contextual Speech Recognition.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

From Audio to Semantics: Approaches to End-to-End Spoken Language Understanding.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Compression of End-to-End Models.
Proceedings of the Interspeech 2018, 2018

No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Improving the Performance of Online Neural Transducer Models.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Minimum Word Error Rate Training for Attention-Based Sequence-to-Sequence Models.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

An Analysis of Incorporating an External Language Model into a Sequence-to-Sequence Model.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Exploring Speech Enhancement with Generative Adversarial Networks for Robust Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

State-of-the-Art Speech Recognition with Sequence-to-Sequence Models.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
An Analysis of "Attention" in Sequence-to-Sequence Models.
Proceedings of the Interspeech 2017, 2017

A Comparison of Sequence-to-Sequence Models for Speech Recognition.
Proceedings of the Interspeech 2017, 2017

Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Streaming small-footprint keyword spotting using sequence-to-sequence models.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

2016
On the Efficient Representation and Execution of Deep Acoustic Models.
Proceedings of the Interspeech 2016, 2016

On the compression of recurrent neural networks with an application to LVCSR acoustic modeling for embedded speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Personalized speech recognition on mobile devices.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
Compressing deep neural networks using a rank-constrained topology.
Proceedings of the INTERSPEECH 2015, 2015

Automatic gain control and multi-style training for robust small-footprint keyword spotting with deep neural networks.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2013
Conditional Random Fields in Speech, Audio, and Language Processing.
Proc. IEEE, 2013

An evaluation of posterior modeling techniques for phonetic recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

Discriminative articulatory models for spoken term detection in low-resource conversational settings.
Proceedings of the IEEE International Conference on Acoustics, 2013

2012
Discriminative spoken term detection with limited data.
Proceedings of the 2012 Symposium on Machine Learning in Speech and Language Processing, 2012

A chunk-based phonetic score for mobile voice search.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011
Articulatory Feature Classification Using Nearest Neighbors.
Proceedings of the INTERSPEECH 2011, 2011

A factored conditional random field model for articulatory feature forced transcription.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010
Investigations into the Crandem Approach to Word Recognition.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2010

Combining monaural and binaural evidence for reverberant speech segregation.
Proceedings of the INTERSPEECH 2010, 2010

Backpropagation training for multilayer conditional random field based phone recognition.
Proceedings of the IEEE International Conference on Acoustics, 2010

2009
Monaural segregation of voiced speech using discriminative random fields.
Proceedings of the INTERSPEECH 2009, 2009

