Bo Li

Orcid: 0000-0002-6711-3603

Affiliations:

Google Inc., USA
National University of Singapore, Singapore (former)

According to our database¹, Bo Li authored at least 90 papers between 2010 and 2024.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2024

Massive End-to-end Speech Recognition Models with Time Reduction.

[DOI]

,

Rohit Prabhavalkar

,

,

,

Dongseong Hwang

,

,

,

,

,

,

,

Chengjian Zheng

,

,

Tara N. Sainath

,

Pedro Moreno Mengibar

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

A Comparison of Parameter-Efficient ASR Domain Adaptation Methods for Universal Speech and Language Models.

[DOI]

,

,

Tsendsuren Munkhdalai

,

Nikhil Siddhartha

,

,

,

,

Tara N. Sainath

Proceedings of the IEEE International Conference on Acoustics, 2024

USM-Lite: Quantization and Sparsity Aware Fine-Tuning for Speech Recognition with Universal Speech Models.

[DOI]

,

,

,

,

,

,

Rohit Prabhavalkar

,

,

Tara N. Sainath

,

,

,

Amir Yazdanbakhsh

,

Shivani Agrawal

Proceedings of the IEEE International Conference on Acoustics, 2024

Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR.

[DOI]

,

,

,

Tara N. Sainath

,

Trevor Strohman

Proceedings of the IEEE International Conference on Acoustics, 2024

Handling Ambiguity in Emotion: From Out-of-Domain Detection to Distribution Estimation.

[DOI]

,

,

,

Chung-Cheng Chiu

,

,

,

Tara N. Sainath

,

Philip C. Woodland

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Massive End-to-end Models for Short Search Queries.

[DOI]

,

Rohit Prabhavalkar

,

Dongseong Hwang

,

,

,

,

,

,

,

,

,

,

Tara N. Sainath

,

Pedro Moreno Mengibar

CoRR, 2023

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages.

[DOI]

CoRR, 2023

Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference.

[DOI]

,

,

Siddhartha Brahma

,

,

,

,

,

Vincent Y. Zhao

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Modular Domain Adaptation for Conformer-Based Streaming ASR.

[DOI]

,

,

Dongseong Hwang

,

Tara N. Sainath

,

Pedro Moreno Mengibar

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Mixture-of-Expert Conformer for Streaming Multilingual ASR.

[DOI]

,

,

Tara N. Sainath

,

,

Françoise Beaufays

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

How to Estimate Model Transferability of Pre-Trained Speech Models?

[DOI]

,

Chao-Han Huck Yang

,

,

,

,

Shuo-Yiin Chang

,

Rohit Prabhavalkar

,

,

Tara N. Sainath

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

UML: A Universal Monolingual Output Layer For Multilingual ASR.

[DOI]

,

,

Tara N. Sainath

,

Trevor Strohman

,

Shuo-Yiin Chang

Proceedings of the IEEE International Conference on Acoustics, 2023

A Quantum Kernel Learning Approach to Acoustic Modeling for Spoken Command Recognition.

[DOI]

Chao-Han Huck Yang

,

,

,

,

Tara N. Sainath

,

Sabato Marco Siniscalchi

,

Proceedings of the IEEE International Conference on Acoustics, 2023

From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition.

[DOI]

Chao-Han Huck Yang

,

,

,

,

Rohit Prabhavalkar

,

Tara N. Sainath

,

Trevor Strohman

Proceedings of the IEEE International Conference on Acoustics, 2023

JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition.

[DOI]

,

,

Rohit Prabhavalkar

,

Tara N. Sainath

,

,

,

,

,

Andrew Rosenberg

,

Bhuvana Ramabhadran

Proceedings of the IEEE International Conference on Acoustics, 2023

Efficient Domain Adaptation for Speech Foundation Models.

[DOI]

,

Dongseong Hwang

,

,

,

,

Tara N. Sainath

,

,

,

,

Trevor Strohman

,

Françoise Beaufays

Proceedings of the IEEE International Conference on Acoustics, 2023

Resource-Efficient Transfer Learning from Speech Foundation Model Using Hierarchical Feature Fusion.

[DOI]

,

,

,

Dongseong Hwang

,

Tara N. Sainath

,

Trevor Strohman

Proceedings of the IEEE International Conference on Acoustics, 2023

Massively Multilingual Shallow Fusion with Large Language Models.

[DOI]

,

Tara N. Sainath

,

,

,

,

,

,

Rodrigo Cabrera

,

,

Trevor Strohman

Proceedings of the IEEE International Conference on Acoustics, 2023

Context-Aware end-to-end ASR Using Self-Attentive Embedding and Tensor Fusion.

[DOI]

Shuo-Yiin Chang

,

,

Tara N. Sainath

,

,

Trevor Strohman

Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Multilingual and Code-Switching ASR Using Large Language Model Generated Text.

[DOI]

,

Tara N. Sainath

,

,

,

,

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition.

[DOI]

IEEE J. Sel. Top. Signal Process., 2022

Large vocabulary speech recognition for languages of Africa: multilingual modeling and self-supervised learning.

[DOI]

,

,

,

,

,

,

CoRR, 2022

JOIST: A Joint Speech and Text Streaming Model for ASR.

[DOI]

Tara N. Sainath

,

Rohit Prabhavalkar

,

,

,

,

,

,

,

Trevor Strohman

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

A Truly Multilingual First Pass and Monolingual Second Pass Streaming on-Device ASR System.

[DOI]

Sepand Mavandadi

,

,

,

,

Tara N. Sainath

,

Trevor Strohman

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Scaling Up Deliberation For Multilingual ASR.

[DOI]

,

,

Tara N. Sainath

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Unified End-to-End Speech Recognition and Endpointing for Fast and Efficient Speech Systems.

[DOI]

,

Shuo-Yiin Chang

,

,

Tara N. Sainath

,

,

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification.

[DOI]

,

,

Tara N. Sainath

,

Trevor Strohman

,

Sepand Mavandadi

,

Shuo-Yiin Chang

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Language Agnostic Multilingual Streaming On-Device ASR System.

[DOI]

,

Tara N. Sainath

,

,

Shuo-Yiin Chang

,

,

Trevor Strohman

,

,

,

,

,

,

Sameer Bidichandani

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Intended Query Detection using E2E Modeling for Continued Conversation.

[DOI]

Shuo-Yiin Chang

,

,

,

Tara N. Sainath

,

,

,

,

,

,

Trevor Strohman

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Turn-Taking Prediction for Natural Conversational Speech.

[DOI]

Shuo-Yiin Chang

,

,

Tara N. Sainath

,

,

Trevor Strohman

,

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving the Fusion of Acoustic and Text Representations in RNN-T.

[DOI]

,

,

,

Tara N. Sainath

,

Shuo-Yiin Chang

Proceedings of the IEEE International Conference on Acoustics, 2022

Improving The Latency And Quality Of Cascaded Encoders.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2022

Massively Multilingual ASR: A Lifelong Learning Solution.

[DOI]

,

,

,

Tara N. Sainath

,

Trevor Strohman

,

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2022

Joint Unsupervised and Supervised Training for Multilingual ASR.

[DOI]

,

,

,

,

Nikhil Siddhartha

,

,

Tara N. Sainath

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Scaling End-to-End Models for Large-Scale Multilingual ASR.

[DOI]

,

,

Tara N. Sainath

,

,

,

,

,

,

CoRR, 2021

An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling.

[DOI]

Tara N. Sainath

,

,

,

,

,

,

,

,

,

Quoc-Nam Le-The

,

Shuo-Yiin Chang

,

,

,

,

Chung-Cheng Chiu

,

Diamantino Caseiro

,

,

,

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Residual Energy-Based Models for End-to-End Speech Recognition.

[DOI]

,

,

,

,

Philip C. Woodland

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling.

[DOI]

,

,

,

Chung-Cheng Chiu

,

,

Tara N. Sainath

,

,

Proceedings of the 9th International Conference on Learning Representations, 2021

FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization.

[DOI]

,

Chung-Cheng Chiu

,

,

Shuo-Yiin Chang

,

Tara N. Sainath

,

,

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2021

Learning Word-Level Confidence for Subword End-To-End ASR.

[DOI]

,

,

,

,

,

,

Rohit Prabhavalkar

,

,

,

,

Tara N. Sainath

,

Proceedings of the IEEE International Conference on Acoustics, 2021

Confidence Estimation for Attention-Based Sequence-to-Sequence Models for Speech Recognition.

[DOI]

,

,

,

,

,

Philip C. Woodland

,

,

Trevor Strohman

Proceedings of the IEEE International Conference on Acoustics, 2021

A Better and Faster end-to-end Model for Streaming ASR.

[DOI]

,

,

,

Tara N. Sainath

,

Chung-Cheng Chiu

,

,

Shuo-Yiin Chang

,

,

,

,

,

,

,

Trevor Strohman

,

Proceedings of the IEEE International Conference on Acoustics, 2021

Scaling End-to-End Models for Large-Scale Multilingual ASR.

[DOI]

,

,

Tara N. Sainath

,

,

,

,

,

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Universal ASR: Unify and Improve Streaming ASR with Full-context Modeling.

[DOI]

,

,

,

Chung-Cheng Chiu

,

,

Tara N. Sainath

,

,

CoRR, 2020

A Streaming On-Device End-to-End Model Surpassing Server-Side Conventional Model Quality and Latency.

[DOI]

CoRR, 2020

Improved Noisy Student Training for Automatic Speech Recognition.

[DOI]

,

,

,

,

Chung-Cheng Chiu

,

,

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Low Latency Speech Recognition Using End-to-End Prefetching.

[DOI]

Shuo-Yiin Chang

,

,

,

,

,

Tara N. Sainath

,

Trevor Strohman

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Multistate Encoding with End-To-End Speech RNN Transducer Network.

[DOI]

,

,

,

Petar S. Aleksic

,

Tara N. Sainath

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency.

[DOI]

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Specaugment on Large Scale Datasets.

[DOI]

,

,

Chung-Cheng Chiu

,

,

,

,

,

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Towards Fast and Accurate Streaming End-To-End ASR.

[DOI]

,

Shuo-Yiin Chang

,

Tara N. Sainath

,

,

,

Trevor Strohman

,

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Introduction to the Issue on Data Science: Machine Learning for Audio Signal Processing.

[DOI]

Hendrik Purwins

,

Bob L. T. Sturm

,

,

,

IEEE J. Sel. Top. Signal Process., 2019

Deep Learning for Audio Signal Processing.

[DOI]

Hendrik Purwins

,

,

Tuomas Virtanen

,

,

Shuo-Yiin Chang

,

Tara N. Sainath

IEEE J. Sel. Top. Signal Process., 2019

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling.

[DOI]

,

,

,

,

,

,

,

Tara N. Sainath

,

,

Chung-Cheng Chiu

,

,

,

,

Stella Laurenzo

,

,

,

Wolfgang Macherey

,

,

,

,

,

,

Rohit Prabhavalkar

,

,

,

,

,

,

Sébastien Jean

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

Kuan-Chieh Wang

,

Ekaterina Gonina

,

,

,

,

,

,

,

,

,

George F. Foster

,

John Richardson

,

,

Antoine Bruguier

,

,

,

,

,

,

,

Vijayaditya Peddinti

,

,

Michiel Bacchiani

,

Thomas B. Jablin

,

Robert Suderman

,

,

,

,

,

,

,

,

,

,

,

,

,

,

Dmitry Lepikhin

,

,

,

,

Shubham Toshniwal

,

,

Michael Nirschl

,

CoRR, 2019

Shallow-Fusion End-to-End Contextual Biasing.

[DOI]

,

Tara N. Sainath

,

,

,

,

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Bytes Are All You Need: End-to-end Multilingual Speech Recognition and Synthesis with Bytes.

[DOI]

,

,

Tara N. Sainath

,

,

Proceedings of the IEEE International Conference on Acoustics, 2019

Improving CTC Using Stimulated Learning for Sequence Modeling.

[DOI]

,

,

Proceedings of the IEEE International Conference on Acoustics, 2019

Streaming End-to-end Speech Recognition for Mobile Devices.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2019

Semi-supervised Training for End-to-end Models via Weak Distillation.

[DOI]

,

Tara N. Sainath

,

,

Proceedings of the IEEE International Conference on Acoustics, 2019

A Unified Endpointer Using Multitask and Multidomain Training.

[DOI]

Shuo-Yiin Chang

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Domain Adaptation Using Factorized Hidden Layer for Robust Automatic Speech Recognition.

[DOI]

,

,

,

Anshuman Tripathi

,

,

Tara N. Sainath

,

,

,

Michiel Bacchiani

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Multilingual Speech Recognition with a Single End-to-End Model.

[DOI]

Shubham Toshniwal

,

Tara N. Sainath

,

,

,

Pedro J. Moreno

,

Eugene Weinstein

,

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models.

[DOI]

Tara N. Sainath

,

Rohit Prabhavalkar

,

,

,

,

,

,

,

,

,

,

Chung-Cheng Chiu

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Multi-Dialect Speech Recognition with a Single Sequence-to-Sequence Model.

[DOI]

,

Tara N. Sainath

,

,

Michiel Bacchiani

,

Eugene Weinstein

,

,

,

,

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Exploring Speech Enhancement with Generative Adversarial Networks for Robust Speech Recognition.

[DOI]

,

,

Rohit Prabhavalkar

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

State-of-the-Art Speech Recognition with Sequence-to-Sequence Models.

[DOI]

Chung-Cheng Chiu

,

Tara N. Sainath

,

,

Rohit Prabhavalkar

,

,

,

,

,

,

Ekaterina Gonina

,

,

,

,

Michiel Bacchiani

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Temporal Modeling Using Dilated Convolution and Gating for Voice-Activity-Detection.

[DOI]

Shuo-Yiin Chang

,

,

,

Tara N. Sainath

,

Anshuman Tripathi

,

Aäron van den Oord

,

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition.

[DOI]

Tara N. Sainath

,

,

Kevin W. Wilson

,

,

,

,

Michiel Bacchiani

,

,

Andrew W. Senior

,

,

,

IEEE ACM Trans. Audio Speech Lang. Process., 2017

Media computing and applications for immersive communications: recent advances.

[DOI]

,

Janne Heikkilä

,

J. Ambient Intell. Humaniz. Comput., 2017

Multi-Dialect Speech Recognition With A Single Sequence-To-Sequence Model.

[DOI]

,

Tara N. Sainath

,

,

Michiel Bacchiani

,

Eugene Weinstein

,

,

,

,

CoRR, 2017

An Analysis of "Attention" in Sequence-to-Sequence Models.

[DOI]

Rohit Prabhavalkar

,

Tara N. Sainath

,

,

,

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

A Comparison of Sequence-to-Sequence Models for Speech Recognition.

[DOI]

Rohit Prabhavalkar

,

,

Tara N. Sainath

,

,

,

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Acoustic Modeling for Google Home.

[DOI]

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Reducing the Computational Complexity of Two-Dimensional LSTMs.

[DOI]

,

Tara N. Sainath

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Endpoint Detection Using Grid Long Short-Term Memory Networks for Streaming Speech Recognition.

[DOI]

Shuo-Yiin Chang

,

,

Tara N. Sainath

,

,

Carolina Parada

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Raw Multichannel Processing Using Deep Neural Networks.

[DOI]

Tara N. Sainath

,

,

Kevin W. Wilson

,

,

Michiel Bacchiani

,

,

,

,

Andrew W. Senior

,

,

,

Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning., 2017

2016

Modeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR Tasks.

[DOI]

Tara N. Sainath

,

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Multi-Language Multi-Speaker Acoustic Modeling for LSTM-RNN Based Statistical Parametric Speech Synthesis.

[DOI]

,

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Neural Network Adaptive Beamforming for Robust Multichannel Speech Recognition.

[DOI]

,

Tara N. Sainath

,

,

Kevin W. Wilson

,

Michiel Bacchiani

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

2014

A Spectral Masking Approach to Noise-Robust Speech Recognition Using Deep Neural Networks.

[DOI]

,

IEEE ACM Trans. Audio Speech Lang. Process., 2014

Modeling long temporal contexts for robust DNN-based speech recognition.

[DOI]

,

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

An ideal hidden-activation mask for deep neural networks based noise-robust speech recognition.

[DOI]

,

Proceedings of the IEEE International Conference on Acoustics, 2014

2013

An investigation of spectral restoration algorithms for deep neural networks based noise robust speech recognition.

[DOI]

,

,

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Noise adaptive front-end normalization based on Vector Taylor Series for Deep Neural Networks in robust speech recognition.

[DOI]

,

Proceedings of the IEEE International Conference on Acoustics, 2013

Improving robustness of deep neural networks via spectral masking for automatic speech recognition.

[DOI]

,

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

The NUS sung and spoken lyrics corpus: A quantitative comparison of singing and speech.

[DOI]

,

,

,

,

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

2012

A Two-stage Speaker Adaptation Approach for Subspace Gaussian Mixture Model based Nonnative Speech Recognition.

[DOI]

,

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Improving mandarin predictive text input by augmenting pinyin initials with speech and tonal information.

[DOI]

,

,

,

,

,

Proceedings of the International Conference on Multimodal Interaction, 2012

2010

Hidden logistic linear regression for support vector machine based phone verification.

[DOI]

,

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems.

[DOI]

,

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
