Bo Li

ORCID: 0000-0002-6711-3603

Affiliations:
  • Google Inc., USA
  • National University of Singapore, Singapore (former)


According to our database, Bo Li authored at least 83 papers between 2010 and 2024.

Bibliography

2024
Handling Ambiguity in Emotion: From Out-of-Domain Detection to Distribution Estimation.
CoRR, 2024

2023
Massive End-to-end Models for Short Search Queries.
CoRR, 2023

How to Estimate Model Transferability of Pre-Trained Speech Models?
CoRR, 2023

Modular Domain Adaptation for Conformer-Based Streaming ASR.
CoRR, 2023

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages.
CoRR, 2023

UML: A Universal Monolingual Output Layer for Multilingual ASR.
Proceedings of the IEEE International Conference on Acoustics, 2023

A Quantum Kernel Learning Approach to Acoustic Modeling for Spoken Command Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Efficient Domain Adaptation for Speech Foundation Models.
Proceedings of the IEEE International Conference on Acoustics, 2023

Resource-Efficient Transfer Learning from Speech Foundation Model Using Hierarchical Feature Fusion.
Proceedings of the IEEE International Conference on Acoustics, 2023

Massively Multilingual Shallow Fusion with Large Language Models.
Proceedings of the IEEE International Conference on Acoustics, 2023

Context-Aware end-to-end ASR Using Self-Attentive Embedding and Tensor Fusion.
Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Multilingual and Code-Switching ASR Using Large Language Model Generated Text.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition.
IEEE J. Sel. Top. Signal Process., 2022

Large vocabulary speech recognition for languages of Africa: multilingual modeling and self-supervised learning.
CoRR, 2022

JOIST: A Joint Speech and Text Streaming Model for ASR.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

A Truly Multilingual First Pass and Monolingual Second Pass Streaming on-Device ASR System.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Scaling Up Deliberation For Multilingual ASR.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Unified End-to-End Speech Recognition and Endpointing for Fast and Efficient Speech Systems.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification.
Proceedings of the Interspeech 2022, 2022

A Language Agnostic Multilingual Streaming On-Device ASR System.
Proceedings of the Interspeech 2022, 2022

Streaming Intended Query Detection using E2E Modeling for Continued Conversation.
Proceedings of the Interspeech 2022, 2022

Turn-Taking Prediction for Natural Conversational Speech.
Proceedings of the Interspeech 2022, 2022

Improving the Fusion of Acoustic and Text Representations in RNN-T.
Proceedings of the IEEE International Conference on Acoustics, 2022


Massively Multilingual ASR: A Lifelong Learning Solution.
Proceedings of the IEEE International Conference on Acoustics, 2022

Joint Unsupervised and Supervised Training for Multilingual ASR.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Scaling End-to-End Models for Large-Scale Multilingual ASR.
CoRR, 2021

An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Residual Energy-Based Models for End-to-End Speech Recognition.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling.
Proceedings of the 9th International Conference on Learning Representations, 2021

FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization.
Proceedings of the IEEE International Conference on Acoustics, 2021

Learning Word-Level Confidence for Subword End-To-End ASR.
Proceedings of the IEEE International Conference on Acoustics, 2021

Confidence Estimation for Attention-Based Sequence-to-Sequence Models for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

A Better and Faster end-to-end Model for Streaming ASR.
Proceedings of the IEEE International Conference on Acoustics, 2021

Scaling End-to-End Models for Large-Scale Multilingual ASR.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Universal ASR: Unify and Improve Streaming ASR with Full-context Modeling.
CoRR, 2020

A Streaming On-Device End-to-End Model Surpassing Server-Side Conventional Model Quality and Latency.
CoRR, 2020

Improved Noisy Student Training for Automatic Speech Recognition.
Proceedings of the Interspeech 2020, 2020

Low Latency Speech Recognition Using End-to-End Prefetching.
Proceedings of the Interspeech 2020, 2020

Multistate Encoding with End-To-End Speech RNN Transducer Network.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020


SpecAugment on Large Scale Datasets.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Towards Fast and Accurate Streaming End-To-End ASR.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Introduction to the Issue on Data Science: Machine Learning for Audio Signal Processing.
IEEE J. Sel. Top. Signal Process., 2019

Deep Learning for Audio Signal Processing.
IEEE J. Sel. Top. Signal Process., 2019

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling.
CoRR, 2019

Shallow-Fusion End-to-End Contextual Biasing.
Proceedings of the Interspeech 2019, 2019

Bytes Are All You Need: End-to-end Multilingual Speech Recognition and Synthesis with Bytes.
Proceedings of the IEEE International Conference on Acoustics, 2019

Improving CTC Using Stimulated Learning for Sequence Modeling.
Proceedings of the IEEE International Conference on Acoustics, 2019


Semi-supervised Training for End-to-end Models via Weak Distillation.
Proceedings of the IEEE International Conference on Acoustics, 2019

A Unified Endpointer Using Multitask and Multidomain Training.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
Domain Adaptation Using Factorized Hidden Layer for Robust Automatic Speech Recognition.
Proceedings of the Interspeech 2018, 2018

Multilingual Speech Recognition with a Single End-to-End Model.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Multi-Dialect Speech Recognition with a Single Sequence-to-Sequence Model.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Exploring Speech Enhancement with Generative Adversarial Networks for Robust Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

State-of-the-Art Speech Recognition with Sequence-to-Sequence Models.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Temporal Modeling Using Dilated Convolution and Gating for Voice-Activity-Detection.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Media computing and applications for immersive communications: recent advances.
J. Ambient Intell. Humaniz. Comput., 2017

Multi-Dialect Speech Recognition With A Single Sequence-To-Sequence Model.
CoRR, 2017

An Analysis of "Attention" in Sequence-to-Sequence Models.
Proceedings of the Interspeech 2017, 2017

A Comparison of Sequence-to-Sequence Models for Speech Recognition.
Proceedings of the Interspeech 2017, 2017


Reducing the Computational Complexity of Two-Dimensional LSTMs.
Proceedings of the Interspeech 2017, 2017

Endpoint Detection Using Grid Long Short-Term Memory Networks for Streaming Speech Recognition.
Proceedings of the Interspeech 2017, 2017

Raw Multichannel Processing Using Deep Neural Networks.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning., 2017

2016
Modeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR Tasks.
Proceedings of the Interspeech 2016, 2016

Neural Network Adaptive Beamforming for Robust Multichannel Speech Recognition.
Proceedings of the Interspeech 2016, 2016

2014
A Spectral Masking Approach to Noise-Robust Speech Recognition Using Deep Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

Modeling long temporal contexts for robust DNN-based speech recognition.
Proceedings of the INTERSPEECH 2014, 2014

An ideal hidden-activation mask for deep neural networks based noise-robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

2013
An investigation of spectral restoration algorithms for deep neural networks based noise robust speech recognition.
Proceedings of the INTERSPEECH 2013, 2013

Noise adaptive front-end normalization based on Vector Taylor Series for Deep Neural Networks in robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

Improving robustness of deep neural networks via spectral masking for automatic speech recognition.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

The NUS sung and spoken lyrics corpus: A quantitative comparison of singing and speech.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

2012
A Two-stage Speaker Adaptation Approach for Subspace Gaussian Mixture Model based Nonnative Speech Recognition.
Proceedings of the INTERSPEECH 2012, 2012

Improving mandarin predictive text input by augmenting pinyin initials with speech and tonal information.
Proceedings of the International Conference on Multimodal Interaction, 2012

2010
Hidden logistic linear regression for support vector machine based phone verification.
Proceedings of the INTERSPEECH 2010, 2010

Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems.
Proceedings of the INTERSPEECH 2010, 2010

