We stand with Ukraine

Zhong Meng

ORCID: 0000-0001-7814-5929

According to our database¹, Zhong Meng authored at least 71 papers between 2016 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

MELD: Mel-Spectrogram-Based Speech Language Modeling with Discrete Latent Variables.

…, Alexandre Mourachko

CoRR, May, 2026

2024

Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR.

…, Diamantino Caseiro, …, Tsendsuren Munkhdalai, Angad Chandorkar, Rohit Prabhavalkar

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track, 2024

Massive End-to-end Speech Recognition Models with Time Reduction.

…, Rohit Prabhavalkar, …, Dongseong Hwang, …, Chengjian Zheng, …, Tara N. Sainath, Pedro Moreno Mengibar

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm.

…, Diamantino Caseiro, Tsendsuren Munkhdalai, …, Rohit Prabhavalkar, …, Pedro Moreno Mengibar

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Text Injection for Neural Contextual Biasing.

…, Rohit Prabhavalkar, …, Tara N. Sainath, Bhuvana Ramabhadran

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Speech Prefix-Tuning with RNNT Loss for Improving LLM Predictions.

Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, …

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Efficiently Train ASR Models that Memorize Less and Perform Better with Per-core Clipping.

…, Rohit Prabhavalkar, …

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

A Comparison of Parameter-Efficient ASR Domain Adaptation Methods for Universal Speech and Language Models.

…, Tsendsuren Munkhdalai, Nikhil Siddhartha, …, Tara N. Sainath

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Augmenting Conformers With Structured State-Space Sequence Models For Online Speech Recognition.

…, Krzysztof Choromanski, Tara N. Sainath

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models.

Rohit Prabhavalkar, …, Dongseong Hwang, Tara N. Sainath, Pedro J. Moreno

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023

SLM: Bridge the thin gap between speech and text foundation models.

…, Chung-Cheng Chiu, …, Paul K. Rubenstein, …, Nikhil Siddhartha, Johan Schalkwyk, …

CoRR, 2023

Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm.

…, Diamantino Caseiro, Tsendsuren Munkhdalai, …, Rohit Prabhavalkar, …, Tara N. Sainath, Pedro Moreno Mengibar

CoRR, 2023

Massive End-to-end Models for Short Search Queries.

…, Rohit Prabhavalkar, Dongseong Hwang, …, Tara N. Sainath, Pedro Moreno Mengibar

CoRR, 2023

Augmenting conformers with structured state space models for online speech recognition.

…, Krzysztof Choromanski, Tara N. Sainath

CoRR, 2023

Text Injection for Capitalization and Turn-Taking Prediction in Speech Models.

…, Shuo-Yiin Chang, …, Tara N. Sainath

CoRR, 2023

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages.

CoRR, 2023

Improving Joint Speech-Text Representations Without Alignment.

…, Rohit Prabhavalkar, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, …

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Text Injection for Capitalization and Turn-Taking Prediction in Speech Models.

…, Shuo-Yiin Chang, …

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition.

…, Rohit Prabhavalkar, Tara N. Sainath, …, Andrew Rosenberg, Bhuvana Ramabhadran

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

2022

Modular Hybrid Autoregressive Transducer.

…, Rohit Prabhavalkar, …, Kartik Audhkhasi, …, Trevor Strohman, Bhuvana Ramabhadran, …, Pedro J. Moreno

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Separating Long-Form Speech with Group-wise Permutation Invariant Training.

…, Sefik Emre Eskimez, Takuya Yoshioka, …

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Multi-Talker ASR with Token-Level Serialized Output Training.

…, Takuya Yoshioka

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.

…, Takuya Yoshioka

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Continuous Speech Separation with Recurrent Selective Attention Network.

…, Takuya Yoshioka, …

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR.

…, Takuya Yoshioka

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Factorized Neural Transducer for Efficient Language Model Adaptation.

…, Sarangarajan Parthasarathy, …

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

Exploring End-to-End Multi-Channel ASR with Bias Information for Meeting Transcription.

…, Takuya Yoshioka

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition.

…, Sarangarajan Parthasarathy, …

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Investigation of End-to-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings.

…, Takuya Yoshioka

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Improving Multilingual Transformer Transducer Models by Reducing Language Confusions.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone.

…, Takuya Yoshioka

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

End-to-End Speaker-Attributed ASR with Transformer.

…, Takuya Yoshioka

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Improving RNN-T for Domain Scaling Using Semi-Supervised Training with Neural TTS.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Sequence-Level Self-Teaching Regularization.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition.

…, Sarangarajan Parthasarathy, …

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR.

…, Takuya Yoshioka

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Hypothesis Stitcher for End-to-End Speaker-Attributed ASR on Long-Form Multi-Talker Recordings.

…, Takuya Yoshioka

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Continuous Speech Separation with Ad Hoc Microphone Arrays.

…, Takuya Yoshioka, …

Proceedings of the 29th European Signal Processing Conference, 2021

A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio.

…, Takuya Yoshioka

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Active voice authentication.

…, Muhammad Umair Bin Altaf, Biing-Hwang Fred Juang

Digit. Signal Process., 2020

Continuous speech separation: dataset and analysis.

…, Takuya Yoshioka, …

CoRR, 2020

Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability.

…, Sarangarajan Parthasarathy, …

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Serialized Output Training for End-to-End Overlapped Speech Recognition.

…, Takuya Yoshioka

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of any Number of Speakers.

…, Takuya Yoshioka

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

L-Vector: Neural Label Embedding for Domain Adaptation.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model.

…, Jeremy Heng Meng Wong, …

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Continuous Speech Separation: Dataset and Analysis.

…, Takuya Yoshioka, …

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

Speaker Adaptation for Attention-Based End-to-End Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Acoustic-to-Phrase Models for Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Adversarial Speaker Verification.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Conditional Teacher-student Learning.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Attentive Adversarial Learning for Domain-invariant Training.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Adversarial Speaker Adaptation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Speech Separation Using Speaker Inventory.

…, Takuya Yoshioka, …

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Character-Aware Attention-Based End-to-End Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Discriminative and adaptive training for robust speech recognition and understanding.

PhD thesis, 2018

Speaker-Invariant Training via Adversarial Learning.

…, Biing-Hwang Juang

CoRR, 2018

Adversarial Feature-Mapping for Speech Enhancement.

…, Biing-Hwang Fred Juang

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Cycle-Consistent Speech Enhancement.

…, Biing-Hwang Fred Juang

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Adversarial Teacher-Student Learning for Unsupervised Domain Adaptation.

…, Biing-Hwang Juang

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Speaker-Invariant Training Via Adversarial Learning.

…, Biing-Hwang Juang

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

2017

Non-Uniform MCE Training of Deep Long Short-Term Memory Recurrent Neural Networks for Keyword Spotting.

…, Biing-Hwang Juang

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Minimum Semantic Error Cost Training of Deep Long Short-Term Memory Networks for Topic Spotting on Conversational Speech.

…, Biing-Hwang Juang

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Deep long short-term memory adaptive beamforming networks for multichannel robust speech recognition.

…, Shinji Watanabe, John R. Hershey, …

Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Unsupervised adaptation with domain separation networks for robust speech recognition.

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

2016

Statistical Modeling of Speaker's Voice with Temporal Co-Location for Active Voice Authentication.

…, Biing-Hwang Juang

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Non-Uniform Boosted MCE Training of Deep Neural Networks for Keyword Spotting.

…, Biing-Hwang Juang

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
