Zhehuai Chen

ORCID: 0000-0003-4400-5340

According to our database, Zhehuai Chen authored at least 40 papers between 2015 and 2024.

Bibliography

2024
GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators.
CoRR, 2024

High-precision Voice Search Query Correction via Retrievable Speech-text Embedings.
CoRR, 2024

2023
SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation.
CoRR, 2023

Using Text Injection to Improve Recognition of Personal Identifiers in Speech.
CoRR, 2023

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages.
CoRR, 2023

Understanding Shared Speech-Text Representations.
Proceedings of the IEEE International Conference on Acoustics, 2023

Accelerating RNN-T Training and Inference Using CTC Guidance.
Proceedings of the IEEE International Conference on Acoustics, 2023

Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-to-Speech.
Proceedings of the IEEE International Conference on Acoustics, 2023

2022
Accelerating RNN-T Training and Inference Using CTC guidance.
CoRR, 2022

Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data.
CoRR, 2022

JOIST: A Joint Speech and Text Streaming Model for ASR.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Maestro-U: Leveraging Joint Speech-Text Representation Learning for Zero Supervised Speech ASR.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Unsupervised Data Selection via Discrete Speech Representation for ASR.
Proceedings of the Interspeech 2022, 2022

MAESTRO: Matched Speech Text Representations through Modality Matching.
Proceedings of the Interspeech 2022, 2022

Tts4pretrain 2.0: Advancing the use of Text and Speech in ASR Pretraining with Consistency and Contrastive Losses.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Semi-Supervision in ASR: Sequential MixMatch and Factorized TTS-Based Augmentation.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Conformer Parrotron: A Faster and Stronger End-to-End Speech Conversion and Recognition Model for Atypical Speech.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

An Asynchronous WFST-Based Decoder for Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

Injecting Text in Self-Supervised Speech Pretraining.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Modular End-to-End Automatic Speech Recognition Framework for Acoustic-to-Word Model.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

SCADA: Stochastic, Consistent and Adversarial Data Augmentation to Improve ASR.
Proceedings of the Interspeech 2020, 2020

Improving Speech Recognition Using GAN-Based Speech Synthesis and Contrastive Unspoken Text Selection.
Proceedings of the Interspeech 2020, 2020

Improving Speech Recognition Using Consistent Predictions on Synthesized Speech.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Joint Grapheme and Phoneme Embeddings for Contextual End-to-End ASR.
Proceedings of the Interspeech 2019, 2019

End-to-end Contextual Speech Recognition Using Class Language Models and a Token Passing Decoder.
Proceedings of the IEEE International Conference on Acoustics, 2019

Incremental Lattice Determinization for WFST Decoders.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
Progressive Joint Modeling in Unsupervised Single-Channel Overlapped Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Sequence discriminative training for deep learning based acoustic keyword spotting.
Speech Commun., 2018

Linguistic Search Optimization for Deep Learning Based LVCSR.
CoRR, 2018

Knowledge Distillation for Sequence Model.
Proceedings of the Interspeech 2018, 2018

A GPU-based WFST Decoder with Exact Lattice Generation.
Proceedings of the Interspeech 2018, 2018

On Modular Training of Neural Acoustics-to-Word Model for LVCSR.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Sequence Modeling in Unsupervised Single-Channel Overlapped Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Phone Synchronous Speech Recognition With CTC Lattices.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

A Unified Confidence Measure Framework Using Auxiliary Normalization Graph.
Proceedings of the Intelligence Science and Big Data Engineering, 2017

Confidence measures for CTC-based phone synchronous decoding.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Multi-view LSTM Language Model with Word-Synchronized Auxiliary Feature for LVCSR.
Proceedings of the Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, 2017

2016
Directed automatic speech transcription error correction using bidirectional LSTM.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Phone Synchronous Decoding with CTC Lattice.
Proceedings of the Interspeech 2016, 2016

2015
An investigation of context clustering for statistical speech synthesis with deep neural network.
Proceedings of the INTERSPEECH 2015, 2015

