Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Anticipating Future with Large Language Model for Simultaneous Machine Translation.

[BibT_eX]

[DOI]

Siqi Ouyang

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

VoiceNoNG: Robust High-Quality Speech Editing Model without Hallucinations.

[BibT_eX]

[DOI]

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Word Level Timestamp Generation for Automatic Speech Recognition and Translation.

[BibT_eX]

[DOI]

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model.

[BibT_eX]

[DOI]

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Audio Large Language Models Can Be Descriptive Speech Quality Evaluators.

[BibT_eX]

[DOI]

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

EMMeTT: Efficient Multimodal Machine Translation Training.

[BibT_eX]

[DOI]

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data.

[BibT_eX]

[DOI]

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Chain-of-Thought Prompting for Speech Translation.

[BibT_eX]

[DOI]

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Open Full-duplex Voice Agent with Speech-to-Speech Language Model.

[BibT_eX]

[DOI]

Seelan Lakshmi Narasimhan

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

SpeechIQ: Speech-Agentic Intelligence Quotient Across Cognitive Levels in Voice Understanding by Large Language Models.

[BibT_eX]

[DOI]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

NeKo: Cross-Modality Post-Recognition Error Correction with Tasks-Guided Mixture-of-Experts Language Model.

[BibT_eX]

[DOI]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), 2025

2024

NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts.

[BibT_eX]

[DOI]

CoRR, 2024

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks.

[BibT_eX]

[DOI]

Fabian Ritter Gutierrez

CoRR, 2024

High-precision Voice Search Query Correction via Retrievable Speech-text Embedings.

[BibT_eX]

[DOI]

CoRR, 2024

Large Language Model Based Generative Error Correction: A Challenge and Baselines For Speech Recognition, Speaker Tagging, and Emotion Recognition.

[BibT_eX]

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Detecting the Undetectable: Assessing the Efficacy of Current Spoof Detection Methods Against Seamless Speech Edits.

[BibT_eX]

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Bestow: Efficient and Streamable Speech Language Model with The Best of Two Worlds in GPT and T5.

[BibT_eX]

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Less is More: Accurate Speech Recognition & Translation without Web-Scale Data.

[BibT_eX]

[DOI]

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Instruction Data Generation and Unsupervised Adaptation for Speech Language Models.

[BibT_eX]

[DOI]

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment.

[BibT_eX]

[DOI]

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Transducers with Pronunciation-Aware Embeddings for Automatic Speech Recognition.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2024

SALM: Speech-Augmented Language Model with in-Context Learning for Speech Recognition and Translation.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2024

GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators.

[BibT_eX]

[DOI]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages.

[BibT_eX]

[DOI]

CoRR, 2023

Using Text Injection to Improve Recognition of Personal Identifiers in Speech.

[BibT_eX]

[DOI]

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Understanding Shared Speech-Text Representations.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2023

Accelerating RNN-T Training and Inference Using CTC Guidance.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2023

Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-to-Speech.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

Accelerating RNN-T Training and Inference Using CTC guidance.

[BibT_eX]

[DOI]

CoRR, 2022

Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data.

[BibT_eX]

[DOI]

CoRR, 2022

JOIST: A Joint Speech and Text Streaming Model for ASR.

[BibT_eX]

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Maestro-U: Leveraging Joint Speech-Text Representation Learning for Zero Supervised Speech ASR.

[BibT_eX]

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Unsupervised Data Selection via Discrete Speech Representation for ASR.

[BibT_eX]

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

MAESTRO: Matched Speech Text Representations through Modality Matching.

[BibT_eX]

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Tts4pretrain 2.0: Advancing the use of Text and Speech in ASR Pretraining with Consistency and Contrastive Losses.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Semi-Supervision in ASR: Sequential MixMatch and Factorized TTS-Based Augmentation.

[BibT_eX]

[DOI]

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Conformer Parrotron: A Faster and Stronger End-to-End Speech Conversion and Recognition Model for Atypical Speech.

[BibT_eX]

[DOI]

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

An Asynchronous WFST-Based Decoder for Automatic Speech Recognition.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2021

Injecting Text in Self-Supervised Speech Pretraining.

[BibT_eX]

[DOI]

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Modular End-to-End Automatic Speech Recognition Framework for Acoustic-to-Word Model.

[BibT_eX]

[DOI]

IEEE ACM Trans. Audio Speech Lang. Process., 2020

SCADA: Stochastic, Consistent and Adversarial Data Augmentation to Improve ASR.

[BibT_eX]

[DOI]

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Improving Speech Recognition Using GAN-Based Speech Synthesis and Contrastive Unspoken Text Selection.

[BibT_eX]

[DOI]

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Improving Speech Recognition Using Consistent Predictions on Synthesized Speech.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Joint Grapheme and Phoneme Embeddings for Contextual End-to-End ASR.

[BibT_eX]

[DOI]

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

End-to-end Contextual Speech Recognition Using Class Language Models and a Token Passing Decoder.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2019

Incremental Lattice Determinization for WFST Decoders.

[BibT_eX]

[DOI]

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Progressive Joint Modeling in Unsupervised Single-Channel Overlapped Speech Recognition.

[BibT_eX]

[DOI]

IEEE ACM Trans. Audio Speech Lang. Process., 2018

Sequence discriminative training for deep learning based acoustic keyword spotting.

[BibT_eX]

[DOI]

Zhehuai Chen

Yanmin Qian

Kai Yu

Speech Commun., 2018

Linguistic Search Optimization for Deep Learning Based LVCSR.

[BibT_eX]

[DOI]

Zhehuai Chen

CoRR, 2018

Knowledge Distillation for Sequence Model.

[BibT_eX]

[DOI]

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

A GPU-based WFST Decoder with Exact Lattice Generation.

[BibT_eX]

[DOI]

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

On Modular Training of Neural Acoustics-to-Word Model for LVCSR.

[BibT_eX]

[DOI]

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Sequence Modeling in Unsupervised Single-Channel Overlapped Speech Recognition.

[BibT_eX]

[DOI]

Zhehuai Chen

Jasha Droppo

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Phone Synchronous Speech Recognition With CTC Lattices.

[BibT_eX]

[DOI]

IEEE ACM Trans. Audio Speech Lang. Process., 2017

A Unified Confidence Measure Framework Using Auxiliary Normalization Graph.

[BibT_eX]

[DOI]

Zhehuai Chen

Yanmin Qian

Kai Yu

Proceedings of the Intelligence Science and Big Data Engineering, 2017

Confidence measures for CTC-based phone synchronous decoding.

[BibT_eX]

[DOI]

Zhehuai Chen

Yimeng Zhuang

Kai Yu

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Multi-view LSTM Language Model with Word-Synchronized Auxiliary Feature for LVCSR.

[BibT_eX]

[DOI]

Proceedings of the Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, 2017

2016

Directed automatic speech transcription error correction using bidirectional LSTM.

[BibT_eX]

[DOI]

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Phone Synchronous Decoding with CTC Lattice.

[BibT_eX]

[DOI]

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

2015

An investigation of context clustering for statistical speech synthesis with deep neural network.

[BibT_eX]

[DOI]

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Zhehuai Chen

Timeline

Legend:

Links

On csauthors.net:

Bibliography

Loading...