Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Unified Semi-Supervised Pipeline for Automatic Speech Recognition.

Nune Tadevosyan

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Granary: Speech Recognition and Translation Dataset in 25 European Languages.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

From Scarcity to Sufficiency: Speech Recognition Pipeline for Zero-resource Language.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Pushing the Limits of Beam Search Decoding for Transducer-based ASR models.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

NGPU-LM: GPU-Accelerated N-Gram Language Model for Context-Biasing in Greedy ASR Decoding.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

EMMeTT: Efficient Multimodal Machine Translation Training.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Chain-of-Thought Prompting for Speech Translation.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Open Automatic Speech Recognition Models for Classical and Modern Standard Arabic.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Extending Automatic Machine Translation Evaluation to Book-Length Documents.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Training and Inference Efficiency of Encoder-Decoder Speech Models.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

FlexCTC: GPU-powered CTC Beam Decoding With Advanced Contextual Abilities.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

Open Full-duplex Voice Agent with Speech-to-Speech Language Model.

Seelan Lakshmi Narasimhan

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

TurboBias: Universal ASR Context-Biasing powered by GPU-accelerated Phrase-Boosting Tree.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

2024

Label-Looping: Highly Efficient Decoding For Transducers.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Less is More: Accurate Speech Recognition & Translation without Web-Scale Data.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

A Chat about Boring Problems: Studying GPT-Based Text Normalization.

Yang Zhang

Travis M. Bartley

Mariana Graterol-Fuenmayor

Vitaly Lavrukhin

Evelina Bakhturina

Boris Ginsburg

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

NeMo Forced Aligner and its application to word alignment for subtitle generation.

Elena Rastorgueva

Vitaly Lavrukhin

Boris Ginsburg

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Confidence-based Ensembles of End-to-End Speech Recognition Models.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Conformer-Based Target-Speaker Automatic Speech Recognition For Single-Channel Audio.

Proceedings of the IEEE International Conference on Acoustics, 2023

LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of End-to-End ASR Models.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Damage Control During Domain Adaptation for Transducer Based Automatic Speech Recognition.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

2021

NeMo Toolbox for Speech Dataset Construction.

Evelina Bakhturina

Vitaly Lavrukhin

Boris Ginsburg

CoRR, 2021

A Toolbox for Construction and Analysis of Speech Datasets.

Evelina Bakhturina

Vitaly Lavrukhin

Boris Ginsburg

Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

SPGISpeech: 5,000 Hours of Transcribed Financial Audio for Fully Formatted End-to-End Speech Recognition.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Hi-Fi Multi-Speaker English TTS Dataset.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Cross-Language Transfer Learning and Domain Adaptation for End-to-End Automatic Speech Recognition.

Proceedings of the 2021 IEEE International Conference on Multimedia and Expo, 2021

2020

Quartznet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

NeMo: a toolkit for building AI applications using Neural Modules.

CoRR, 2019

Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks.

CoRR, 2019

Jasper: An End-to-End Convolutional Neural Acoustic Model.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2018

Training Neural Speech Recognition Systems with Synthetic Speech Augmentation.

CoRR, 2018

OpenSeq2Seq: extensible toolkit for distributed and mixed precision training of sequence-to-sequence models.

CoRR, 2018

Vitaly Lavrukhin
