Boris Ginsburg

Seelan Lakshmi Narasimhan

Proceedings of the Forty-second International Conference on Machine Learning, 2025

HAINAN: Fast and Accurate Transducer for Hybrid-Autoregressive ASR.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

nGPT: Normalized Transformer with Representation Learning on the Hypersphere.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

EMMeTT: Efficient Multimodal Machine Translation Training.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

META-CAT: Speaker-Informed Speech Embeddings via Meta Information Concatenation for Multi-talker ASR.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Chain-of-Thought Prompting for Speech Translation.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Open Automatic Speech Recognition Models for Classical and Modern Standard Arabic.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Extending Automatic Machine Translation Evaluation to Book-Length Documents.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

SWAN: An Efficient and Scalable Approach for Long-Context Language Modeling.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Training and Inference Efficiency of Encoder-Decoder Speech Models.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

FlexCTC: GPU-powered CTC Beam Decoding With Advanced Contextual Abilities.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

Open Full-duplex Voice Agent with Speech-to-Speech Language Model.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

TurboBias: Universal ASR Context-Biasing powered by GPU-accelerated Phrase-Boosting Tree.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

Genetic Instruct: Scaling up Synthetic Generation of Coding Instructions for Large Language Models.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), 2025

NeKo: Cross-Modality Post-Recognition Error Correction with Tasks-Guided Mixture-of-Experts Language Model.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), 2025

2024

NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts.

CoRR, 2024

Three-in-One: Fast and Accurate Transducer for Hybrid-Autoregressive ASR.

CoRR, 2024

Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens.

CoRR, 2024

Genetic Instruct: Scaling up Synthetic Generation of Coding Instructions for Large Language Models.

CoRR, 2024

Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment.

CoRR, 2024

RULER: What's the Real Context Size of Your Long-Context Language Models?

CoRR, 2024

Large Language Model Based Generative Error Correction: A Challenge and Baselines For Speech Recognition, Speaker Tagging, and Emotion Recognition.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Romanization Encoding For Multilingual ASR.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Bestow: Efficient and Streamable Speech Language Model with The Best of Two Worlds in GPT and T5.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Label-Looping: Highly Efficient Decoding For Transducers.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Less is More: Accurate Speech Recognition & Translation without Web-Scale Data.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Instruction Data Generation and Unsupervised Adaptation for Speech Language Models.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Schrödinger Bridge for Generative Speech Enhancement.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

SelfVC: Voice Conversion With Iterative Refinement using Self Transformations.

Paarth Neekhara

Shehzeen Samarah Hussain

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Transducers with Pronunciation-Aware Embeddings for Automatic Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2024

Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2024

Stateful Conformer with Cache-Based Inference for Streaming Automatic Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2024

Investigating End-to-End ASR Architectures for Long Form Audio Transcription.

Proceedings of the IEEE International Conference on Acoustics, 2024

SALM: Speech-Augmented Language Model with in-Context Learning for Speech Recognition and Translation.

Proceedings of the IEEE International Conference on Acoustics, 2024

Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer.

Proceedings of the IEEE International Conference on Acoustics, 2024

A Chat about Boring Problems: Studying GPT-Based Text Normalization.

Yang Zhang

Travis M. Bartley

Mariana Graterol-Fuenmayor

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System.

CoRR, 2023

Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation.

CoRR, 2023

Towards training Bilingual and Code-Switched Speech Recognition models from Monolingual data sources.

Kunal Dhawan

Dima Rekesh

CoRR, 2023

Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition.

CoRR, 2023

Flexible Multichannel Speech Enhancement for Noise-Robust Frontend.

Ante Jukic

Jagadeesh Balam

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023

NVIDIA NeMo Offline Speech Translation Systems for IWSLT 2023.

Proceedings of the 20th International Conference on Spoken Language Translation, 2023

NeMo Forced Aligner and its application to word alignment for subtitle generation.

Elena Rastorgueva

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A Compact End-to-End Model with Local and Global Context for Spoken Language Identification.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling.

He Huang

Jagadeesh Balam

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Adapter-Based Extension of Multi-Speaker Text-To-Speech Model for New Speakers.

Cheng-Ping Hsieh

Subhankar Ghosh

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Confidence-based Ensembles of End-to-End Speech Recognition Models.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

SpellMapper: A non-autoregressive neural spellchecker for ASR customization with candidate retrieval based on n-gram mappings.

Alexandra Antonova

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Efficient Sequence Transduction by Jointly Predicting Tokens and Durations.

Proceedings of the International Conference on Machine Learning, 2023

BigVGAN: A Universal Neural Vocoder with Large-Scale Training.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Conformer-Based Target-Speaker Automatic Speech Recognition For Single-Channel Audio.

Proceedings of the IEEE International Conference on Acoustics, 2023

Multi-Blank Transducers for Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2023

Powerful and Extensible WFST Framework for RNN-Transducer Losses.

Proceedings of the IEEE International Conference on Acoustics, 2023

ACE-VC: Adaptive and Controllable Voice Conversion Using Explicitly Disentangled Self-Supervised Speech Representations.

Proceedings of the IEEE International Conference on Acoustics, 2023

Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models.

Proceedings of the IEEE International Conference on Acoustics, 2023

Vani: Very-Lightweight Accent-Controllable TTS for Native And Non-Native Speakers With Identity Preservation.

Proceedings of the IEEE International Conference on Acoustics, 2023

Fast Conformer With Linearly Scalable Attention For Efficient Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of End-to-End ASR Models.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

AmberNet: A Compact End-to-End Model for Spoken Language Identification.

CoRR, 2022

Damage Control During Domain Adaptation for Transducer Based Automatic Speech Recognition.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-to-End Automatic Speech Recognition.

Aleksandr Laptev

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

NeMo Open Source Speaker Diarization System.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Multi-scale Speaker Diarization with Dynamic Scale Weighting.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

CTC Variations Through New WFST Topologies.

Aleksandr Laptev

Somshubra Majumdar

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Shallow Fusion of Weighted Finite-State Transducer and Language Model for Text Normalization.

Yang Zhang

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Thutmose Tagger: Single-pass neural model for Inverse Text Normalization.

Alexandra Antonova

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Mixer-TTS: Non-Autoregressive, Fast and Compact Text-to-Speech Model Conditioned on Language Model Embeddings.

Oktai Tatanov

Stanislav Beliaev

Proceedings of the IEEE International Conference on Acoustics, 2022

TitaNet: Neural Model for Speaker Representation with 1D Depth-Wise Separable Convolutions and Global Context.

Nithin Rao Koluguri

Taejin Park

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

OpenChem: A Deep Learning Toolkit for Computational Chemistry and Drug Design.

J. Chem. Inf. Model., 2021

Adapting TTS models For New Speakers using Transfer Learning.

Paarth Neekhara

Jason Li

CoRR, 2021

A Unified Transformer-based Framework for Duplex Text Normalization.

CoRR, 2021

CarneliNet: Neural Mixture Model for Automatic Speech Recognition.

CoRR, 2021

SGD-QA: Fast Schema-Guided Dialogue State Tracking for Unseen Services.

CoRR, 2021

TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction.

Stanislav Beliaev

CoRR, 2021

NeMo Toolbox for Speech Dataset Construction.

CoRR, 2021

A Toolbox for Construction and Analysis of Speech Datasets.

Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

NeMo Inverse Text Normalization: From Development to Production.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

NeMo (Inverse) Text Normalization: From Development to Production.

Yang Zhang

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

SPGISpeech: 5,000 Hours of Transcribed Financial Audio for Fully Formatted End-to-End Speech Recognition.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

TalkNet: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis.

Stanislav Beliaev

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Hi-Fi Multi-Speaker English TTS Dataset.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Cross-Language Transfer Learning and Domain Adaptation for End-to-End Automatic Speech Recognition.

Proceedings of the 2021 IEEE International Conference on Multimedia and Expo, 2021

MarbleNet: Deep 1D Time-Channel Separable Convolutional Neural Network for Voice Activity Detection.

Fei Jia

Somshubra Majumdar

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

On regularization of gradient descent, layer imbalance and flat minima.

CoRR, 2020

MatchboxNet: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition.

Somshubra Majumdar

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Correction of Automatic Speech Recognition with Transformer Sequence-To-Sequence Model.

Oleksii Hrinchuk

Mariya Popova

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

NeMo: a toolkit for building AI applications using Neural Modules.

CoRR, 2019

Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks.

CoRR, 2019

Jasper: An End-to-End Convolutional Neural Acoustic Model.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2018

Training Neural Speech Recognition Systems with Synthetic Speech Augmentation.

CoRR, 2018

OpenSeq2Seq: extensible toolkit for distributed and mixed precision training of sequence-to-sequence models.

CoRR, 2018

Computational mammography using deep neural networks.

Comput. Methods Biomech. Biomed. Eng. Imaging Vis., 2018

Mixed Precision Training.

Proceedings of the 6th International Conference on Learning Representations, 2018

Spatially Parallel Convolutions.

Peter H. Jin

Kurt Keutzer

Proceedings of the 6th International Conference on Learning Representations, 2018

2017

Comparison of Batch Normalization and Weight Normalization Algorithms for the Large-scale Image Classification.

Igor Gitman

CoRR, 2017

Scaling SGD Batch Size to 32K for ImageNet Training.

Yang You

Igor Gitman

CoRR, 2017

Training Deep AutoEncoders for Collaborative Filtering.

Oleksii Kuchaiev

CoRR, 2017

On Improving the Numerical Stability of Winograd Convolutions.

Proceedings of the 5th International Conference on Learning Representations, 2017

Factorization tricks for LSTM networks.

Oleksii Kuchaiev