Boris Ginsburg

According to our database, Boris Ginsburg authored at least 71 papers between 2002 and 2023.

Bibliography

2023
Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition.
CoRR, 2023

The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System.
CoRR, 2023

Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation.
CoRR, 2023

SelfVC: Voice Conversion With Iterative Refinement using Self Transformations.
CoRR, 2023

SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation.
CoRR, 2023

A Chat About Boring Problems: Studying GPT-based text normalization.
CoRR, 2023

Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition.
CoRR, 2023

Investigating End-to-End ASR Architectures for Long Form Audio Transcription.
CoRR, 2023

Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling.
CoRR, 2023

Confidence-based Ensembles of End-to-End Speech Recognition Models.
CoRR, 2023

Towards training Bilingual and Code-Switched Speech Recognition models from Monolingual data sources.
CoRR, 2023

SpellMapper: A non-autoregressive neural spellchecker for ASR customization with candidate retrieval based on n-gram mappings.
CoRR, 2023

Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition.
CoRR, 2023

Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator.
CoRR, 2023

Flexible Multichannel Speech Enhancement for Noise-Robust Frontend.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023

NVIDIA NeMo Offline Speech Translation Systems for IWSLT 2023.
Proceedings of the 20th International Conference on Spoken Language Translation, 2023

Efficient Sequence Transduction by Jointly Predicting Tokens and Durations.
Proceedings of the International Conference on Machine Learning, 2023

BigVGAN: A Universal Neural Vocoder with Large-Scale Training.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Conformer-Based Target-Speaker Automatic Speech Recognition For Single-Channel Audio.
Proceedings of the IEEE International Conference on Acoustics, 2023

Multi-Blank Transducers for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Powerful and Extensible WFST Framework for Rnn-Transducer Losses.
Proceedings of the IEEE International Conference on Acoustics, 2023

ACE-VC: Adaptive and Controllable Voice Conversion Using Explicitly Disentangled Self-Supervised Speech Representations.
Proceedings of the IEEE International Conference on Acoustics, 2023

Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models.
Proceedings of the IEEE International Conference on Acoustics, 2023

Vani: Very-Lightweight Accent-Controllable TTS for Native And Non-Native Speakers With Identity Preservation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Fast Conformer With Linearly Scalable Attention For Efficient Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of End-to-End ASR Models.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers.
CoRR, 2022

AmberNet: A Compact End-to-End Model for Spoken Language Identification.
CoRR, 2022

Damage Control During Domain Adaptation for Transducer Based Automatic Speech Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-to-End Automatic Speech Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

NeMo Open Source Speaker Diarization System.
Proceedings of the Interspeech 2022, 2022

Multi-scale Speaker Diarization with Dynamic Scale Weighting.
Proceedings of the Interspeech 2022, 2022

CTC Variations Through New WFST Topologies.
Proceedings of the Interspeech 2022, 2022

Shallow Fusion of Weighted Finite-State Transducer and Language Model for Text Normalization.
Proceedings of the Interspeech 2022, 2022

Thutmose Tagger: Single-pass neural model for Inverse Text Normalization.
Proceedings of the Interspeech 2022, 2022

Mixer-TTS: Non-Autoregressive, Fast and Compact Text-to-Speech Model Conditioned on Language Model Embeddings.
Proceedings of the IEEE International Conference on Acoustics, 2022

TitaNet: Neural Model for Speaker Representation with 1D Depth-Wise Separable Convolutions and Global Context.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
OpenChem: A Deep Learning Toolkit for Computational Chemistry and Drug Design.
J. Chem. Inf. Model., 2021

Adapting TTS models For New Speakers using Transfer Learning.
CoRR, 2021

A Unified Transformer-based Framework for Duplex Text Normalization.
CoRR, 2021

CarneliNet: Neural Mixture Model for Automatic Speech Recognition.
CoRR, 2021

SGD-QA: Fast Schema-Guided Dialogue State Tracking for Unseen Services.
CoRR, 2021

TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction.
CoRR, 2021

NeMo Toolbox for Speech Dataset Construction.
CoRR, 2021

A Toolbox for Construction and Analysis of Speech Datasets.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

NeMo Inverse Text Normalization: From Development to Production.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

NeMo (Inverse) Text Normalization: From Development to Production.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

SPGISpeech: 5,000 Hours of Transcribed Financial Audio for Fully Formatted End-to-End Speech Recognition.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

TalkNet: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Hi-Fi Multi-Speaker English TTS Dataset.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Cross-Language Transfer Learning and Domain Adaptation for End-to-End Automatic Speech Recognition.
Proceedings of the 2021 IEEE International Conference on Multimedia and Expo, 2021

MarbleNet: Deep 1D Time-Channel Separable Convolutional Neural Network for Voice Activity Detection.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
On regularization of gradient descent, layer imbalance and flat minima.
CoRR, 2020

MatchboxNet: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition.
Proceedings of the Interspeech 2020, 2020

Quartznet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Correction of Automatic Speech Recognition with Transformer Sequence-To-Sequence Model.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
NeMo: a toolkit for building AI applications using Neural Modules.
CoRR, 2019

Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks.
CoRR, 2019

Jasper: An End-to-End Convolutional Neural Acoustic Model.
Proceedings of the Interspeech 2019, 2019

2018
Training Neural Speech Recognition Systems with Synthetic Speech Augmentation.
CoRR, 2018

OpenSeq2Seq: extensible toolkit for distributed and mixed precision training of sequence-to-sequence models.
CoRR, 2018

Computational mammography using deep neural networks.
Comput. methods Biomech. Biomed. Eng. Imaging Vis., 2018

Mixed Precision Training.
Proceedings of the 6th International Conference on Learning Representations, 2018

Spatially Parallel Convolutions.
Proceedings of the 6th International Conference on Learning Representations, 2018

2017
Comparison of Batch Normalization and Weight Normalization Algorithms for the Large-scale Image Classification.
CoRR, 2017

Scaling SGD Batch Size to 32K for ImageNet Training.
CoRR, 2017

Training Deep AutoEncoders for Collaborative Filtering.
CoRR, 2017

On Improving the Numerical Stability of Winograd Convolutions.
Proceedings of the 5th International Conference on Learning Representations, 2017

Factorization tricks for LSTM networks.
Proceedings of the 5th International Conference on Learning Representations, 2017

2016
SEBOOST - Boosting Stochastic Learning Using Subspace Optimization Techniques.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

2002
The ForSpec Temporal Logic: A New Temporal Property-Specification Language.
Proceedings of the Tools and Algorithms for the Construction and Analysis of Systems, 2002

