Srinivasan Umesh

Orcid: 0000-0002-5957-1444

According to our database, Srinivasan Umesh authored at least 109 papers between 1992 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
On the relationship between speech and hearing.
CoRR, 2024

2023
FusDom: Combining In-Domain and Out-of-Domain Knowledge for Continuous Self-Supervised Learning.
CoRR, 2023

Stable Distillation: Regularizing Continued Pre-training for Low-Resource Automatic Speech Recognition.
CoRR, 2023

SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis.
CoRR, 2023

The Tag-Team Approach: Leveraging CLS and Language Tagging for Enhancing Multilingual ASR.
CoRR, 2023

Channel-Aware Pretraining Of Joint Encoder-Decoder Self-Supervised Model For Telephonic-Speech ASR.
Proceedings of the IEEE International Conference on Acoustics, 2023

SLICER: Learning Universal Audio Representations Using Low-Resource Self-Supervised Pre-Training.
Proceedings of the IEEE International Conference on Acoustics, 2023

Unfused: Unsupervised Finetuning Using Self Supervised Distillation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Data2vec-Aqc: Search for the Right Teaching Assistant in the Teacher-Student Training Setup.
Proceedings of the IEEE International Conference on Acoustics, 2023

MAST: Multiscale Audio Spectrogram Transformers.
Proceedings of the IEEE International Conference on Acoustics, 2023

Towards Developing State-of-The-Art TTS Synthesisers for 13 Indian Languages with Signal Processing Aided Alignments.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
S-Vectors and TESA: Speaker Embeddings and a Speaker Authenticator Based on Transformer Encoder.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Decorrelating Feature Spaces for Learning General-Purpose Audio Representations.
IEEE J. Sel. Top. Signal Process., 2022

Channel-Aware Pretraining of Joint Encoder-Decoder Self-Supervised Model for Telephonic-Speech ASR.
CoRR, 2022

Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages.
CoRR, 2022

Analyzing the factors affecting usefulness of Self-Supervised Pre-trained Representations for Speech Recognition.
CoRR, 2022

PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech Representations.
CoRR, 2022

A Discourse Aware Sequence Learning Approach for Emotion Recognition in Conversations.
CoRR, 2022

MMER: Multimodal Multi-task learning for Emotion Recognition in Spoken Utterances.
CoRR, 2022

DeLoRes: Decorrelating Latent Spaces for Low-Resource Audio Representation Learning.
CoRR, 2022

Domain Adaptation of Low-Resource Target-Domain Models Using Well-Trained ASR Conformer Models.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech Representations.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

CCC-WAV2VEC 2.0: Clustering Aided Cross Contrastive Self-Supervised Learning of Speech Representations.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances.
Proceedings of the Interspeech 2022, 2022

Span Classification with Structured Information for Disfluency Detection in Spoken Utterances.
Proceedings of the Interspeech 2022, 2022

Gram Vaani ASR Challenge on spontaneous telephone speech recordings in regional variations of Hindi.
Proceedings of the Interspeech 2022, 2022

Joint Encoder-Decoder Self-Supervised Pre-training for ASR.
Proceedings of the Interspeech 2022, 2022

Investigation of Ensemble features of Self-Supervised Pretrained Models for Automatic Speech Recognition.
Proceedings of the Interspeech 2022, 2022

Investigation of Robustness of Hubert Features from Different Layers to Domain, Accent and Language Variations.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Deep Clustering For General-Purpose Audio Representations.
CoRR, 2021

Exploring the use of Common Label Set to Improve Speech Recognition of Low Resource Indian Languages.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
S-vectors: Speaker Embeddings based on Transformer's Encoder for Text-Independent Speaker Verification.
CoRR, 2020

Investigation of Speaker-adaptation methods in Transformer based ASR.
CoRR, 2020

Improving the Performance of Transformer Based Low Resource Speech Recognition for Indian Languages.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Investigation of Methods to Improve the Recognition Performance of Tamil-English Code-Switched Data in Transformer Framework.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2018
FMLLR Speaker Normalization With i-Vector: In Pseudo-FMLLR and Distillation Framework.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Investigating the Effect of Audio Duration on Dementia Detection Using Acoustic Features.
Proceedings of the Interspeech 2018, 2018

Articulatory and Stacked Bottleneck Features for Low Resource Speech Recognition.
Proceedings of the Interspeech 2018, 2018

Correlational Networks for Speaker Normalization in Automatic Speech Recognition.
Proceedings of the Interspeech 2018, 2018

2017
DNNs for unsupervised extraction of pseudo speaker-normalized features without explicit adaptation data.
Speech Commun., 2017

An automated technique to generate phone-to-articulatory label mapping.
Speech Commun., 2017

Addressing data sparsity in DNN acoustic modeling.
Proceedings of the Twenty-third National Conference on Communications, 2017

DNN acoustic models for dysarthric speech.
Proceedings of the Twenty-third National Conference on Communications, 2017

On Improving Acoustic Models for TORGO Dysarthric Speech Database.
Proceedings of the Interspeech 2017, 2017

Generalized Distillation Framework for Speaker Normalization.
Proceedings of the Interspeech 2017, 2017

Joint Estimation of Articulatory Features and Acoustic Models for Low-Resource Languages.
Proceedings of the Interspeech 2017, 2017

Transfer Learning and Distillation Techniques to Improve the Acoustic Modeling of Low Resource Languages.
Proceedings of the Interspeech 2017, 2017

2016
Modified Mean and Variance Normalization: Transforming to Utterance-Specific Estimates.
Circuits Syst. Signal Process., 2016

DNNs for Unsupervised Extraction of Pseudo FMLLR Features Without Explicit Adaptation Data.
Proceedings of the Interspeech 2016, 2016

Overcoming Data Sparsity in Acoustic Modeling of Low-Resource Language by Borrowing Data and Model Parameters from High-Resource Languages.
Proceedings of the Interspeech 2016, 2016

Articulatory Feature Extraction Using CTC to Build Articulatory Classifiers Without Forced Frame Alignments for Speech Recognition.
Proceedings of the Interspeech 2016, 2016

2015
Sub-band based histogram equalization in cepstral domain for speech recognition.
Speech Commun., 2015

Pronunciation Adaptation For Disordered Speech Recognition Using State-Specific Vectors of Phone-Cluster Adaptive Training.
Proceedings of the 6th Workshop on Speech and Language Processing for Assistive Technologies, 2015

Improved acoustic modeling for automatic dysarthric speech recognition.
Proceedings of the Twenty First National Conference on Communications, 2015

Investigation of different acoustic modeling techniques for low resource Indian language data.
Proceedings of the Twenty First National Conference on Communications, 2015

Speaker adaptation of convolutional neural network using speaker specific subspace vectors of SGMM.
Proceedings of the INTERSPEECH 2015, 2015

2014
Acoustic modelling for speech recognition in Indian languages in an agricultural commodities task domain.
Speech Commun., 2014

Improving deep neural networks using state projection vectors of subspace Gaussian mixture model as features.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Experiments on front-end techniques and segmentation model for robust Indian Language speech recognizer.
Proceedings of the Twentieth National Conference on Communications, 2014

Cross-lingual acoustic modeling for Indian languages based on Subspace Gaussian Mixture Models.
Proceedings of the Twentieth National Conference on Communications, 2014

2013
Modified cepstral mean normalization - transforming to utterance specific non-zero mean.
Proceedings of the INTERSPEECH 2013, 2013

Improved cepstral mean and variance normalization using Bayesian framework.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Acoustic modeling using transform-based phone-cluster adaptive training.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Modified splice and its extension to non-stereo data for noise robust speech recognition.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012
VTLN Using Analytically Determined Linear-Transformation on Conventional MFCC.
IEEE Trans. Speech Audio Process., 2012

Multiple background models for speaker verification using the concept of vocal tract length and MLLR super-vector.
Int. J. Speech Technol., 2012

Subspace based for Indian languages.
Proceedings of the 11th International Conference on Information Science, 2012

Computationally efficient speaker identification using fast-MLLR based anchor modeling.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Noise and speaker compensation in the Log filter bank domain.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Robust speech recognition through selection of speaker and environment transforms.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011
Eigen-Voice Based Anchor Modeling System for Speaker Identification Using MLLR Super-Vector.
Proceedings of the INTERSPEECH 2011, 2011

Sub-Band Level Histogram Equalization for Robust Speech Recognition.
Proceedings of the INTERSPEECH 2011, 2011

Efficient Speaker and Noise Normalization for Robust Speech Recognition.
Proceedings of the INTERSPEECH 2011, 2011

Use of VTL-wise models in feature-mapping framework to achieve performance of multiple-background models in speaker verification.
Proceedings of the IEEE International Conference on Acoustics, 2011

2010
Computationally Efficient Speaker Identification for Large Population Tasks using MLLR and Sufficient Statistics.
Proceedings of the Odyssey 2010: The Speaker and Language Recognition Workshop, Brno, Czech Republic, June 28, 2010

Investigation of Speaker-Clustered UBMs based on Vocal Tract Lengths and MLLR matrices for Speaker Verification.
Proceedings of the Odyssey 2010: The Speaker and Language Recognition Workshop, Brno, Czech Republic, June 28, 2010

Fast computation of speaker characterization vector using MLLR and sufficient statistics in anchor model framework.
Proceedings of the INTERSPEECH 2010, 2010

2009
Text-independent speaker identification using vocal tract length normalization for building universal background model.
Proceedings of the INTERSPEECH 2009, 2009

A study on the influence of covariance adaptation on jacobian compensation in vocal tract length normalization.
Proceedings of the INTERSPEECH 2009, 2009

Using VTLN matrices for rapid and computationally-efficient speaker adaptation with robustness to first-pass transcription errors.
Proceedings of the INTERSPEECH 2009, 2009

Acoustic class specific VTLN-warping using regression class trees.
Proceedings of the INTERSPEECH 2009, 2009

Characterizing speaker variability using spectral envelopes of vowel sounds.
Proceedings of the INTERSPEECH 2009, 2009

Improving the performance of VTLN under mismatched speaker conditions and making it approach that of matched speaker conditions.
Proceedings of the IEEE International Conference on Acoustics, 2009

2008
A shift-based approach to speaker normalization using non-linear frequency-scaling model.
Speech Commun., 2008

Study of jacobian compensation using linear transformation of conventional MFCC for VTLN.
Proceedings of the INTERSPEECH 2008, 2008

Use of spectral centre of gravity for generating speaker invariant features for automatic speech recognition.
Proceedings of the INTERSPEECH 2008, 2008

A computationally efficient approach to warp factor estimation in VTLN using EM algorithm and sufficient statistics.
Proceedings of the INTERSPEECH 2008, 2008

2007
A Study of Filter Bank Smoothing in MFCC Features for Recognition of Children's Speech.
IEEE Trans. Speech Audio Process., 2007

Linear transformation approach to VTLN using dynamic frequency warping.
Proceedings of the INTERSPEECH 2007, 2007

Speaker-Invariant Features for Automatic Speech Recognition.
Proceedings of the IJCAI 2007, 2007

2006
VTLN Warping Factor Estimation Using Accumulation of Sufficient Statistics.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Study Of Non-Linear Frequency Warping Functions For Speaker Normalization.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005
Implementing frequency-warping and VTLN through linear transformation of conventional MFCC.
Proceedings of the INTERSPEECH 2005, 2005

2004
Using VTLN for broadcast news transcription.
Proceedings of the INTERSPEECH 2004, 2004

An investigation into front-end signal processing for speaker normalization.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Non-uniform speaker normalization using affine-transformation.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2003
A method for compensation of Jacobian in speaker normalization.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2002
Frequency warping and the Mel scale.
IEEE Signal Process. Lett., 2002

A simple approach to non-uniform vowel normalization.
Proceedings of the IEEE International Conference on Acoustics, 2002

Non-uniform scaling based speaker normalization.
Proceedings of the IEEE International Conference on Acoustics, 2002

2000
Exploiting frequency-scaling invariance properties of the scale transform for automatic speech recognition.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

1999
Scale transform in speech analysis.
IEEE Trans. Speech Audio Process., 1999

Fitting the Mel scale.
Proceedings of the 1999 IEEE International Conference on Acoustics, 1999

1998
Improved scale-cepstral analysis in speech.
Proceedings of the 1998 IEEE International Conference on Acoustics, 1998

1997
Frequency-warping and speaker-normalization.
Proceedings of the 1997 IEEE International Conference on Acoustics, 1997

1996
Estimation of parameters of exponentially damped sinusoids using fast maximum likelihood estimation with application to NMR spectroscopy data.
IEEE Trans. Signal Process., 1996

Frequency-warping in speech.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

Computationally efficient estimation of sinusoidal frequency at low SNR.
Proceedings of the 1996 IEEE International Conference on Acoustics, 1996

1992
Resolving the components of transient signals by a multistage procedure.
Proceedings of the 1992 IEEE International Conference on Acoustics, 1992

