Srinivasan Umesh

CoRR, November, 2025

Improved Intelligibility of Dysarthric Speech using Conditional Flow Matching.

CoRR, June, 2025

Effectively combining Phi-4 and NLLB for Spoken Language Translation: SPRING Lab IITM's submission to Low Resource Multilingual Indic Track.

Proceedings of the 22nd International Conference on Spoken Language Translation, 2025

EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

MADASR 2.0: Multi-Lingual Multi-Dialect ASR Challenge in 8 Indian Languages.

Srikanth S. Narayanan

Howard Lakougna

Prasanta Kumar Ghosh

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

2024

Shiksha: A Technical Domain focused Translation Dataset and Model for Indian Languages.

Advait Joglekar

CoRR, 2024

On the relationship between speech and hearing.

CoRR, 2024

SPRING Lab IITM's Submission to Low Resource Indic Language Translation Shared Task.

Advait Joglekar

Hamees Ul Hasan Sayed

Narla John Metilda Sagaya Mary

Proceedings of the Ninth Conference on Machine Translation, 2024

Lite ASR Transformer: A Light Weight Transformer Architecture For Automatic Speech Recognition.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

All Ears: Building Self-Supervised Learning based ASR models for Indian Languages at scale.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

FusDom: Combining in-Domain and Out-of-Domain Knowledge for Continuous Self-Supervised Learning.

Proceedings of the IEEE International Conference on Acoustics, 2024

Stable Distillation: Regularizing Continued Pre-Training for Low-Resource Automatic Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

The Tag-Team Approach: Leveraging CLS and Language Tagging for Enhancing Multilingual ASR.

CoRR, 2023

SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis.

Ramanan Sivaguru

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

The Tag-Team Approach: Leveraging CLS and Language Tagging for Enhancing Multilingual ASR.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Channel-Aware Pretraining of Joint Encoder-Decoder Self-Supervised Model for Telephonic-Speech ASR.

Proceedings of the IEEE International Conference on Acoustics, 2023

SLICER: Learning Universal Audio Representations Using Low-Resource Self-Supervised Pre-Training.

Proceedings of the IEEE International Conference on Acoustics, 2023

Unfused: Unsupervised Finetuning Using Self Supervised Distillation.

Proceedings of the IEEE International Conference on Acoustics, 2023

Data2vec-Aqc: Search for the Right Teaching Assistant in the Teacher-Student Training Setup.

Proceedings of the IEEE International Conference on Acoustics, 2023

MAST: Multiscale Audio Spectrogram Transformers.

Proceedings of the IEEE International Conference on Acoustics, 2023

Towards Developing State-of-The-Art TTS Synthesisers for 13 Indian Languages with Signal Processing Aided Alignments.

Anusha Prakash

Narla John Metilda Sagaya Mary

Hema A. Murthy

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

S-Vectors and TESA: Speaker Embeddings and a Speaker Authenticator Based on Transformer Encoder.

Sandesh Varadaraju Katta

IEEE ACM Trans. Audio Speech Lang. Process., 2022

Decorrelating Feature Spaces for Learning General-Purpose Audio Representations.

Ashish Seth

IEEE J. Sel. Top. Signal Process., 2022

Channel-Aware Pretraining of Joint Encoder-Decoder Self-Supervised Model for Telephonic-Speech ASR.

A. Arunkumar

CoRR, 2022

Analyzing the factors affecting usefulness of Self-Supervised Pre-trained Representations for Speech Recognition.

Lodagala Durga Prasad

Ashish Seth

Lodagala V. S. V. Durga Prasad

CoRR, 2022

PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech Representations.

CoRR, 2022

A Discourse Aware Sequence Learning Approach for Emotion Recognition in Conversations.

Harshvardhan Srivastava

CoRR, 2022

MMER: Multimodal Multi-task learning for Emotion Recognition in Spoken Utterances.

Harshvardhan Srivastava

CoRR, 2022

DeLoRes: Decorrelating Latent Spaces for Low-Resource Audio Representation Learning.

Ashish Seth

CoRR, 2022

Domain Adaptation of Low-Resource Target-Domain Models Using Well-Trained ASR Conformer Models.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech Representations.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

CCC-wav2vec 2.0: Clustering Aided Cross Contrastive Self-Supervised Learning of Speech Representations.

Lodagala V. S. V. Durga Prasad

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Span Classification with Structured Information for Disfluency Detection in Spoken Utterances.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Gram Vaani ASR Challenge on spontaneous telephone speech recordings in regional variations of Hindi.

Adithya Raj Kolladath

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Joint Encoder-Decoder Self-Supervised Pre-training for ASR.

A. Arunkumar

Vrunda Nileshkumar Sukhadia

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Investigation of Ensemble features of Self-Supervised Pretrained Models for Automatic Speech Recognition.

A. Arunkumar

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Investigation of Robustness of HuBERT Features from Different Layers to Domain, Accent and Language Variations.

Pratik Kumar

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Deep Clustering For General-Purpose Audio Representations.

CoRR, 2021

Exploring the use of Common Label Set to Improve Speech Recognition of Low Resource Indian Languages.

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

S-vectors: Speaker Embeddings based on Transformer's Encoder for Text-Independent Speaker Verification.

Sandesh V. Katta

CoRR, 2020

Investigation of Speaker-adaptation methods in Transformer based ASR.

CoRR, 2020

Improving the Performance of Transformer Based Low Resource Speech Recognition for Indian Languages.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Investigation of Methods to Improve the Recognition Performance of Tamil-English Code-Switched Data in Transformer Framework.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Building Multilingual End-to-End Speech Synthesisers for Indian Languages.

Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

2018

FMLLR Speaker Normalization With i-Vector: In Pseudo-FMLLR and Distillation Framework.

Sandeep Reddy Kothinti

IEEE ACM Trans. Audio Speech Lang. Process., 2018

Investigating the Effect of Audio Duration on Dementia Detection Using Acoustic Features.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Articulatory and Stacked Bottleneck Features for Low Resource Speech Recognition.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Correlational Networks for Speaker Normalization in Automatic Speech Recognition.

Rini A. Sharon

Sandeep Reddy Kothinti

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017

DNNs for unsupervised extraction of pseudo speaker-normalized features without explicit adaptation data.

Murali Karthick Baskar

Speech Commun., 2017

An automated technique to generate phone-to-articulatory label mapping.

Speech Commun., 2017

Addressing data sparsity in DNN acoustic modeling.

Seeram Tejaswi

Proceedings of the Twenty-third National Conference on Communications, 2017

DNN acoustic models for dysarthric speech.

Seeram Tejaswi

Proceedings of the Twenty-third National Conference on Communications, 2017

On Improving Acoustic Models for TORGO Dysarthric Speech Database.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Generalized Distillation Framework for Speaker Normalization.

Sandeep Reddy Kothinti

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Joint Estimation of Articulatory Features and Acoustic Models for Low-Resource Languages.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Transfer Learning and Distillation Techniques to Improve the Acoustic Modeling of Low Resource Languages.

Tejaswi Seeram

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

2016

Modified Mean and Variance Normalization: Transforming to Utterance-Specific Estimates.

Vikas Joshi

N. Vishnu Prasad

Circuits Syst. Signal Process., 2016

Improved phone-cluster adaptive training acoustic model.

Proceedings of the 2016 International Conference on Signal Processing and Communications (SPCOM), 2016

DNNs for Unsupervised Extraction of Pseudo FMLLR Features Without Explicit Adaptation Data.

Murali Karthick Baskar

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Overcoming Data Sparsity in Acoustic Modeling of Low-Resource Language by Borrowing Data and Model Parameters from High-Resource Languages.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Articulatory Feature Extraction Using CTC to Build Articulatory Classifiers Without Forced Frame Alignments for Speech Recognition.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

2015

Sub-band based histogram equalization in cepstral domain for speech recognition.

Speech Commun., 2015

Pronunciation Adaptation For Disordered Speech Recognition Using State-Specific Vectors of Phone-Cluster Adaptive Training.

M. Ramasubba Reddy

Proceedings of the 6th Workshop on Speech and Language Processing for Assistive Technologies, 2015

Improved acoustic modeling for automatic dysarthric speech recognition.

M. Ramasubba Reddy

Proceedings of the Twenty First National Conference on Communications, 2015

Investigation of different acoustic modeling techniques for low resource Indian language data.

Murali Karthick B

Proceedings of the Twenty First National Conference on Communications, 2015

Speaker adaptation of convolutional neural network using speaker specific subspace vectors of SGMM.

Murali Karthick B

Prateek Kolhar

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

2014

Acoustic modelling for speech recognition in Indian languages in an agricultural commodities task domain.

Aanchan Mohan

Richard C. Rose

Sina Hamidi Ghalehjegh

Speech Commun., 2014

Improving deep neural networks using state projection vectors of subspace Gaussian mixture model as features.

Murali Karthick B

Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Experiments on front-end techniques and segmentation model for robust Indian Language speech recognizer.

Murali Karthick Baskar

Proceedings of the Twentieth National Conference on Communications, 2014

Cross-lingual acoustic modeling for Indian languages based on Subspace Gaussian Mixture Models.

Proceedings of the Twentieth National Conference on Communications, 2014

2013

Modified cepstral mean normalization - transforming to utterance specific non-zero mean.

Vikas Joshi

N. Vishnu Prasad

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Improved cepstral mean and variance normalization using Bayesian framework.

N. Vishnu Prasad

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Acoustic modeling using transform-based phone-cluster adaptive training.

Vimal Manohar

Srinivas C. Bhargav

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Modified splice and its extension to non-stereo data for noise robust speech recognition.

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012

VTLN Using Analytically Determined Linear-Transformation on Conventional MFCC.

IEEE Trans. Speech Audio Process., 2012

Multiple background models for speaker verification using the concept of vocal tract length and MLLR super-vector.

Int. J. Speech Technol., 2012

Subspace based for Indian languages.

Aanchan Mohan

Richard C. Rose

Proceedings of the 11th International Conference on Information Science, 2012

Computationally efficient speaker identification using fast-MLLR based anchor modeling.

M. Carmen Benítez Ortúzar

Jean-François Bonastre

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Noise and speaker compensation in the Log filter bank domain.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Robust speech recognition through selection of speaker and environment transforms.

M. Carmen Benítez Ortúzar

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

Eigen-Voice Based Anchor Modeling System for Speaker Identification Using MLLR Super-Vector.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Sub-Band Level Histogram Equalization for Robust Speech Recognition.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Efficient Speaker and Noise Normalization for Robust Speech Recognition.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Use of VTL-wise models in feature-mapping framework to achieve performance of multiple-background models in speaker verification.

Proceedings of the IEEE International Conference on Acoustics, 2011

2010

Computationally Efficient Speaker Identification for Large Population Tasks using MLLR and Sufficient Statistics.

Proceedings of the Odyssey 2010: The Speaker and Language Recognition Workshop, Brno, Czech Republic, June 28, 2010

Investigation of Speaker-Clustered UBMs based on Vocal Tract Lengths and MLLR matrices for Speaker Verification.

Proceedings of the Odyssey 2010: The Speaker and Language Recognition Workshop, Brno, Czech Republic, June 28, 2010

Fast computation of speaker characterization vector using MLLR and sufficient statistics in anchor model framework.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

2009

Text-independent speaker identification using vocal tract length normalization for building universal background model.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

A study on the influence of covariance adaptation on jacobian compensation in vocal tract length normalization.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Using VTLN matrices for rapid and computationally-efficient speaker adaptation with robustness to first-pass transcription errors.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Acoustic class specific VTLN-warping using regression class trees.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Characterizing speaker variability using spectral envelopes of vowel sounds.

A. N. Harish

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Improving the performance of VTLN under mismatched speaker conditions and making it approach that of matched speaker conditions.

Proceedings of the IEEE International Conference on Acoustics, 2009

2008

A shift-based approach to speaker normalization using non-linear frequency-scaling model.

Speech Commun., 2008

Study of jacobian compensation using linear transformation of conventional MFCC for VTLN.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Use of spectral centre of gravity for generating speaker invariant features for automatic speech recognition.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

A computationally efficient approach to warp factor estimation in VTLN using EM algorithm and sufficient statistics.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

2007

A Study of Filter Bank Smoothing in MFCC Features for Recognition of Children's Speech.

IEEE Trans. Speech Audio Process., 2007

Linear transformation approach to VTLN using dynamic frequency warping.

D. Dinesh Kumar

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Speaker-Invariant Features for Automatic Speech Recognition.

G. Praveen

Proceedings of the IJCAI 2007, 2007

2006

VTLN Warping Factor Estimation Using Accumulation of Sufficient Statistics.

Jonas Lööf

Hermann Ney

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Study of Non-Linear Frequency Warping Functions for Speaker Normalization.

S. V. Bharath Kumar

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005

Implementing frequency-warping and VTLN through linear transformation of conventional MFCC.

András Zolnay

Hermann Ney

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

2004

Using VTLN for broadcast news transcription.

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

An investigation into front-end signal processing for speaker normalization.

S. V. Bharath Kumar

Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Non-uniform speaker normalization using affine-transformation.

S. V. Bharath Kumar

Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2003

A method for compensation of Jacobian in speaker normalization.

Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2002

Frequency warping and the Mel scale.

IEEE Signal Process. Lett., 2002

A simple approach to non-uniform vowel normalization.

Proceedings of the IEEE International Conference on Acoustics, 2002

Non-uniform scaling based speaker normalization.

Proceedings of the IEEE International Conference on Acoustics, 2002

2000

Exploiting frequency-scaling invariance properties of the scale transform for automatic speech recognition.

Sarangarajan Parthasarathy

Richard C. Rose

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

1999

Scale transform in speech analysis.

IEEE Trans. Speech Audio Process., 1999

Fitting the Mel scale.

Proceedings of the 1999 IEEE International Conference on Acoustics, 1999

1998

Improved scale-cepstral analysis in speech.

Proceedings of the 1998 IEEE International Conference on Acoustics, 1998

1997

Frequency-warping and speaker-normalization.

Proceedings of the 1997 IEEE International Conference on Acoustics, 1997

1996

Estimation of parameters of exponentially damped sinusoids using fast maximum likelihood estimation with application to NMR spectroscopy data.

Donald W. Tufts

IEEE Trans. Signal Process., 1996

Frequency-warping in speech.

Proceedings of the 4th International Conference on Spoken Language Processing, 1996

Computationally efficient estimation of sinusoidal frequency at low SNR.

Proceedings of the 1996 IEEE International Conference on Acoustics, 1996

1992

Resolving the components of transient signals by a multistage procedure.
