K. Sreenivasa Rao

ORCID: 0000-0001-6112-6887

Affiliations:
  • Indian Institute of Technology Kharagpur, West Bengal, India


According to our database, K. Sreenivasa Rao authored at least 186 papers between 2002 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
A multi-modal lecture video indexing and retrieval framework with multi-scale residual attention network and multi-similarity computation.
Signal Image Video Process., April, 2024

Hierarchical emotion recognition from speech using source, power spectral and prosodic features.
Multim. Tools Appl., 2024

Efficient Indexing of Meta-Data (Extracted from Educational Videos).
CoRR, 2024

2023
Unsupervised spoken term discovery using pseudo lexical induction.
Int. J. Speech Technol., September, 2023

A Novel Zero-Resource Spoken Term Detection Using Affinity Kernel Propagation with Acoustic Feature Map.
SN Comput. Sci., May, 2023

Accent classification from an emotional speech in clean and noisy environments.
Multim. Tools Appl., 2023

ExtSwap: Leveraging Extended Latent Mapper for Generating High Quality Face Swapping.
CoRR, 2023

Unsupervised Discovery of Recurring Spoken Terms Using Diagonal Patterns.
Proceedings of the Pattern Recognition and Machine Intelligence, 2023

Relation Predictions in Comorbid Disease Centric Knowledge Graph Using Heterogeneous GNN Models.
Proceedings of the Bioinformatics and Biomedical Engineering, 2023

Similarity-based Multi-Modal Lecture Video Indexing and Retrieval with Deep Learning.
Proceedings of the 14th International Conference on Computing Communication and Networking Technologies, 2023

2022
VOP detection for read and conversation speech using CWT coefficients and phone boundaries.
J. Ambient Intell. Humaniz. Comput., 2022

Dysarthric speech detection from telephone quality speech using epoch-based pitch perturbation features.
Int. J. Speech Technol., 2022

Correction to: CycleGAN-Based Speech Mode Transformation Model for Robust Multilingual ASR.
Circuits Syst. Signal Process., 2022

CycleGAN-Based Speech Mode Transformation Model for Robust Multilingual ASR.
Circuits Syst. Signal Process., 2022

Phoneme Segmentation-Based Unsupervised Pattern Discovery and Clustering of Speech Signals.
Circuits Syst. Signal Process., 2022

A novel approach to unsupervised pattern discovery in speech using Convolutional Neural Network.
Comput. Speech Lang., 2022

Melody Extraction from Polyphonic Music by Deep Learning Approaches: A Review.
CoRR, 2022

NrityaManch: An Annotation and Retrieval System for Bharatanatyam Dance.
Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation, 2022

2021
Relation Prediction of Co-Morbid Diseases Using Knowledge Graph Completion.
IEEE ACM Trans. Comput. Biol. Bioinform., 2021

Approaches for Multilingual Phone Recognition in Code-switched and Non-code-switched Scenarios Using Indian Languages.
ACM Trans. Asian Low Resour. Lang. Inf. Process., 2021

Robust vowel region detection method for multimode speech.
Multim. Tools Appl., 2021

Utterance partitioning for speaker recognition: an experimental review and analysis with new findings under GMM-SVM framework.
Int. J. Speech Technol., 2021

Moving ridge neuronal espionage network simulation for reticulum invasion sensing.
Int. J. Pervasive Comput. Commun., 2021

SongF0: A Spectrum-Based Fundamental Frequency Estimation for Monophonic Songs.
Circuits Syst. Signal Process., 2021

hf0: A Hybrid Pitch Extraction Method for Multimodal Voice.
Circuits Syst. Signal Process., 2021

Multilingual Audio-Visual Smartphone Dataset and Evaluation.
IEEE Access, 2021

Audio-Visual Biometric Recognition and Presentation Attack Detection: A Comprehensive Survey.
IEEE Access, 2021

Knowledge Distillation for Singing Voice Detection.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

2020
BOXREC: Recommending a Box of Preferred Outfits in Online Shopping.
ACM Trans. Intell. Syst. Technol., 2020

Children's Story Classification in Indian Languages Using Linguistic and Keyword-based Features.
ACM Trans. Asian Low Resour. Lang. Inf. Process., 2020

Multilingual and multimode phone recognition system for Indian languages.
Speech Commun., 2020

Robust f0 extraction from monophonic signals using adaptive sub-band filtering.
Speech Commun., 2020

DNN-Based Cross-Lingual Voice Conversion Using Bottleneck Features.
Neural Process. Lett., 2020

Excitation modelling using epoch features for statistical parametric speech synthesis.
Comput. Speech Lang., 2020

Detection of Specific Language Impairment in Children Using Glottal Source Features.
IEEE Access, 2020

Multilingual speech mode classification model for Indian languages.
Proceedings of the 2020 National Conference on Communications, 2020

Glottal Closure Instants Detection from EGG Signal by Classification Approach.
Proceedings of the Interspeech 2020, 2020

2019
CWT-Based Approach for Epoch Extraction From Telephone Quality Speech.
IEEE Signal Process. Lett., 2019

Development and analysis of multilingual phone recognition systems using Indian languages.
Int. J. Speech Technol., 2019

Incorporation of Manner of Articulation Constraint in LSTM for Speech Recognition.
Circuits Syst. Signal Process., 2019

hf0: A hybrid pitch extraction method for multimodal voice.
CoRR, 2019

LSTM-Based Robust Voicing Decision Applied to DNN-Based Speech Synthesis.
Autom. Control. Comput. Sci., 2019

Glottal Closure Instants Detection from Speech Signal by Deep Features Extracted from Raw Speech and Linear Prediction Residual.
Proceedings of the Interspeech 2019, 2019

Source Modeling Techniques for Quality Enhancement in Statistical Parametric Speech Synthesis - Studies in Speech Signal Processing, Natural Language Understanding, and Machine Learning
Springer, ISBN: 978-3-030-02758-2, 2019

2018
Epoch detection from emotional speech signal using zero time windowing.
Speech Commun., 2018

A robust unsupervised pattern discovery and clustering of speech signals.
Pattern Recognit. Lett., 2018

Neural network and GMM based feature mappings for consonant-vowel recognition in emotional environment.
Int. J. Speech Technol., 2018

Improvement of phone recognition accuracy using speech mode classification.
Int. J. Speech Technol., 2018

Language identification using phase information.
Int. J. Speech Technol., 2018

Automatic note transcription system for Hindustani classical music.
Int. J. Speech Technol., 2018

Inverse filter based excitation model for HMM-based speech synthesis system.
IET Signal Process., 2018

Improvement of Phone Recognition Accuracy Using Articulatory Features.
Circuits Syst. Signal Process., 2018

Predominant Melody Extraction from Vocal Polyphonic Music Signal by Time-Domain Adaptive Filtering-Based Method.
Circuits Syst. Signal Process., 2018

Glottal Closure Instants Detection From Pathological Acoustic Speech Signal Using Deep Learning.
CoRR, 2018

Beam Search Decoding using Manner of Articulation Detection Knowledge Derived from Connectionist Temporal Classification.
CoRR, 2018

Manner of Articulation Detection using Connectionist Temporal Classification to Improve Automatic Speech Recognition Performance.
CoRR, 2018

One for the Road: Recommending Male Street Attire.
Proceedings of the Advances in Knowledge Discovery and Data Mining, 2018

Analysis of sparse representation based feature on speech mode classification.
Proceedings of the Interspeech 2018, 2018

Indian Languages ASR: A Multilingual Phone Recognition Framework with IPA Based Common Phone-set, Predicted Articulatory Features and Feature fusion.
Proceedings of the Interspeech 2018, 2018

Classification of Disorders in Vocal Folds Using Electroglottographic Signal.
Proceedings of the Interspeech 2018, 2018

Harmonic-Percussive Source Separation of Polyphonic Music by Suppressing Impulsive Noise Events.
Proceedings of the Interspeech 2018, 2018

Modifying LSTM Posteriors with Manner of Articulation Knowledge to Improve Speech Recognition Performance.
Proceedings of the 17th IEEE International Conference on Machine Learning and Applications, 2018

Robust Detection of Glottal Activity Using Unwrapped Phase Electroglottographic Signal.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Discriminative sparse representation for speech mode classification.
Proceedings of the 2018 International Conference on Advances in Computing, 2018

DNN-based Bilingual (Telugu-Hindi) Polyglot Speech Synthesis.
Proceedings of the 2018 International Conference on Advances in Computing, 2018

Audio Mining: Unsupervised Spoken Term Detection over an Audio Database.
Proceedings of the 2018 International Conference on Advances in Computing, 2018

2017
Robust Pitch Extraction Method for the HMM-Based Speech Synthesis System.
IEEE Signal Process. Lett., 2017

Supervector-based approaches in a discriminative framework for speaker verification in noisy environments.
Int. J. Speech Technol., 2017

Modification of energy spectra, epoch parameters and prosody for emotion conversion in speech.
Int. J. Speech Technol., 2017

Parameterization of Excitation Signal for Improving the Quality of HMM-Based Speech Synthesis System.
Circuits Syst. Signal Process., 2017

Generation of creaky voice for improving the quality of HMM-based speech synthesis.
Comput. Speech Lang., 2017

Parametric representation of excitation source information for language identification.
Comput. Speech Lang., 2017

Implicit processing of LP residual for language identification.
Comput. Speech Lang., 2017

Robust glottal activity detection using the phase of an electroglottographic signal.
Biomed. Signal Process. Control., 2017

Accurate Synchronization of Speech and EGG Signal Using Phase Information.
Proceedings of the Interspeech 2017, 2017

2016
Time-domain deterministic plus noise model based hybrid source modeling for statistical parametric speech synthesis.
Speech Commun., 2016

Voice/non-voice detection using phase of zero frequency filtered speech signal.
Speech Commun., 2016

Articulatory and excitation source features for speech recognition in read, extempore and conversation modes.
Int. J. Speech Technol., 2016

Prosody modeling for syllable based text-to-speech synthesis using feedforward neural networks.
Neurocomputing, 2016

Prosodic Mapping Using Neural Networks for Emotion Conversion in Hindi Language.
Circuits Syst. Signal Process., 2016

A Robust Non-Parametric and Filtering Based Approach for Glottal Closure Instant Detection.
Proceedings of the Interspeech 2016, 2016

Enhanced Harmonic Content and Vocal Note Based Predominant Melody Extraction from Vocal Polyphonic Music Signals.
Proceedings of the Interspeech 2016, 2016

Sentence Based Discourse Classification for Hindi Story Text-to-Speech (TTS) System.
Proceedings of the 13th International Conference on Natural Language Processing, 2016

Predominant melody extraction from vocal polyphonic music signal by combined spectro-temporal method.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

A deterministic plus noise model of excitation signal using principal component analysis for parametric speech synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Language identification using PLDA based on i-vector in noisy environment.
Proceedings of the 2016 International Conference on Advances in Computing, 2016

Designing automatic note transcription system for Hindustani classical music.
Proceedings of the 2016 International Conference on Advances in Computing, 2016

Deep neural networks for Kannada phoneme recognition.
Proceedings of the Ninth International Conference on Contemporary Computing, 2016

2015
Recognition of emotions from video using acoustic and facial features.
Signal Image Video Process., 2015

Implicit excitation source features for robust language identification.
Int. J. Speech Technol., 2015

Source and system features for phone recognition.
Int. J. Speech Technol., 2015

Robust Voicing Detection and F0 Estimation for HMM-Based Speech Synthesis.
Circuits Syst. Signal Process., 2015

Data-driven pause prediction for speech synthesis in storytelling style speech.
Proceedings of the Twenty First National Conference on Communications, 2015

Hybrid Source Modeling Method Utilizing Optimal Residual Frames for HMM-based Speech Synthesis.
Proceedings of the Mining Intelligence and Knowledge Exploration, 2015

Automatic detection of creaky voice using epoch parameters.
Proceedings of the INTERSPEECH 2015, 2015

Conversion of neutral speech to storytelling style speech.
Proceedings of the Eighth International Conference on Advances in Pattern Recognition, 2015

Optimal residual frame based source modeling for HMM-based speech synthesis.
Proceedings of the Eighth International Conference on Advances in Pattern Recognition, 2015

Contribution of Telugu vowels in identifying emotions.
Proceedings of the Eighth International Conference on Advances in Pattern Recognition, 2015

Raga identification based on Normalized Note Histogram features.
Proceedings of the 2015 International Conference on Advances in Computing, 2015

Children story classification based on structure of the story.
Proceedings of the 2015 International Conference on Advances in Computing, 2015

Analysis and modeling pauses for synthesis of storytelling speech based on discourse modes.
Proceedings of the Eighth International Conference on Contemporary Computing, 2015

Neutral to happy emotion conversion by blending prosody and laughter.
Proceedings of the Eighth International Conference on Contemporary Computing, 2015

Multi-stage children story speech synthesis for Hindi.
Proceedings of the Eighth International Conference on Contemporary Computing, 2015

Analysis and modification of spectral energy for neutral to sad emotion conversion.
Proceedings of the Eighth International Conference on Contemporary Computing, 2015

Robust language identification using Power Normalized Cepstral Coefficients.
Proceedings of the Eighth International Conference on Contemporary Computing, 2015

Improved recognition rate of language identification system in noisy environment.
Proceedings of the Eighth International Conference on Contemporary Computing, 2015

2014
Speech Processing in Mobile Environments
Springer Briefs in Electrical and Computer Engineering, Springer, ISBN: 978-3-319-03116-3, 2014

Segmentation, indexing and retrieval of TV broadcast news bulletins using Gaussian mixture models and vector quantization codebooks.
Int. J. Speech Technol., 2014

Film segmentation and indexing using autoassociative neural networks.
Int. J. Speech Technol., 2014

Stochastic feature compensation methods for speaker verification in noisy environments.
Appl. Soft Comput., 2014

Automatic Phonetic Transcription for read, extempore and conversation speech for an Indian language: Bengali.
Proceedings of the Twentieth National Conference on Communications, 2014

A novel boosting algorithm for improved i-vector based speaker verification in noisy environments.
Proceedings of the INTERSPEECH 2014, 2014

Duration Modeling by Multi-Models based on Vowel Production characteristics.
Proceedings of the 11th International Conference on Natural Language Processing, 2014

Designing prosody rule-set for converting neutral TTS speech to storytelling style speech for Indian languages: Bengali, Hindi and Telugu.
Proceedings of the Seventh International Conference on Contemporary Computing, 2014

Significance of CV transition and steady vowel regions for language identification.
Proceedings of the Seventh International Conference on Contemporary Computing, 2014

2013
Emotion Recognition using Speech Features
Springer Briefs in Electrical and Computer Engineering, Springer, ISBN: 978-1-4614-5143-3, 2013

Robust Emotion Recognition using Spectral and Prosodic Features
Springer Briefs in Electrical and Computer Engineering, Springer, ISBN: 978-1-4614-6360-3, 2013

Detection of Vowel Offset Point From Speech Signal.
IEEE Signal Process. Lett., 2013

Non-uniform time scale modification using instants of significant excitation and vowel onset points.
Speech Commun., 2013

Classification of Infant Cries Using Dynamics of Epoch Features.
J. Intell. Syst., 2013

Vowel onset point detection for noisy speech using spectral energy at formant frequencies.
Int. J. Speech Technol., 2013

Identification of Indian languages using multi-level spectral and prosodic features.
Int. J. Speech Technol., 2013

Pitch synchronous and glottal closure based speech analysis for language recognition.
Int. J. Speech Technol., 2013

Emotion recognition from speech using global and local prosodic features.
Int. J. Speech Technol., 2013

Characterization and recognition of emotions from speech using excitation source information.
Int. J. Speech Technol., 2013

Two-stage intonation modeling using feedforward neural networks for syllable based text-to-speech synthesis.
Comput. Speech Lang., 2013

Optimal weight tuning method for unit selection cost functions in syllable based text-to-speech synthesis.
Appl. Soft Comput., 2013

Duration Modeling Using Multi-model Based on Positional Information.
Proceedings of the Pattern Recognition and Machine Intelligence, 2013

Corpus Based Emotional Speech Synthesis in Hindi.
Proceedings of the Pattern Recognition and Machine Intelligence, 2013

Significance of utterance partitioning in GMM-SVM based speaker verification in varying background environment.
Proceedings of the 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013

A syllable-based framework for unit selection synthesis in 13 Indian languages.
Proceedings of the 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013

Language identification using Hilbert envelope and phase information of linear prediction residual.
Proceedings of the 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013

Development of phonetic engine for Indian languages: Bengali and Oriya.
Proceedings of the 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013

Phonetic and Prosodically Rich Transcribed speech corpus in Indian languages: Bengali and Odia.
Proceedings of the 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013

Importance of Utterance Partitioning in SVM Classifier with GMM Supervectors for Text-Independent Speaker Verification.
Proceedings of the Mining Intelligence and Knowledge Exploration, 2013

High quality text-to-speech synthesis system with efficient duration models developed using coding schemes based on vowel production characteristics.
Proceedings of the 13th International Conference on Intelligent Systems Design and Applications, 2013

Analysis of detection of vowel offset point for coded speech.
Proceedings of the Sixth International Conference on Contemporary Computing, 2013

2012
Predicting Prosody from Text for Text-to-Speech Synthesis
Springer Briefs in Electrical and Computer Engineering, Springer, ISBN: 978-1-4614-1338-7, 2012

Syllable Specific Unit Selection Cost Functions for Text-to-Speech Synthesis.
ACM Trans. Speech Lang. Process., 2012

Vowel Onset Point Detection for Low Bit Rate Coded Speech.
IEEE Trans. Speech Audio Process., 2012

Neural network based feature transformation for emotion independent speaker identification.
Int. J. Speech Technol., 2012

A pitch synchronous approach to design voice conversion system using source-filter correlation.
Int. J. Speech Technol., 2012

Emotion recognition from speech using sub-syllabic and pitch synchronous spectral features.
Int. J. Speech Technol., 2012

Emotion recognition from speech using source, system, and prosodic features.
Int. J. Speech Technol., 2012

Emotion recognition from speech: a review.
Int. J. Speech Technol., 2012

Spotting and Recognition of Consonant-Vowel Units from Continuous Speech Using Accurate Detection of Vowel Onset Points.
Circuits Syst. Signal Process., 2012

Unconstrained Pitch Contour Modification Using Instants of Significant Excitation.
Circuits Syst. Signal Process., 2012

Comparing ANN and GMM in a voice conversion framework.
Appl. Soft Comput., 2012

Better human computer interaction by enhancing the quality of text-to-speech synthesis.
Proceedings of the 4th International Conference on Intelligent Human Computer Interaction, 2012

Intensity Modeling for Syllable Based Text-to-Speech Synthesis.
Proceedings of the Contemporary Computing - 5th International Conference, 2012

Spoken Language Identification Using Spectral Features.
Proceedings of the Contemporary Computing - 5th International Conference, 2012

Emotion Recognition from Semi Natural Speech Using Artificial Neural Networks and Excitation Source Features.
Proceedings of the Contemporary Computing - 5th International Conference, 2012

Real Life Emotion Classification from Speech Using Gaussian Mixture Models.
Proceedings of the Contemporary Computing - 5th International Conference, 2012

Data-Driven Phrase Break Prediction for Bengali Text-to-Speech System.
Proceedings of the Contemporary Computing - 5th International Conference, 2012

Speaker recognition in the case of emotional environment using transformation of speech features.
Proceedings of the CUBE International IT Conference & Exhibition, 2012

Voice conversion using linear prediction coefficients and artificial neural network.
Proceedings of the CUBE International IT Conference & Exhibition, 2012

2011
Recognition of consonant-vowel (CV) units under background noise using combined temporal and spectral preprocessing.
Int. J. Speech Technol., 2011

Application of prosody models for developing speech systems in Indian languages.
Int. J. Speech Technol., 2011

Development of syllable-based text to speech synthesis system in Bengali.
Int. J. Speech Technol., 2011

Two stage emotion recognition based on speaking rate.
Int. J. Speech Technol., 2011

Recognition of emotions from video using neural network models.
Expert Syst. Appl., 2011

Effect of Noise on Vowel Onset Point Detection.
Proceedings of the Contemporary Computing - 4th International Conference, 2011

Effect of Noise on Recognition of Consonant-Vowel (CV) Units.
Proceedings of the Contemporary Computing - 4th International Conference, 2011

Segment Specific Concatenation Cost for Syllable Based Bengali TTS.
Proceedings of the Contemporary Computing - 4th International Conference, 2011

Text Independent Emotion Recognition Using Spectral Features.
Proceedings of the Contemporary Computing - 4th International Conference, 2011

2010
Real Time Prosody Modification.
J. Signal Inf. Process., 2010

Selection of Suitable Features for Modeling the Durations of Syllables.
J. Softw. Eng. Appl., 2010

Voice conversion by mapping the speaker-specific features using pitch synchronous approach.
Comput. Speech Lang., 2010

Effect of Speech Coding on Recognition of Consonant-Vowel (CV) Units.
Proceedings of the Contemporary Computing - Third International Conference, 2010

Emotion Classification Based on Speaking Rate.
Proceedings of the Contemporary Computing - Third International Conference, 2010

2009
Duration modification using glottal closure instants and vowel onset points.
Speech Commun., 2009

Intonation modeling for Indian languages.
Comput. Speech Lang., 2009

Unit Selection Using Linguistic, Prosodic and Spectral Distance for Developing Text-to-Speech System in Hindi.
Proceedings of the Pattern Recognition and Machine Intelligence, 2009

Exploring Speech Features for Classifying Emotions along Valence Dimension.
Proceedings of the Pattern Recognition and Machine Intelligence, 2009

Significance of Word and Syllable Level Information for Expressive Speech Processing.
Proceedings of the Seventh International Conference on Advances in Pattern Recognition, 2009

IITKGP-SESC: Speech Database for Emotion Analysis.
Proceedings of the Contemporary Computing - Second International Conference, 2009

2008
Modeling Supra-Segmental Features of Syllables Using Neural Networks.
Proceedings of the Speech, 2008

2007
Determination of Instants of Significant Excitation in Speech Using Hilbert Envelope and Group Delay Function.
IEEE Signal Process. Lett., 2007

Modeling durations of syllables using neural networks.
Comput. Speech Lang., 2007

Voice Transformation by Mapping the Features at Syllable Level.
Proceedings of the Pattern Recognition and Machine Intelligence, 2007

2006
Prosody modification using instants of significant excitation.
IEEE Trans. Speech Audio Process., 2006

Voice Conversion by Prosody and Vocal Tract Modification.
Proceedings of the 9th International Conference in Information Technology, 2006

2004
Two-Stage Duration Model for Indian Languages Using Neural Networks.
Proceedings of the Neural Information Processing, 11th International Conference, 2004

Modeling syllable duration in Indian languages using neural networks.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2003
Prosodic manipulation using instants of significant excitation.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2002
Speech enhancement using excitation source information.
Proceedings of the IEEE International Conference on Acoustics, 2002

