Rita Singh

ORCID: 0000-0003-3743-0162

According to our database, Rita Singh authored at least 170 papers between 1998 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

Lightweight Complementary-Cue Fusion for Robust Video Face Forgery Detection.

[DOI]

,

,

Karanveer Singh

,

CoRR, May, 2026

What and When to Learn: CURriculum Ranking Loss for Large-Scale Speaker Verification.

[DOI]

,

,

,

CoRR, March, 2026

VerLM: Explaining Face Verification Using Natural Language.

[DOI]

Syed Abdul Hannan

,

Hazim T. Bukhari

,

Thomas Cantalapiedra

,

,

,

,

CoRR, January, 2026

2025

DELULU: Discriminative Embedding Learning Using Latent Units for Speaker-Aware Self-Supervised Speech Foundational Model.

[DOI]

,

,

CoRR, October, 2025

No Encore: Unlearning as Opt-Out in Music Generation.

[DOI]

,

,

,

,

CoRR, September, 2025

OleSpeech-IV: A Large-Scale Multispeaker and Multilingual Conversational Speech Dataset with Diverse Topics.

[DOI]

,

,

,

,

Xavier Menéndez-Pidal

,

,

,

,

,

CoRR, September, 2025

Deciphering GunType Hierarchy through Acoustic Analysis of Gunshot Recordings.

[DOI]

,

,

,

CoRR, June, 2025

Human Voice is Unique.

[DOI]

,

CoRR, June, 2025

CAARMA: Class Augmentation with Adversarial Mixup Regularization.

[DOI]

,

,

,

,

CoRR, March, 2025

A New Benchmark for Few-Shot Class-Incremental Learning: Redefining the Upper Bound.

[DOI]

,

,

,

CoRR, March, 2025

Mellow: a small audio language model for reasoning.

[DOI]

,

,

,

CoRR, March, 2025

Krait: A Backdoor Attack Against Graph Prompt Tuning.

[DOI]

,

,

Balaji Palanisamy

Proceedings of the IEEE Conference on Secure and Trustworthy Machine Learning, 2025

ADIFF: Explaining audio difference using natural language.

[DOI]

,

,

,

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Does Prior Data Matter? Exploring Joint Training in the Context of Few-Shot Class-Incremental Learning.

[DOI]

,

,

,

Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV 2025, 2025

Tessellated Linear Model for Age Prediction from Voice.

[DOI]

Dareen Alharthi

,

,

,

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

PhoniTale: Phonologically Grounded Mnemonic Generation for Typologically Distant Language Pairs.

[DOI]

,

Myeongseok Gwon

,

,

,

,

,

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

CAARMA: Class Augmentation with Adversarial Mixup Regularization.

[DOI]

,

,

,

Syed Abdul Hannan

,

,

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

SVeritas: Benchmark for Robust Speaker Verification under Diverse Conditions.

[DOI]

,

,

Francisco Teixeira

,

Kateryna Shapovalenko

,

,

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

PlaceSim: An LLM-based Interactive Platform for Human Behavior Simulation in Physical Facilities.

[DOI]

,

,

,

Proceedings of the 34th ACM International Conference on Information and Knowledge Management, 2025

CoLMbo: Speaker Language Model for Descriptive Profiling.

[DOI]

,

,

Syed Abdul Hannan

,

Purusottam Samal

,

Karanveer Singh

,

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

On the Robust Approximation of ASR Metrics.

[DOI]

,

,

,

Proceedings of the Findings of the Association for Computational Linguistics, 2025

Lost in Transcription, Found in Distribution Shift: Demystifying Hallucination in Speech Foundation Models.

[DOI]

,

,

,

Monojit Choudhury

,

Proceedings of the Findings of the Association for Computational Linguistics, 2025

Audio Entailment: Assessing Deductive Reasoning for Audio Understanding.

[DOI]

,

,

Hazim T. Bukhari

,

Benjamin Elizalde

,

,

,

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

A closer look at reinforcement learning-based automatic speech recognition.

[DOI]

,

,

,

,

,

,

Comput. Speech Lang., 2024

What Do Speech Foundation Models Not Learn About Speech?

[DOI]

,

,

,

CoRR, 2024

Objective Measurements of Voice Quality.

[DOI]

,

CoRR, 2024

Improving Speaker Representations Using Contrastive Losses on Multi-scale Features.

[DOI]

,

,

,

CoRR, 2024

Did You Hear That? Introducing AADG: A Framework for Generating Benchmark Data in Audio Anomaly Detection.

[DOI]

Ksheeraja Raghavan

,

,

,

Surabhi Raghavan

,

Wolfram Burgard

,

,

CoRR, 2024

Speech vs. Transcript: Does It Matter for Human Annotators in Speech Summarization?

[DOI]

Roshan S. Sharma

,

,

,

,

,

CoRR, 2024

ControlVAR: Exploring Controllable Visual Autoregressive Modeling.

[DOI]

,

,

,

,

,

,

CoRR, 2024

PDAF: A Phonetic Debiasing Attention Framework For Speaker Verification.

[DOI]

,

Abdulhamid Aldoobi

,

,

,

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Imprecise Label Learning: A Unified Framework for Learning with Various Imprecise Label Configurations.

[DOI]

,

,

,

,

,

,

,

Masashi Sugiyama

,

,

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

R-BASS: Relevance-aided Block-wise Adaptation for Speech Summarization.

[DOI]

,

,

,

,

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

Domain Adaptation for Contrastive Audio-Language Models.

[DOI]

,

,

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

PAM: Prompting Audio-Language Models for Audio Quality Assessment.

[DOI]

,

Dareen Alharthi

,

Benjamin Elizalde

,

,

Mahmoud Al Ismail

,

,

,

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

SELM: Enhancing Speech Emotion Recognition for Out-of-Domain Scenarios.

[DOI]

Hazim T. Bukhari

,

,

,

,

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Completing Visual Objects via Bridging Generation and Segmentation.

[DOI]

,

,

Chung-Ching Lin

,

,

,

,

,

,

Proceedings of the Forty-first International Conference on Machine Learning, 2024

A General Framework for Learning from Weak Supervision.

[DOI]

,

,

,

,

,

,

Masashi Sugiyama

,

,

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Vocal Fold Dynamics for Automatic Detection of Amyotrophic Lateral Sclerosis from Voice.

[DOI]

,

Proceedings of the IEEE International Conference on Acoustics, 2024

Prompting Audios Using Acoustic Properties for Emotion Representation.

[DOI]

,

Benjamin Elizalde

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2024

Training Audio Captioning Models without Audio.

[DOI]

,

Benjamin Elizalde

,

Dimitra Emmanouilidou

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2024

Importance of Negative Sampling in Weak Label Learning.

[DOI]

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2024

R²-Bench: Benchmarking the Robustness of Referring Perception Models Under Perturbations.

[DOI]

,

,

,

,

,

,

,

,

Proceedings of the Computer Vision - ECCV 2024, 2024

QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition.

[DOI]

,

,

,

,

,

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Deriving Vocal Fold Oscillation Information from Recorded Voice Signals Using Models of Phonation.

[DOI]

,

Entropy, July, 2023

A Gene-Based Algorithm for Identifying Factors That May Affect a Speaker's Voice.

[DOI]

Entropy, June, 2023

SphereFace Revived: Unifying Hyperspherical Face Recognition.

[DOI]

,

,

,

,

IEEE Trans. Pattern Anal. Mach. Intell., 2023

LoFT: Local Proxy Fine-tuning For Improving Transferability Of Adversarial Attacks Against Large Language Model.

[DOI]

Muhammad Ahmed Shah

,

,

,

Raphaël Olivier

,

,

,

Dareen Alharthi

,

Hazim T. Bukhari

,

,

,

Michael Kuhlmann

,

,

CoRR, 2023

Completing Visual Objects via Bridging Generation and Segmentation.

[DOI]

,

,

Chung-Ching Lin

,

,

,

CoRR, 2023

Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech.

[DOI]

Dareen Alharthi

,

,

,

,

,

CoRR, 2023

Rethinking Audiovisual Segmentation with Semantic Quantization and Decomposition.

[DOI]

,

,

,

,

,

,

CoRR, 2023

Imprecise Label Learning: A Unified Framework for Learning with Various Imprecise Label Configurations.

[DOI]

,

,

,

,

,

,

Masashi Sugiyama

,

,

CoRR, 2023

GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content.

[DOI]

,

,

,

,

,

CoRR, 2023

PaintSeg: Painting Pixels for Training-free Segmentation.

[DOI]

,

Chung-Ching Lin

,

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Pengi: An Audio Language Model for Audio Tasks.

[DOI]

,

Benjamin Elizalde

,

,

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Rethinking Voice-Face Correlation: A Geometry View.

[DOI]

,

,

,

,

,

Proceedings of the 31st ACM International Conference on Multimedia, 2023

The Hidden Dance of Phonemes and Visage: Unveiling the Enigmatic Link between Phonemes and Facial Features.

[DOI]

,

,

,

,

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

BASS: Block-wise Adaptation for Speech Summarization.

[DOI]

,

,

,

Shinji Watanabe

,

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Pairwise Similarity Learning is SimPLE.

[DOI]

,

,

,

,

,

,

Michael J. Black

,

Bernhard Schölkopf

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Towards Noise-Tolerant Speech-Referring Video Object Segmentation: Bridging Speech and Text.

[DOI]

,

,

,

,

,

,

,

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Token Prediction as Implicit Classification to Identify LLM-Generated Text.

[DOI]

,

,

,

,

,

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Espnet-Summ: Introducing a Novel Large Dataset, Toolkit, and a Cross-Corpora Evaluation of Speech Summarization Systems.

[DOI]

Roshan S. Sharma

,

,

,

,

,

Shinji Watanabe

,

,

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Describing emotions with acoustic property prompts for speech emotion recognition.

[DOI]

,

Benjamin Elizalde

,

,

,

,

CoRR, 2022

Unifying the Discrete and Continuous Emotion labels for Speech Emotion Recognition.

[DOI]

,

,

,

CoRR, 2022

Self-supervision and Learnable STRFs for Age, Emotion, and Country Prediction.

[DOI]

,

,

,

,

,

CoRR, 2022

On the pragmatism of using binary classifiers over data intensive neural network classifiers for detection of COVID-19 from voice.

[DOI]

,

,

,

,

CoRR, 2022

Positional Encoding for Capturing Modality Specific Cadence for Emotion Detection.

[DOI]

,

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

SphereFace2: Binary Classification is All You Need for Deep Face Recognition.

[DOI]

,

,

,

,

Proceedings of the Tenth International Conference on Learning Representations, 2022

2021

An Overview of Techniques for Biomarker Discovery in Voice Signal.

[DOI]

,

,

CoRR, 2021

Detection and Evaluation of Human and Machine Generated Speech in Spoofing Attacks on Automatic Speaker Verification Systems.

[DOI]

,

,

,

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Masked Proxy Loss for Text-Independent Speaker Verification.

[DOI]

,

Aiswarya Vinod Kumar

,

,

,

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Generalized Spoofing Detection Inspired from Audio Generation Artifacts.

[DOI]

,

,

,

,

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Improving Weakly Supervised Sound Event Detection with Self-Supervised Auxiliary Tasks.

[DOI]

,

,

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Self-Supervised 3D Face Reconstruction via Conditional Estimation.

[DOI]

,

,

,

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Detection of Covid-19 Through the Analysis of Vocal Fold Oscillations.

[DOI]

Mahmoud Al Ismail

,

,

Proceedings of the IEEE International Conference on Acoustics, 2021

Interpreting Glottal Flow Dynamics for Detecting Covid-19 From Voice.

[DOI]

,

Mahmoud Al Ismail

,

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Mask Proxy Loss for Text-Independent Speaker Recognition.

[DOI]

,

Aiswarya Vinod Kumar

,

,

,

CoRR, 2020

Multi-Task Learning for Interpretable Weakly Labelled Sound Event Detection.

[DOI]

,

,

CoRR, 2020

Controlled AutoEncoders to Generate Faces from Voices.

[DOI]

,

,

,

,

Proceedings of the Advances in Visual Computing - 15th International Symposium, 2020

Hide and Speak: Towards Deep Neural Networks for Speech Steganography.

[DOI]

,

,

,

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

The Phonetic Bases of Vocal Expressed Emotion: Natural versus Acted.

[DOI]

,

Shahan Ali Memon

,

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Hierarchical Routing Mixture of Experts.

[DOI]

,

,

Shahan Ali Memon

,

,

Proceedings of the 25th International Conference on Pattern Recognition, 2020

Artificial Creative Intelligence: Breaking the Imitation Barrier.

[DOI]

,

Roger B. Dannenberg

,

,

Proceedings of the Eleventh International Conference on Computational Creativity, 2020

Speech-Based Parameter Estimation of an Asymmetric Vocal Fold Oscillation Model and its Application in Discriminating Vocal Fold Pathologies.

[DOI]

,

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Detecting gender differences in perception of emotion in crowdsourced data.

[DOI]

Shahan Ali Memon

,

,

,

,

Vijaykumar Palat

,

,

,

,

CoRR, 2019

Non-Determinism in Neural Networks for Adversarial Robustness.

[DOI]

Daanish Ali Khan

,

,

,

,

Abelino Jimenez

,

,

CoRR, 2019

Reconstructing faces from voices.

[DOI]

,

,

CoRR, 2019

Hide and Speak: Deep Neural Networks for Speech Steganography.

[DOI]

,

,

,

,

CoRR, 2019

Face Reconstruction from Voice using Generative Adversarial Networks.

[DOI]

,

,

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Neural Regression Trees.

[DOI]

Shahan Ali Memon

,

,

,

Proceedings of the International Joint Conference on Neural Networks, 2019

Disjoint Mapping Network for Cross-modal Matching of Voices and Faces.

[DOI]

,

Mahmoud Al Ismail

,

,

,

Proceedings of the 7th International Conference on Learning Representations, 2019

Human Behaviour Recognition Using Wifi Channel State Information.

[DOI]

Daanish Ali Khan

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2019

Optimizing Neural Network Embeddings Using a Pair-Wise Loss for Text-Independent Speaker Verification.

[DOI]

,

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Optimal Strategies for Matching and Retrieval Problems by Comparing Covariates.

[DOI]

,

Mahmoud Al Ismail

,

,

CoRR, 2018

A Corrective Learning Approach for Text-Independent Speaker Verification.

[DOI]

,

,

,

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Voice Impersonation Using Generative Adversarial Networks.

[DOI]

,

,

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Voice disguise by mimicry: deriving statistical articulometric evidence to evaluate claimed impersonation.

[DOI]

,

Abelino Jiménez

,

IET Biom., 2017

Speaker identification from the sound of the human breath.

[DOI]

,

,

CoRR, 2017

Deducing the severity of psychiatric symptoms from the human voice.

[DOI]

,

Justin T. Baker

,

Luciana Pennant

,

Louis-Philippe Morency

CoRR, 2017

Supervised monaural source separation based on autoencoders.

[DOI]

,

,

,

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

2016

Minimizing Free Energy of Stochastic Functions of Markov Chains.

[DOI]

Proceedings of the Recent Advances in Nonlinear Speech Processing, 2016

Content-based Video Indexing and Retrieval Using Corr-LDA.

[DOI]

Rahul Radhakrishnan Iyer

,

,

Vikas Mohandoss

,

,

,

CoRR, 2016

Mereological algebras as mechanisms for reasoning about sounds.

[DOI]

Proceedings of the 26th IEEE International Workshop on Machine Learning for Signal Processing, 2016

Estimating multiple physical parameters from speech data.

[DOI]

Shareef Babu Kalluri

,

Ashwin Vijayakumar

,

Deepu Vijayasenan

,

Proceedings of the 26th IEEE International Workshop on Machine Learning for Signal Processing, 2016

Forensic anthropometry from voice: An articulatory-phonetic approach.

[DOI]

,

,

Proceedings of the 39th International Convention on Information and Communication Technology, 2016

Short-term analysis for estimating physical parameters of speakers.

[DOI]

,

,

Proceedings of the 4th International Conference on Biometrics and Forensics, 2016

Formant manipulations in voice disguise by mimicry.

[DOI]

,

,

Proceedings of the 4th International Conference on Biometrics and Forensics, 2016

Estimation of Children's Physical Characteristics from Their Voices.

[DOI]

Jill Fain Lehman

,

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

The relationship of voice onset time and Voice Offset Time to physical age.

[DOI]

,

,

,

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

Complex recurrent neural networks for denoising speech signals.

[DOI]

,

,

Proceedings of the 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2015

CMU Informedia@TRECVID 2015: MED/SIN/LNK/SED.

[DOI]

Proceedings of the 2015 TREC Video Retrieval Evaluation, 2015

Keyword spotting in multi-player voice driven games for children.

[DOI]

Sundar Harshavardhan

,

Jill Fain Lehman

,

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Free energy for speech recognition.

[DOI]

,

Ken'ichi Kumatani

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014

Informedia @ TRECVID 2014.

[DOI]

Proceedings of the 2014 TREC Video Retrieval Evaluation, 2014

Online word-spotting in continuous speech with recurrent neural networks.

[DOI]

Pallavi Baljekar

,

Jill Fain Lehman

,

Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Audio Classification with Thermodynamic Criteria.

[DOI]

Proceedings of the 2014 IEEE International Conference on Cloud Engineering, 2014

Detecting sound objects in audio recordings.

[DOI]

,

,

Proceedings of the 22nd European Signal Processing Conference, 2014

2013

Informedia@TRECVID 2013.

[DOI]

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

Richard M. Stern

,

Teruko Mitamura

,

,

Alexander G. Hauptmann

Proceedings of the 2013 TREC Video Retrieval Evaluation, 2013

Discriminatively trained dependency language modeling for conversational speech recognition.

[DOI]

Benjamin Lambert

,

,

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Doppler based speed estimation of vehicles using passive sensor.

[DOI]

Shubhranshu Barnwal

,

,

Rajesh M. Hegde

,

,

Proceedings of the 2013 IEEE International Conference on Multimedia and Expo Workshops, 2013

Joint constrained maximum likelihood regression for overlapping speech recognition.

[DOI]

Ken'ichi Kumatani

,

,

Friedrich Faubel

,

John W. McDonough

,

Proceedings of the IEEE International Conference on Acoustics, 2013

Event detection in short duration audio using Gaussian Mixture Model and Random Forest Classifier.

[DOI]

,

Rajesh M. Hegde

,

,

Proceedings of the 21st European Signal Processing Conference, 2013

2012

Informedia @TRECVID 2012.

[DOI]

Proceedings of the 2012 TREC Video Retrieval Evaluation, 2012

A signal-separation-based array postfilter for distant speech recognition.

[DOI]

,

Ken'ichi Kumatani

,

John W. McDonough

,

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Language identification using spectro-temporal patch features.

[DOI]

,

,

,

Proceedings of the ISCA Workshop on Statistical And Perceptual Audition, 2012

Microphone Array Post-filter based on Spatially-Correlated Noise Measurements for Distant Speech Recognition.

[DOI]

Ken'ichi Kumatani

,

,

,

John W. McDonough

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Plagiarism Detection in Polyphonic Music using Monaural Signal Separation.

[DOI]

,

Indradyumna Roy

,

Tarunima Prabhakar

,

,

Sourish Chaudhuri

,

,

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Exploiting Temporal Sequence Structure for Semantic Analysis of Multimedia.

[DOI]

Sourish Chaudhuri

,

,

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Compensating for denoising artifacts.

[DOI]

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Audio event detection from acoustic unit occurrence patterns.

[DOI]

,

,

,

Sourish Chaudhuri

,

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Spectrographic seam patterns for discriminative word spotting.

[DOI]

Shubhranshu Barnwal

,

,

,

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Microphone array processing for distant speech recognition: Towards real-world deployment.

[DOI]

Ken'ichi Kumatani

,

Takayuki Arakawa

,

Kazumasa Yamamoto

,

John W. McDonough

,

,

,

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

Introduction.

[DOI]

Tuomas Virtanen

,

,

Proceedings of the Techniques for Noise Robustness in Automatic Speech Recognition, 2012

The Basics of Automatic Speech Recognition.

[DOI]

,

,

Tuomas Virtanen

Proceedings of the Techniques for Noise Robustness in Automatic Speech Recognition, 2012

The Problem of Robustness in Automatic Speech Recognition.

[DOI]

,

Tuomas Virtanen

,

Proceedings of the Techniques for Noise Robustness in Automatic Speech Recognition, 2012

2011

Phoneme-Dependent NMF for Speech Enhancement in Monaural Mixtures.

[DOI]

,

,

Tuomas Virtanen

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

A paired test for recognizer selection with untranscribed data.

[DOI]

,

,

Proceedings of the IEEE International Conference on Acoustics, 2011

Gammatone sub-band magnitude-domain dereverberation for ASR.

[DOI]

,

,

,

Richard M. Stern

Proceedings of the IEEE International Conference on Acoustics, 2011

An iterative least-squares technique for dereverberation.

[DOI]

,

,

,

Richard M. Stern

Proceedings of the IEEE International Conference on Acoustics, 2011

Reconstructing Noise-Corrupted Spectrographic Components for Robust Speech Recognition.

[DOI]

,

Proceedings of the Robust Speech Recognition of Uncertain or Missing Data, 2011

2010

The use of sense in unsupervised training of acoustic models for ASR systems.

[DOI]

,

Benjamin Lambert

,

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Non-negative matrix factorization based compensation of music for automatic speech recognition.

[DOI]

,

Tuomas Virtanen

,

Sourish Chaudhuri

,

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Creating a linguistic plausibility dataset with non-expert annotators.

[DOI]

Benjamin Lambert

,

,

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Latent-variable decomposition based dereverberation of monaural and multi-channel signals.

[DOI]

,

,

Paris Smaragdis

Proceedings of the IEEE International Conference on Acoustics, 2010

2009

A joint decoding algorithm for multiple-example-based addition of words to a pronunciation lexicon.

[DOI]

Dhananjay Bansal

,

Nishanth Ulhas Nair

,

,

Proceedings of the IEEE International Conference on Acoustics, 2009

2007

Probabilistic deduction of symbol mappings for extension of lexicons.

[DOI]

,

Evandro B. Gouvêa

,

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Bandwidth Expansion with a Pólya Urn Model.

[DOI]

,

,

Madhusudana V. S. Shashanka

,

Paris Smaragdis

Proceedings of the IEEE International Conference on Acoustics, 2007

2005

Voice driven applications in non-stationary and chaotic environment.

[DOI]

,

,

,

,

,

,

,

Richard M. Stern

Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2005

Recognizing speech from simultaneous speakers.

[DOI]

,

,

Paris Smaragdis

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Feature compensation with secondary sensor measurements for robust speech recognition.

[DOI]

,

Proceedings of the 13th European Signal Processing Conference, 2005

2004

Classification in Likelihood Spaces.

[DOI]

,

Technometrics, 2004

Maximum-likelihood adaptation of semi-continuous HMMs by latent variable decomposition of state distributions.

[DOI]

,

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

On tracking noise with linear dynamical system models.

[DOI]

,

,

Richard M. Stern

Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2003

Classifier-based non-linear projection for adaptive endpointing of continuous speech.

[DOI]

,

Comput. Speech Lang., 2003

Classification with free energy at raised temperatures.

[DOI]

,

Manfred K. Warmuth

,

,

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Design of the CMU sphinx-4 decoder.

[DOI]

,

,

,

Evandro B. Gouvêa

,

,

,

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Tracking noise via dynamical systems with a continuum of states.

[DOI]

,

Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2002

Automatic generation of subword units for speech recognition systems.

[DOI]

,

,

Richard M. Stern

IEEE Trans. Speech Audio Process., 2002

Combining search spaces of heterogeneous recognizers for improved speech recognition.

[DOI]

,

,

Richard M. Stern

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Rapid development of speech-to-speech translation systems.

[DOI]

,

,

Robert E. Frederking

,

,

,

Alexander I. Rudnicky

,

,

Eric Steinbrecher

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

2001

Speech in Noisy Environments: robust automatic segmentation, feature extraction, and hypothesis combination.

[DOI]

,

Michael L. Seltzer

,

,

Richard M. Stern

Proceedings of the IEEE International Conference on Acoustics, 2001

Tandem acoustic modeling in large-vocabulary recognition.

[DOI]

Daniel P. W. Ellis

,

,

Proceedings of the IEEE International Conference on Acoustics, 2001

2000

Structured redefinition of sound units by merging and splitting for improved speech recognition.

[DOI]

,

,

Richard M. Stern

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Task and domain specific modelling in the Carnegie Mellon communicator system.

[DOI]

Alexander I. Rudnicky

,

Christina L. Bennett

,

,

Ananlada Chotimongkol

,

,

,

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Automatic subword unit refinement for spontaneous speech recognition via phone splitting.

[DOI]

,

,

Richard M. Stern

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Phone transition acoustic modeling: application to speaker independent and spontaneous speech systems.

[DOI]

,

,

Richard M. Stern

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Automatic generation of phone sets and lexical transcriptions.

[DOI]

,

,

Richard M. Stern

Proceedings of the IEEE International Conference on Acoustics, 2000

1999

Domain adduced state tying for cross-domain acoustic modelling.

[DOI]

,

,

Richard M. Stern

Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

Automatic clustering and generation of contextual questions for tied states in hidden Markov models.

[DOI]

,

,

Richard M. Stern

Proceedings of the 1999 IEEE International Conference on Acoustics, 1999

1998

Inference of missing spectrographic features for robust speech recognition.

[DOI]

,

,

Richard M. Stern

Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998
