Najim Dehak

Orcid: 0000-0002-4489-5753

Affiliations:

MIT, Cambridge, USA

According to our database¹, Najim Dehak authored at least 220 papers between 2006 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

Interpretable Features for the Assessment of Neurodegenerative Diseases Through Handwriting Analysis.

[DOI]

,

,

,

Gabriel Chávez

,

Laureano Moro-Velázquez

,

Emile Moukheiber

,

Ankur A. Butala

,

IEEE J. Biomed. Health Informatics, May, 2026

Beyond Transcripts: Iterative Peer-Editing with Audio Unlocks High-Quality Human Summaries of Conversational Speech.

[DOI]

Kaavya Chaparala

,

,

Jesús Antonio Villalba López

,

Laureano Moro-Velázquez

,

Peter Viechnicki

,

CoRR, May, 2026

GENFIG1: Visual Summaries of Scholarly Work as a Challenge for Vision-Language Models.

[DOI]

,

,

,

,

,

Daniel Khashabi

CoRR, April, 2026

DiT-Flow: Speech Enhancement Robust to Multiple Distortions based on Flow Matching in Latent Space and Diffusion Transformers.

[DOI]

,

,

,

Yuval Sieradzki

,

,

Laureano Moro-Velázquez

,

Jesús Villalba

,

,

,

CoRR, March, 2026

Speaker Verification with Speech-Aware LLMs: Evaluation and Augmentation.

[DOI]

,

,

Laureano Moro-Velázquez

,

Jesús Villalba

,

CoRR, March, 2026

Reconstruct! Don't Encode: Self-Supervised Representation Reconstruction Loss for High-Intelligibility and Low-Latency Streaming Neural Audio Codec.

[DOI]

,

,

,

,

Shrikanth Narayanan

,

,

Laureano Moro-Velázquez

,

Jesús Villalba

,

CoRR, March, 2026

SAM Audio Judge: A Unified Multimodal Framework for Perceptual Evaluation of Audio Separation.

[DOI]

,

,

,

,

,

,

,

,

CoRR, January, 2026

2025

Spoken DialogSum: An Emotion-Rich Conversational Dataset for Spoken Dialogue Summarization.

[DOI]

,

,

,

,

,

Laureano Moro-Velázquez

,

,

Jesús Villalba

CoRR, December, 2025

ReFESS-QI: Reference-Free Evaluation For Speech Separation With Joint Quality And Intelligibility Scoring.

[DOI]

,

,

,

,

Yuval Sieradzki

,

,

Jesús Villalba

,

,

CoRR, October, 2025

Latent Speech-Text Transformer.

[DOI]

,

,

,

Benjamin Muller

,

Jesús Villalba

,

,

Luke Zettlemoyer

,

,

,

Srinivasan Iyer

,

CoRR, October, 2025

Backdoor Attacks Against Speech Language Models.

[DOI]

Alexandrine Fortier

,

,

Jesús Villalba

,

,

Patrick Cardinal

CoRR, October, 2025

MaskVCT: Masked Voice Codec Transformer for Zero-Shot Voice Conversion With Increased Controllability via Multiple Guidances.

[DOI]

,

,

,

,

Laureano Moro-Velázquez

,

Jesús Villalba

,

CoRR, September, 2025

Cross-Corpus and Cross-domain Handwriting Assessment of NeuroDegenerative Diseases via Time-Series-to-Image Conversion.

[DOI]

Gabriel Chávez

,

Laureano Moro-Velázquez

,

Ankur A. Butala

,

,

CoRR, September, 2025

CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech.

[DOI]

,

,

,

,

,

,

,

Laureano Moro-Velázquez

,

Jesús Villalba

,

,

Shrikanth Narayanan

,

Mounya Elhilali

,

CoRR, June, 2025

SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline.

[DOI]

,

,

,

,

,

,

,

Laureano Moro-Velázquez

,

Jesús Villalba

,

CoRR, May, 2025

Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits.

[DOI]

,

,

,

,

Thanathai Lertpetchpun

,

,

,

,

Laureano Moro-Velázquez

,

,

,

Shrikanth Narayanan

CoRR, May, 2025

Time Scale Network: An Efficient Shallow Neural Network for Time Series Data in Biomedical Applications.

[DOI]

,

,

,

Laureano Moro-Velázquez

,

Pedro P. Irazoqui

IEEE J. Sel. Top. Signal Process., January, 2025

Joint Diarization and Separation Using SepFormer With Non-Autoregressive Attractors.

[DOI]

Magdalena Rybicka

,

Konrad Kowalczyk

,

,

,

Jesús Villalba

IEEE Signal Process. Lett., 2025

Deep Stroop: Integrating eye tracking and speech processing to characterize people with neurodegenerative disorders while performing neuropsychological tests.

[DOI]

,

,

,

Ankur A. Butala

,

,

Pedro P. Irazoqui

,

,

Laureano Moro-Velázquez

Comput. Biol. Medicine, 2025

Multimodal Emotion Diarization: Frame-Wise Integration of Text and Audio Representations.

[DOI]

,

,

Jesús Villalba

,

,

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Count Your Speakers! Multitask Learning for Multimodal Speaker Diarization.

[DOI]

,

Jesús Villalba

,

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

The Interspeech 2025 Challenge on Speech Emotion Recognition in Naturalistic Conditions.

[DOI]

Abinay Reddy Naini

,

Lucas Goncalves

,

,

,

Ismail Rasim Ulgen

,

,

Laureano Moro-Velázquez

,

Leibny Paola García

,

,

,

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

FaiST: A Benchmark Dataset for Fairness in Speech Technology.

[DOI]

,

,

Priyam Mazumdar

,

Zsuzsanna Fagyal

,

,

Jesús Villalba

,

Mark Hasegawa-Johnson

,

,

Laureano Moro-Velázquez

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

ADCeleb: A Longitudinal Speech Dataset from Public Figures for Early Detection of Alzheimer's Disease.

[DOI]

,

,

,

Laureano Moro-Velázquez

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis.

[DOI]

,

,

,

,

,

,

,

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.

[DOI]

,

,

,

,

Mounya Elhilali

,

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Detecting Neurodegenerative Diseases using Frame-Level Handwriting Embeddings.

[DOI]

,

,

Jesús Villalba

,

,

Laureano Moro-Velázquez

,

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Unveiling Performance Bias in ASR Systems: A Study on Gender, Age, Accent, and More.

[DOI]

,

Priyam Mazumdar

,

,

Mark Hasegawa-Johnson

,

Jesús Villalba

,

,

Laureano Moro-Velázquez

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Paired by the Teacher: Turning Unpaired Data into High-Fidelity Pairs for Low-Resource Text Generation.

[DOI]

,

,

Laureano Moro-Velázquez

,

,

Jesús Villalba

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Demographic Attributes Prediction from Speech Using WavLM Embeddings.

[DOI]

,

,

Proceedings of the 59th Annual Conference on Information Sciences and Systems, 2025

The JHU-MIT System for NIST SRE24: Post-Evaluation Analysis.

[DOI]

Jesús Villalba

,

Jonas Borgstrom

,

,

Leibny Paola García

,

Pedro A. Torres-Carrasquillo

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

Enhancing Dialogue Annotation with Speaker Characteristics Leveraging a Frozen LLM.

[DOI]

,

,

Matthew Wiesner

,

Peter Viechnicki

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

Multi-Target Backdoor Attacks Against Speaker Recognition.

[DOI]

Alexandrine Fortier

,

,

,

Jesús Antonio Villalba López

,

,

Patrick Cardinal

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

2024

Automating the analysis of eye movement for different neurodegenerative disorders.

[DOI]

,

Ankur A. Butala

,

Laureano Moro-Velázquez

,

,

,

,

Jesús Villalba

,

Comput. Biol. Medicine, March, 2024

End-to-End Neural Speaker Diarization With Non-Autoregressive Attractors.

[DOI]

Magdalena Rybicka

,

Jesús Villalba

,

,

,

Konrad Kowalczyk

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Time-Domain Speech Super-Resolution With GAN Based Modeling for Telephony Speaker Verification.

[DOI]

Saurabh Kataria

,

Jesús Villalba

,

Laureano Moro-Velázquez

,

,

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Slowness Regularized Contrastive Predictive Coding for Acoustic Unit Discovery.

[DOI]

Saurabhchand Bhati

,

Jesús Villalba

,

,

Laureano Moro-Velázquez

,

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Explainable Metrics for the Assessment of Neurodegenerative Diseases through Handwriting Analysis.

[DOI]

,

,

,

Gabriel Chávez

,

Laureano Moro-Velázquez

,

Ankur A. Butala

,

CoRR, 2024

Clean Label Attacks Against SLU Systems.

[DOI]

Henry Li Xinyuan

,

,

,

Jesús Villalba

,

,

Sanjeev Khudanpur

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Unraveling Adversarial Examples against Speaker Identification - Techniques for Attack Detection and Victim Model Classification.

[DOI]

,

,

Jesús Villalba

,

Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

Odyssey 2024 - Speech Emotion Recognition Challenge: Dataset, Baseline Framework, and Results.

[DOI]

Lucas Goncalves

,

,

Abinay Reddy Naini

,

Laureano Moro-Velázquez

,

,

,

,

,

Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

Discovering Invariant Patterns of Cognitive Decline Via an Automated Analysis of the Cookie Thief Picture Description Task.

[DOI]

,

,

,

Jesús Villalba

,

,

Laureano Moro-Velázquez

Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

CA-SSLR: Condition-Aware Self-Supervised Learning Representation for Generalized Speech Processing.

[DOI]

,

,

,

Laureano Moro-Velázquez

,

,

,

Jesús Villalba

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Exploring the Complementary Nature of Speech and Eye Movements for Profiling Neurological Disorders.

[DOI]

,

,

,

Jesús Villalba

,

,

Laureano Moro-Velázquez

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Noise-robust Speech Separation with Fast Generative Correction.

[DOI]

,

Jesús Villalba

,

Laureano Moro-Velázquez

,

,

,

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Leveraging Universal Speech Representations for Detecting and Assessing the Severity of Mild Cognitive Impairment Across Languages.

[DOI]

,

,

,

Laureano Moro-Velázquez

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Multimodal Emotion Recognition Harnessing the Complementarity of Speech, Language, and Vision.

[DOI]

,

,

,

,

,

Jesús Villalba

,

Laureano Moro-Velázquez

,

Proceedings of the 26th International Conference on Multimodal Interaction, 2024

DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction.

[DOI]

,

,

,

,

,

Mounya Elhilali

Proceedings of the IEEE International Conference on Acoustics, 2024

Concurrent validity of instrumented insoles measuring gait and balance metrics in Parkinson's disease.

[DOI]

Sophia A. Watkinson

,

Anthony J. Anderson

,

,

,

Michael Gonzalez

,

Laureano Moro-Velázquez

,

,

,

Emile Moukheiber

,

,

Brittney C. Muir

,

Ankur A. Butala

,

Kimberly Kontson

Proceedings of the 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2024

Finding Spoken Identifications: Using GPT-4 Annotation for an Efficient and Fast Dataset Creation Pipeline.

[DOI]

,

,

,

,

,

Zsuzsanna Fagyal

,

Odette Scharenborg

,

Mark Hasegawa-Johnson

,

Laureano Moro-Velázquez

,

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

2023

Interpretable speech features vs. DNN embeddings: What to use in the automatic assessment of Parkinson's disease in multi-lingual scenarios.

[DOI]

,

,

Ankur A. Butala

,

,

Jesús Villalba

,

,

Laureano Moro-Velázquez

Comput. Biol. Medicine, November, 2023

Time Scale Network: A Shallow Neural Network For Time Series Data.

[DOI]

,

,

,

Laureano Moro-Velázquez

,

Pedro P. Irazoqui

CoRR, 2023

Leveraging Pretrained Image-text Models for Improving Audio-Visual Learning.

[DOI]

Saurabhchand Bhati

,

Jesús Villalba

,

Laureano Moro-Velázquez

,

,

CoRR, 2023

Stabilized training of joint energy-based models and their practical applications.

[DOI]

,

,

,

Hynek Hermansky

,

Jesús Villalba

,

Laureano Moro-Velázquez

,

CoRR, 2023

DuTa-VC: A Duration-aware Typical-to-atypical Voice Conversion Approach with Diffusion Probabilistic Model.

[DOI]

,

,

Jesús Villalba

,

,

,

,

Laureano Moro-Velázquez

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Self-FiLM: Conditioning GANs with self-supervised representations for bandwidth extension based speaker recognition.

[DOI]

Saurabh Kataria

,

Jesús Villalba

,

Laureano Moro-Velázquez

,

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Do Phonatory Features Display Robustness to Characterize Parkinsonian Speech Across Corpora?

[DOI]

,

,

,

Jesús Villalba

,

Ankur A. Butala

,

,

Laureano Moro-Velázquez

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Segmental SpeechCLIP: Utilizing Pretrained Image-text Models for Audio-Visual Learning.

[DOI]

Saurabhchand Bhati

,

Jesús Villalba

,

Laureano Moro-Velázquez

,

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Advances in Language Recognition in Low Resource African Languages: The JHU-MIT Submission for NIST LRE22.

[DOI]

Jesús Villalba

,

Jonas Borgstrom

,

,

Saurabh Kataria

,

Leibny Paola García

,

Pedro A. Torres-Carrasquillo

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Clustering Unsupervised Representations as Defense Against Poisoning Attacks on Speech Commands Classification System.

[DOI]

,

,

,

,

Jesús Villalba

,

Sanjeev Khudanpur

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Joint Energy-Based Model for Robust Speech Classification System Against Dirty-Label Backdoor Poisoning Attacks.

[DOI]

,

,

,

,

Jesús Villalba

,

Sanjeev Khudanpur

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Model-Based Fairness Metric for Speaker Verification.

[DOI]

,

Laureano Moro-Velázquez

,

,

,

Jesús Villalba

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Unsupervised Speech Segmentation and Variable Rate Representation Learning Using Segmental Contrastive Predictive Coding.

[DOI]

Saurabhchand Bhati

,

Jesús Villalba

,

,

Laureano Moro-Velázquez

,

IEEE ACM Trans. Audio Speech Lang. Process., 2022

Non-Contrastive Self-Supervised Learning for Utterance-Level Information Extraction From Speech.

[DOI]

,

Jesús Villalba

,

Laureano Moro-Velázquez

,

IEEE J. Sel. Top. Signal Process., 2022

Discovering phonetic inventories with crosslingual automatic speech recognition.

[DOI]

,

,

Laureano Moro-Velázquez

,

,

Saurabhchand Bhati

,

Odette Scharenborg

,

Mark Hasegawa-Johnson

,

Comput. Speech Lang., 2022

Defense against Adversarial Attacks on Hybrid Speech Recognition using Joint Adversarial Fine-tuning with Denoiser.

[DOI]

,

Saurabh Kataria

,

,

,

Jesús Villalba

,

Sanjeev Khudanpur

,

CoRR, 2022

Code-Switching Text Augmentation for Multilingual Speech Processing.

[DOI]

,

Shammur Absar Chowdhury

,

,

,

CoRR, 2022

Textual Data Augmentation for Arabic-English Code-Switching Speech Recognition.

[DOI]

,

Shammur Absar Chowdhury

,

,

,

,

Sanjeev Khudanpur

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

A Multi-Modal Array of Interpretable Features to Evaluate Language and Speech Patterns in Different Neurological Disorders.

[DOI]

,

,

,

Miguel Iglesias

,

Ankur A. Butala

,

,

Robert D. Stevens

,

Jesús Villalba

,

,

Laureano Moro-Velázquez

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Vsameter: Evaluation of a New Open-Source Tool to Measure Vowel Space Area and Related Metrics.

[DOI]

,

Laureano Moro-Velázquez

,

,

Jesús Villalba

,

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Advances in Cross-Lingual and Cross-Source Audio-Visual Speaker Recognition: The JHU-MIT System for NIST SRE21.

[DOI]

Jesús Villalba

,

Bengt J. Borgstrom

,

Saurabh Kataria

,

Magdalena Rybicka

,

Carlos D. Castillo

,

,

L. Paola García-Perera

,

Pedro A. Torres-Carrasquillo

,

Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Advances in Speaker Recognition for Multilingual Conversational Telephone Speech: The JHU-MIT System for NIST SRE20 CTS Challenge.

[DOI]

Jesús Villalba

,

Bengt J. Borgstrom

,

Saurabh Kataria

,

,

Pedro A. Torres-Carrasquillo

,

Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Chunking Defense for Adversarial Attacks on ASR.

[DOI]

,

Jesús Villalba

,

,

Saurabh Kataria

,

Sanjeev Khudanpur

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

End-to-End Neural Speaker Diarization with an Iterative Refinement of Non-Autoregressive Attention-based Attractors.

[DOI]

Magdalena Rybicka

,

Jesús Villalba

,

,

Konrad Kowalczyk

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification.

[DOI]

Saurabh Kataria

,

Jesús Villalba

,

Laureano Moro-Velázquez

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

AdvEst: Adversarial Perturbation Estimation to Classify and Detect Adversarial Attacks against Speaker Identification.

[DOI]

,

Saurabh Kataria

,

Jesús Villalba

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Defense against Adversarial Attacks on Hybrid Speech Recognition System using Adversarial Fine-tuning with Denoiser.

[DOI]

,

Saurabh Kataria

,

,

,

Jesús Villalba

,

Sanjeev Khudanpur

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Non-contrastive self-supervised learning of utterance-level speech representations.

[DOI]

,

Raghavendra Pappagari

,

,

Laureano Moro-Velázquez

,

Jesús Villalba

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Study of Pre-Processing Defenses Against Adversarial Attacks on State-of-the-Art Speaker Recognition Systems.

[DOI]

,

Jesús Villalba

,

,

Laureano Moro-Velázquez

,

IEEE Trans. Inf. Forensics Secur., 2021

What Helps Transformers Recognize Conversational Structure? Importance of Context, Punctuation, and Labels in Dialog Act Recognition.

[DOI]

,

Raghavendra Pappagari

,

Trans. Assoc. Comput. Linguistics, 2021

Non-Autoregressive Transformer for Speech Recognition.

[DOI]

,

Shinji Watanabe

,

Jesús Villalba

,

,

IEEE Signal Process. Lett., 2021

The JHU submission to VoxSRC-21: Track 3.

[DOI]

,

Jesús Villalba

,

CoRR, 2021

Adversarial Attacks and Defenses for Speech Recognition Systems.

[DOI]

,

,

,

Jesús Villalba

,

,

,

Sanjeev Khudanpur

CoRR, 2021

Adversarial Attacks and Defenses for Speaker Identification Systems.

[DOI]

,

Jesús Villalba

,

,

Laureano Moro-Velázquez

,

CoRR, 2021

Advances in Parkinson's Disease detection and assessment using voice and speech: A review of the articulatory and phonatory aspects.

[DOI]

Laureano Moro-Velázquez

,

Jorge Andrés Gómez García

,

Julián D. Arias-Londoño

,

,

Juan Ignacio Godino-Llorente

Biomed. Signal Process. Control., 2021

Invariant Representation Learning for Robust Far-Field Speaker Recognition.

[DOI]

Aviad Shtrosberg

,

Jesús Villalba

,

,

,

Proceedings of the Statistical Language and Speech Processing, 2021

Representation Learning to Classify and Detect Adversarial Attacks Against Speaker and Speech Recognition Systems.

[DOI]

Jesús Villalba

,

,

,

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Spine2Net: SpineNet with Res2Net and Time-Squeeze-and-Excitation Blocks for Speaker Recognition.

[DOI]

Magdalena Rybicka

,

Jesús Villalba

,

,

,

Konrad Kowalczyk

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Automatic Detection and Assessment of Alzheimer Disease Using Speech and Language Technologies in Low-Resource Scenarios.

[DOI]

Raghavendra Pappagari

,

,

,

Laureano Moro-Velázquez

,

,

Jesús Villalba

,

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Deep Feature CycleGANs: Speaker Identity Preserving Non-Parallel Microphone-Telephone Domain Adaptation for Speaker Verification.

[DOI]

Saurabh Kataria

,

Jesús Villalba

,

,

Laureano Moro-Velázquez

,

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis.

[DOI]

,

,

,

,

Mohammad Norouzi

,

,

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Align-Denoise: Single-Pass Non-Autoregressive Speech Recognition.

[DOI]

,

,

Laureano Moro-Velázquez

,

Jesús Villalba

,

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Segmental Contrastive Predictive Coding for Unsupervised Word Segmentation.

[DOI]

Saurabhchand Bhati

,

Jesús Villalba

,

,

Laureano Moro-Velázquez

,

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Align or attend? Toward More Efficient and Accurate Spoken Word Discovery Using Speech-to-Image Retrieval.

[DOI]

,

,

Mark Hasegawa-Johnson

,

Odette Scharenborg

,

Proceedings of the IEEE International Conference on Acoustics, 2021

CopyPaste: An Augmentation Method for Speech Emotion Recognition.

[DOI]

Raghavendra Pappagari

,

Jesús Villalba

,

,

Laureano Moro-Velázquez

,

Proceedings of the IEEE International Conference on Acoustics, 2021

Perceptual Loss Based Speech Denoising with an Ensemble of Audio Pattern Recognition and Self-Supervised Models.

[DOI]

Saurabh Kataria

,

Jesús Villalba

,

Proceedings of the IEEE International Conference on Acoustics, 2021

How Phonotactics Affect Multilingual and Zero-Shot ASR Performance.

[DOI]

,

,

Laureano Moro-Velázquez

,

,

Mark Hasegawa-Johnson

,

Odette Scharenborg

,

Proceedings of the IEEE International Conference on Acoustics, 2021

Improving Reconstruction Loss Based Speaker Embedding in Unsupervised and Semi-Supervised Scenarios.

[DOI]

,

,

Jesús Villalba

,

Proceedings of the IEEE International Conference on Acoustics, 2021

Focus on the Present: A Regularization Method for the ASR Source-Target Attention Layer.

[DOI]

,

,

Jesús Villalba

,

Proceedings of the IEEE International Conference on Acoustics, 2021

New tools for the differential evaluation of Parkinson's disease using voice and speech processing.

[DOI]

Laureano Moro-Velázquez

,

Jorge Gómez-García

,

,

Juan Ignacio Godino-Llorente

Proceedings of the Fifth International Conference, 2021

Beyond Isolated Utterances: Conversational Emotion Recognition.

[DOI]

Raghavendra Pappagari

,

,

Jesús Villalba

,

Laureano Moro-Velázquez

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Joint Prediction of Truecasing and Punctuation for Conversational Speech in Low-Resource Scenarios.

[DOI]

Raghavendra Pappagari

,

,

Agnieszka Mikolajczyk

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Analysis of the Effects of Supraglottal Tract Surgical Procedures in Automatic Speaker Recognition Performance.

[DOI]

Laureano Moro-Velázquez

,

Estefanía Hernández-García

,

Jorge Andrés Gómez García

,

Juan Ignacio Godino-Llorente

,

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Introduction to the Issue on Automatic Assessment of Health Disorders Based on Voice, Speech, and Language Processing.

[DOI]

Juan Ignacio Godino-Llorente

,

Douglas D. O'Shaughnessy

,

,

,

Claudia Manfredi

IEEE J. Sel. Top. Signal Process., 2020

State-of-the-art speaker recognition with neural network embeddings in NIST SRE18 and Speakers in the Wild evaluations.

[DOI]

Jesús Villalba

,

,

,

Daniel Garcia-Romero

,

,

,

Jonas Borgstrom

,

Leibny Paola García-Perera

,

Fred Richardson

,

,

Pedro A. Torres-Carrasquillo

,

Comput. Speech Lang., 2020

rVAD: An unsupervised segment-based robust voice activity detection method.

[DOI]

,

Achintya Kumar Sarkar

,

Comput. Speech Lang., 2020

Single Channel Far Field Feature Enhancement For Speaker Verification In The Wild.

[DOI]

Phani Sankar Nidadavolu

,

Saurabh Kataria

,

L. Paola García-Perera

,

Jesús Villalba

,

CoRR, 2020

Advances in Speaker Recognition for Telephone and Audio-Visual Data: the JHU-MIT Submission for NIST SRE19.

[DOI]

Jesús Antonio Villalba López

,

Daniel Garcia-Romero

,

,

,

Jonas Borgstrom

,

,

Leibny Paola García-Perera

,

Saurabh Kataria

,

Phani Sankar Nidadavolu

,

Pedro Torres-Carrasquillo

,

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Analysis of Deep Feature Loss Based Enhancement for Speaker Verification.

[DOI]

Saurabh Kataria

,

Phani Sankar Nidadavolu

,

Jesús Villalba

,

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Speaker Detection in the Wild: Lessons Learned from JSALT 2019.

[DOI]

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Black-Box Attacks on Spoofing Countermeasures Using Transferability of Adversarial Examples.

[DOI]

,

,

Jesús Villalba

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

That Sounds Familiar: An Analysis of Phonetic Representations Transfer Across Languages.

[DOI]

,

Laureano Moro-Velázquez

,

Mark Hasegawa-Johnson

,

Odette Scharenborg

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

x-Vectors Meet Adversarial Attacks: Benchmarking Adversarial Robustness in Speaker Verification.

[DOI]

Jesús Villalba

,

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Using State of the Art Speaker Recognition and Natural Language Processing Technologies to Detect Alzheimer's Disease and Assess its Severity.

[DOI]

Raghavendra Pappagari

,

,

Laureano Moro-Velázquez

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Learning Speaker Embedding from Text-to-Speech.

[DOI]

,

,

Jesús Villalba

,

Shinji Watanabe

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Self-Expressing Autoencoders for Unsupervised Spoken Term Discovery.

[DOI]

Saurabhchand Bhati

,

Jesús Villalba

,

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings?

[DOI]

Lukasz Augustyniak

,

Piotr Szymanski

,

,

,

Adrian Szymczak

,

,

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

X-Vectors Meet Emotions: A Study On Dependencies Between Emotion and Speaker Recognition.

[DOI]

Raghavendra Pappagari

,

,

Jesús Villalba

,

,

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Unsupervised Feature Enhancement for Speaker Verification.

[DOI]

Phani Sankar Nidadavolu

,

Saurabh Kataria

,

Jesús Villalba

,

L. Paola García-Perera

,

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Using X-Vectors to Automatically Detect Parkinson's Disease from Speech.

[DOI]

Laureano Moro-Velázquez

,

Jesús Villalba

,

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Feature Enhancement with Deep Feature Losses for Speaker Verification.

[DOI]

Saurabh Kataria

,

Phani Sankar Nidadavolu

,

Jesús Villalba

,

,

L. Paola García-Perera

,

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Listen and Fill in the Missing Letters: Non-Autoregressive Transformer for Speech Recognition.

[DOI]

,

Shinji Watanabe

,

Jesús Villalba

,

CoRR, 2019

Speaker Sincerity Detection based on Covariance Feature Vectors and Ensemble Methods.

[DOI]

Mohammed Senoussaoui

,

Patrick Cardinal

,

,

Alessandro Lameiras Koerich

CoRR, 2019

A forced gaussians based methodology for the differential evaluation of Parkinson's Disease by means of speech processing.

[DOI]

Laureano Moro-Velázquez

,

Jorge Andrés Gómez García

,

Juan Ignacio Godino-Llorente

,

Jesús Villalba

,

,

Stefanie Shattuck-Hufnagel

,

Biomed. Signal Process. Control., 2019

Pretraining by Backtranslation for End-to-End ASR in Low-Resource Settings.

[DOI]

Matthew Wiesner

,

Adithya Renduchintala

,

Shinji Watanabe

,

,

,

Sanjeev Khudanpur

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

State-of-the-Art Speaker Recognition for Telephone and Video Speech: The JHU-MIT Submission for NIST SRE18.

[DOI]

Jesús Villalba

,

,

,

Daniel Garcia-Romero

,

,

,

Jonas Borgstrom

,

Fred Richardson

,

,

François Grondin

,

,

Leibny Paola García-Perera

,

,

Pedro A. Torres-Carrasquillo

,

Sanjeev Khudanpur

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

The JHU Speaker Recognition System for the VOiCES 2019 Challenge.

[DOI]

,

Jesús Villalba

,

,

,

,

,

Sanjeev Khudanpur

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

MCE 2018: The 1st Multi-Target Speaker Detection and Identification Challenge Evaluation.

[DOI]

,

,

Douglas A. Reynolds

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Improving Emotion Identification Using Phone Posteriors in Raw Speech Waveform Based DNN.

[DOI]

,

Pegah Ghahremani

,

,

Nagendra Kumar Goel

,

Kandarpa Kumar Sarma

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Study of the Performance of Automatic Speech Recognition Systems in Speakers with Parkinson's Disease.

[DOI]

Laureano Moro-Velázquez

,

,

Shinji Watanabe

,

Mark A. Hasegawa-Johnson

,

Odette Scharenborg

,

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual Networks.

[DOI]

,

,

Jesús Villalba

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Tied Mixture of Factor Analyzers Layer to Combine Frame Level Representations in Neural Speaker Embeddings.

[DOI]

,

Jesús Villalba

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Unsupervised Acoustic Segmentation and Clustering Using Siamese Network Embeddings.

[DOI]

Saurabhchand Bhati

,

,

K. Sri Rama Murty

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Cycle-GANs for Domain Adaptation of Acoustic Features for Speaker Recognition.

[DOI]

Phani Sankar Nidadavolu

,

Jesús Villalba

,

Proceedings of the IEEE International Conference on Acoustics, 2019

Investigation on Neural Bandwidth Extension of Telephone Speech for Improved Speaker Recognition.

[DOI]

Phani Sankar Nidadavolu

,

Vicente Iglesias

,

Jesús Villalba

,

Proceedings of the IEEE International Conference on Acoustics, 2019

Attentive Filtering Networks for Audio Replay Attack Detection.

[DOI]

,

,

,

Junichi Yamagishi

,

,

Proceedings of the IEEE International Conference on Acoustics, 2019

Language Model Integration Based on Memory Control for Sequence to Sequence Speech Recognition.

[DOI]

,

Shinji Watanabe

,

,

Murali Karthick Baskar

,

Hirofumi Inaguma

,

Jesús Villalba

,

Proceedings of the IEEE International Conference on Acoustics, 2019

LSTM Siamese Network for Parkinson's Disease Detection from Speech.

[DOI]

Saurabhchand Bhati

,

Laureano Moro-Velázquez

,

Jesús Villalba

,

Proceedings of the 2019 IEEE Global Conference on Signal and Information Processing, 2019

Bottom-Up Unsupervised Word Discovery via Acoustic Units.

[DOI]

Saurabhchand Bhati

,

,

Jesús Villalba

,

,

Sanjeev Khudanpur

,

Proceedings of the 2019 IEEE Global Conference on Signal and Information Processing, 2019

Hierarchical Transformers for Long Document Classification.

[DOI]

Raghavendra Pappagari

,

,

Jesús Villalba

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Low-Resource Domain Adaptation for Speaker Recognition Using Cycle-GANs.

[DOI]

Phani Sankar Nidadavolu

,

Saurabh Kataria

,

Jesús Villalba

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

NeuroSpeech.

[DOI]

Juan Rafael Orozco-Arroyave

,

Juan Camilo Vásquez-Correa

,

Jesús Francisco Vargas-Bonilla

,

,

,

Phani S. Nidadavolu

,

Heidi Christensen

,

,

,

Hamid R. Chinaei

,

,

,

,

,

,

SoftwareX, 2018

NeuroSpeech: An open-source software for Parkinson's speech analysis.

[DOI]

Juan Rafael Orozco-Arroyave

,

Juan Camilo Vásquez-Correa

,

Jesús Francisco Vargas-Bonilla

,

,

,

Phani S. Nidadavolu

,

Heidi Christensen

,

,

,

Hamid R. Chinaei

,

,

,

,

,

,

Digit. Signal Process., 2018

Low Resource Multi-modal Data Augmentation for End-to-end ASR.

[DOI]

Matthew Wiesner

,

Adithya Renduchintala

,

Shinji Watanabe

,

,

,

Sanjeev Khudanpur

CoRR, 2018

MCE 2018: The 1st Multi-target Speaker Detection and Identification Challenge Evaluation (MCE) Plan, Dataset and Baseline System.

[DOI]

,

,

Douglas A. Reynolds

,

CoRR, 2018

The JHU Speech LOREHLT 2017 System: Cross-Language Transfer for Situation-Frame Detection.

[DOI]

Matthew Wiesner

,

,

,

,

,

,

Zhongqiang Huang

,

Sanjeev Khudanpur

,

CoRR, 2018

Analysis of speaker recognition methodologies and the influence of kinetic changes to automatically detect Parkinson's Disease.

[DOI]

Laureano Moro-Velázquez

,

Jorge Andrés Gómez García

,

Juan Ignacio Godino-Llorente

,

Jesús Villalba

,

Juan Rafael Orozco-Arroyave

,

Appl. Soft Comput., 2018

Age Estimation in Short Speech Utterances Based on LSTM Recurrent Neural Networks.

[DOI]

Rubén Zazo-Candil

,

Phani Sankar Nidadavolu

,

,

Joaquin Gonzalez-Rodriguez

,

IEEE Access, 2018

Building an ASR System for Mboshi Using A Cross-Language Definition of Acoustic Units Approach.

[DOI]

Odette Scharenborg

,

,

Mark Hasegawa-Johnson

,

Proceedings of the 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages, 2018

Low-Resource Contextual Topic Identification on Speech.

[DOI]

,

Matthew Wiesner

,

Shinji Watanabe

,

,

,

,

Sanjeev Khudanpur

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

The MIT Lincoln Laboratory / JHU / EPITA-LSE LRE17 System.

[DOI]

Fred Richardson

,

Pedro A. Torres-Carrasquillo

,

Jonas Borgstrom

,

Douglas E. Sturim

,

,

Jesús Villalba

,

,

,

,

Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

End-to-End versus Embedding Neural Networks for Language Recognition in Mismatched Conditions.

[DOI]

Jesús Antonio Villalba López

,

,

Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

Punctuation Prediction Model for Conversational Speech.

[DOI]

,

Piotr Szymanski

,

,

Adrian Szymczak

,

,

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Automatic Speech Recognition and Topic Identification from Speech for Almost-Zero-Resource Languages.

[DOI]

Matthew Wiesner

,

,

,

,

,

,

Zhongqiang Huang

,

,

Sanjeev Khudanpur

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge.

[DOI]

,

,

,

Daniel Garcia-Romero

,

Jesús Villalba

,

Matthew Maciejewski

,

,

,

,

Shinji Watanabe

,

Sanjeev Khudanpur

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Visualizing Phoneme Category Adaptation in Deep Neural Networks.

[DOI]

Odette Scharenborg

,

Sebastian Tiesmeyer

,

Mark Hasegawa-Johnson

,

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Emotion Identification from Raw Speech Signals Using DNNs.

[DOI]

,

Pegah Ghahremani

,

,

Nagendra Kumar Goel

,

Kandarpa Kumar Sarma

,

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Investigation on Bandwidth Extension for Speaker Recognition.

[DOI]

Phani Sankar Nidadavolu

,

,

Jesús Villalba

,

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

End-to-end Deep Neural Network Age Estimation.

[DOI]

Pegah Ghahremani

,

Phani Sankar Nidadavolu

,

,

Jesús Villalba

,

,

Sanjeev Khudanpur

,

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Effectiveness of Single-Channel BLSTM Enhancement for Language Identification.

[DOI]

Peter Sibbern Frederiksen

,

Jesús Villalba

,

Shinji Watanabe

,

,

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Deep Neural Networks for Emotion Recognition Combining Audio and Transcripts.

[DOI]

,

Raghavendra Pappagari

,

,

Jesús Villalba

,

,

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

An Investigation of Non-linear i-vectors for Speaker Verification.

[DOI]

,

Jesús Villalba

,

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Joint Verification-Identification in end-to-end Multi-Scale CNN Framework for Topic Identification.

[DOI]

Raghavendra Pappagari

,

Jesús Villalba

,

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Characterizing Performance of Speaker Diarization Systems on Far-Field Speech Using Standard Methods.

[DOI]

Matthew Maciejewski

,

,

,

,

Sanjeev Khudanpur

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Measuring Uncertainty in Deep Regression Models: The Case of Age Estimation from Speech.

[DOI]

,

Jesús Villalba

,

,

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

JHU Diarization System Description.

[DOI]

,

L. Paola García-Perera

,

Jesús Villalba

,

,

Proceedings of the Fourth International Conference, 2018

Study of the Automatic Detection of Parkison's Disease Based on Speaker Recognition Technologies and Allophonic Distillation.

[DOI]

Laureano Moro-Velázquez

,

Jorge Andrés Gómez García

,

Juan Ignacio Godino-Llorente

,

,

,

Francisco Grandas

,

José-Miguel Velazquez

,

Juan Rafael Orozco-Arroyave

,

,

Proceedings of the 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2018

2017

Language Independent Assessment of Motor Impairments of Patients with Parkinson's Disease Using i-Vectors.

[DOI]

Nicanor García

,

Juan Camilo Vásquez-Correa

,

Juan Rafael Orozco-Arroyave

,

,

Proceedings of the Text, Speech, and Dialogue - 20th International Conference, 2017

Tied Variational Autoencoder Backends for i-Vector Speaker Recognition.

[DOI]

Jesús Villalba

,

,

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

The MIT-LL, JHU and LRDE NIST 2016 Speaker Recognition Evaluation System.

[DOI]

Pedro A. Torres-Carrasquillo

,

Fred Richardson

,

Shahan C. Nercessian

,

Douglas E. Sturim

,

William M. Campbell

,

,

,

,

Sri Harish Reddy Mallidi

,

Phani Sankar Nidadavolu

,

,

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Evaluation of the Neurological State of People with Parkinson's Disease Using i-Vectors.

[DOI]

Nicanor García

,

Juan Rafael Orozco-Arroyave

,

Luis Fernando D'Haro

,

,

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Multi-view representation learning via GCCA for multimodal analysis of Parkinson's disease.

[DOI]

Juan Camilo Vásquez-Correa

,

Juan Rafael Orozco-Arroyave

,

,

,

,

Heidi Christensen

,

,

,

,

Hamid R. Chinaei

,

,

Phani Sankar Nidadavolu

,

,

,

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

An empirical evaluation of zero resource acoustic unit discovery.

[DOI]

,

,

,

Santosh Kesiraju

,

,

,

Pegah Ghahremani

,

,

,

Sanjeev Khudanpur

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Topic identification of spoken documents using unsupervised acoustic unit discovery.

[DOI]

Santosh Kesiraju

,

Raghavendra Pappagari

,

,

,

,

Sanjeev Khudanpur

,

,

Suryakanth V. Gangashetty

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

2016

On the Use of Acoustic Unit Discovery for Language Recognition.

[DOI]

Stephen H. Shum

,

David F. Harwath

,

,

IEEE ACM Trans. Audio Speech Lang. Process., 2016

The MITLL NIST LRE 2015 Language Recognition System.

[DOI]

Pedro A. Torres-Carrasquillo

,

,

Elizabeth Godoy

,

Douglas A. Reynolds

,

Fred Richardson

,

,

,

Douglas E. Sturim

Proceedings of the Odyssey 2016: The Speaker and Language Recognition Workshop, 2016

I-Vector Representation Based on GMM and DNN for Audio Classification.

[DOI]

Proceedings of the Odyssey 2016: The Speaker and Language Recognition Workshop, 2016

Native Language Detection Using the I-Vector Framework.

[DOI]

Mohammed Senoussaoui

,

Patrick Cardinal

,

,

Alessandro L. Koerich

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Exploiting Hidden-Layer Responses of Deep Neural Networks for Language Recognition.

[DOI]

,

Sri Harish Reddy Mallidi

,

,

,

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Automatic Dialect Detection in Arabic Broadcast Speech.

[DOI]

,

,

Patrick Cardinal

,

,

Sree Harsha Yella

,

,

,

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

2015

Deep Neural Network Approaches to Speaker and Language Recognition.

[DOI]

Fred Richardson

,

Douglas A. Reynolds

,

IEEE Signal Process. Lett., 2015

ETS System for AV+EC 2015 Challenge.

[DOI]

Patrick Cardinal

,

,

Alessandro Lameiras Koerich

,

,

Patrice Boucher

Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge, 2015

A unified deep neural network for speaker and language recognition.

[DOI]

Fred Richardson

,

Douglas A. Reynolds

,

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Speaker adaptation using the i-vector technique for bottleneck features.

[DOI]

Patrick Cardinal

,

,

,

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

2014

Non-Negative Factor Analysis of Gaussian Mixture Model Weight Adaptation for Language and Dialect Recognition.

[DOI]

Mohamad Hasan Bahari

,

,

,

,

,

IEEE ACM Trans. Audio Speech Lang. Process., 2014

A complete KALDI recipe for building Arabic speech recognition systems.

[DOI]

,

,

Patrick Cardinal

,

,

,

Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

GMM Weights Adaptation Based on Subspace Approaches for Speaker Verification.

[DOI]

,

,

Mohamad Hasan Bahari

,

,

,

Proceedings of the Odyssey 2014: The Speaker and Language Recognition Workshop, 2014

Limited labels for unlimited data: active learning for speaker recognition.

[DOI]

Stephen H. Shum

,

,

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Recent advances in ASR applied to an Arabic transcription system for Al-Jazeera.

[DOI]

Patrick Cardinal

,

,

,

,

,

,

,

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

2013

Unsupervised Methods for Speaker Diarization: An Integrated and Iterative Approach.

[DOI]

,

,

,

IEEE Trans. Speech Audio Process., 2013

New cosine similarity scorings to implement gender-independent speaker verification.

[DOI]

Mohammed Senoussaoui

,

,

Pierre Dumouchel

,

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Bayesian distance metric learning on i-vector for speaker verification.

[DOI]

,

,

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Developing a speaker identification system for the DARPA RATS project.

[DOI]

,

Spyros Matsoukas

,

,

,

,

,

,

Hynek Hermansky

,

Sri Harish Reddy Mallidi

,

,

Richard M. Schwartz

,

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2013

2012

The MITLL NIST LRE 2011 language recognition system.

[DOI]

,

Pedro A. Torres-Carrasquillo

,

Douglas A. Reynolds

,

,

Fred Richardson

,

,

Douglas E. Sturim

Proceedings of the Odyssey 2012: The Speaker and Language Recognition Workshop, 2012

First attempt of Boltzmann machines for speaker verification.

[DOI]

Mohammed Senoussaoui

,

,

,

,

Pierre Dumouchel

Proceedings of the Odyssey 2012: The Speaker and Language Recognition Workshop, 2012

On the Use of Spectral and Iterative Methods for Speaker Diarization.

[DOI]

,

,

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Patrol Team Language Identification System for DARPA RATS P1 Evaluation.

[DOI]

,

,

,

,

Luis Fernando D'Haro

,

,

Frantisek Grézl

,

,

Spyros Matsoukas

,

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

2011

Front-End Factor Analysis for Speaker Verification.

[DOI]

,

,

,

Pierre Dumouchel

,

IEEE Trans. Speech Audio Process., 2011

Exploiting Intra-Conversation Variability for Speaker Diarization.

[DOI]

,

,

Ekapol Chuangsuwanich

,

Douglas A. Reynolds

,

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Language Recognition via i-vectors and Dimensionality Reduction.

[DOI]

,

Pedro A. Torres-Carrasquillo

,

Douglas A. Reynolds

,

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

The MIT LL 2010 speaker recognition evaluation system: Scalable language-independent speaker recognition.

[DOI]

Douglas E. Sturim

,

William M. Campbell

,

,

,

,

Douglas A. Reynolds

,

Fred Richardson

,

Pedro A. Torres-Carrasquillo

,

Proceedings of the IEEE International Conference on Acoustics, 2011

Towards reduced false-alarms using cohorts.

[DOI]

,

William M. Campbell

,

Proceedings of the IEEE International Conference on Acoustics, 2011

A channel-blind system for speaker verification.

[DOI]

,

,

Douglas A. Reynolds

,

,

William M. Campbell

,

Proceedings of the IEEE International Conference on Acoustics, 2011

2010

Unsupervised Speaker Adaptation based on the Cosine Similarity for Text-Independent Speaker Verification.

[DOI]

,

,

,

Proceedings of the Odyssey 2010: The Speaker and Language Recognition Workshop, Brno, Czech Republic, June 28, 2010

An i-vector Extractor Suitable for Speaker Recognition with both Microphone and Telephone Speech.

[DOI]

Mohammed Senoussaoui

,

,

,

Pierre Dumouchel

Proceedings of the Odyssey 2010: The Speaker and Language Recognition Workshop, Brno, Czech Republic, June 28, 2010

Cosine Similarity Scoring without Score Normalization Techniques.

[DOI]

,

,

,

Douglas A. Reynolds

,

Proceedings of the Odyssey 2010: The Speaker and Language Recognition Workshop, Brno, Czech Republic, June 28, 2010

2009

Cepstral and long-term features for emotion recognition.

[DOI]

Pierre Dumouchel

,

,

,

,

Narjès Boufaden

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Support vector machines versus fast scoring in the low-dimensional total variability space for speaker verification.

[DOI]

,

,

,

,

,

Pierre Dumouchel

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Comparison of scoring methods used in speaker recognition with Joint Factor Analysis.

[DOI]

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2009

Support vector machines and Joint Factor Analysis for speaker verification.

[DOI]

,

,

,

,

Pierre Dumouchel

,

,

Valiantsina Hubeika

,

Proceedings of the IEEE International Conference on Acoustics, 2009

2008

A Study of Interspeaker Variability in Speaker Verification.

[DOI]

,

,

,

,

Pierre Dumouchel

IEEE Trans. Speech Audio Process., 2008

The role of speaker factors in the NIST extended data task.

[DOI]

,

,

,

,

Pierre Dumouchel

Proceedings of the Odyssey 2008: The Speaker and Language Recognition Workshop, 2008

Kernel combination for SVM speaker verification.

[DOI]

,

,

,

Pierre Dumouchel

Proceedings of the Odyssey 2008: The Speaker and Language Recognition Workshop, 2008

Comparison between factor analysis and GMM support vector machines for speaker verification.

[DOI]

,

,

,

Pierre Dumouchel

Proceedings of the Odyssey 2008: The Speaker and Language Recognition Workshop, 2008

Development of the primary CRIM system for the NIST 2008 speaker recognition evaluation.

[DOI]

,

,

,

,

Pierre Dumouchel

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

2007

Modeling Prosodic Features With Joint Factor Analysis for Speaker Verification.

[DOI]

,

Pierre Dumouchel

,

IEEE Trans. Speech Audio Process., 2007

Continuous prosodic features and formant modeling with joint factor analysis for speaker verification.

[DOI]

,

,

Pierre Dumouchel

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Linear and non linear kernel GMM supervector machines for speaker verification.

[DOI]

,

,

,

Pierre Dumouchel

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

2006

Support Vector GMMs for Speaker Verification.

[DOI]

,

Gérard Chollet

Proceedings of the Odyssey 2006: The Speaker and Language Recognition Workshop, 2006

GMM-based SVM for face recognition.

[DOI]

,

,

Gérard Chollet

Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), 2006
