Daniel Garcia-Romero

Seyed Omid Sadjadi

Srikanth Vishnubhotla

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Zero-resource Speech Translation and Recognition with LLMs.

Veera Raghavendra Elluru

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Hyper-adapter for Parameter-Efficient Multilingual ASR Adaptation.

Zejiang Hou

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024

The VoxCeleb Speaker Recognition Challenge: A Retrospective.

IEEE/ACM Trans. Audio Speech Lang. Process., 2024

SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models.

Sai Muralidhar Jayanthi

Srikanth Vishnubhotla

Sundararajan Srinivasan

Katrin Kirchhoff

CoRR, 2024

Revisiting Convolution-free Transformer for Speech Recognition.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

SpeechGuard: Exploring the Adversarial Robustness of Multi-modal Large Language Models.

Sai Muralidhar Jayanthi

Srikanth Vishnubhotla

Sundararajan Srinivasan

Katrin Kirchhoff

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

VoxWatch: An open-set speaker recognition benchmark on VoxCeleb.

Seyed Omid Sadjadi

CoRR, 2023

VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge.

CoRR, 2023

2022

Directed speech separation for automatic speech recognition of long form conversational speech.

Rohit Paturi

Sundararajan Srinivasan

Katrin Kirchhoff

Leibny Paola García-Perera

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Recent Developments on ESPnet Toolkit Boosted by Conformer.

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

State-of-the-art speaker recognition with neural network embeddings in NIST SRE18 and Speakers in the Wild evaluations.

Fred Richardson

Réda Dehak

Pedro A. Torres-Carrasquillo

Najim Dehak

Comput. Speech Lang., 2020

Advances in Speaker Recognition for Telephone and Audio-Visual Data: the JHU-MIT Submission for NIST SRE19.

Jesús Antonio Villalba López

Leibny Paola García-Perera

Saurabh Kataria

Phani Sankar Nidadavolu

Pedro Torres-Carrasquillo

Najim Dehak

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

MagNetO: X-vector Magnitude Estimation Network plus Offset for Improved Speaker Recognition.

Leibny Paola García-Perera

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

JHU-HLTCOE System for the VoxSRC Speaker Recognition Challenge.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

State-of-the-Art Speaker Recognition for Telephone and Video Speech: The JHU-MIT Submission for NIST SRE18.

Daniel Povey

Pedro A. Torres-Carrasquillo

Sanjeev Khudanpur

Najim Dehak

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Speaker Diarization Using Leave-One-Out Gaussian PLDA Clustering of DNN Embeddings.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Speaker Recognition Benchmark Using the CHiME-5 Corpus.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

x-Vector DNN Refinement with Full-Length Recordings for Speaker Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Script Identification using Across- and Within-Image Distribution Estimation.

Proceedings of the 2019 International Conference on Document Analysis and Recognition, 2019

Speaker Recognition for Multi-speaker Conversations Using X-vectors.

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

Spoken Language Recognition using X-vectors.

Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

Language Recognition for Telephone and Video Speech: The JHU HLTCOE Submission for NIST LRE17.

Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

Fast Variational Bayes for Heavy-tailed PLDA Applied to i-vectors and x-vectors.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

X-Vectors: Robust DNN Embeddings for Speaker Recognition.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Audio-Visual Person Recognition in Multimedia Data from the IARPA Janus Program.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Deep Neural Network Embeddings for Text-Independent Speaker Verification.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Extended Variability Modeling and Unsupervised Adaptation for PLDA Speaker Recognition.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Speaker diarization using deep neural network embeddings.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

2016

Deep neural network-based speaker embeddings for end-to-end speaker verification.

Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Summary of the 2015 NIST Language Recognition i-Vector Machine Learning Challenge.

Jaime Hernandez-Cordero

Lisa P. Mason

Proceedings of the Odyssey 2016: The Speaker and Language Recognition Workshop, 2016

Augmented Data Training of Joint Acoustic/Phonotactic DNN i-vectors for NIST LRE15.

Proceedings of the Odyssey 2016: The Speaker and Language Recognition Workshop, 2016

Priors for Speaker Counting and Diarization with AHC.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Stacked Long-Term TDNN for Spoken Language Recognition.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

2015

Speaker diarization with i-vectors from DNN senone posteriors.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

DNN senone MAP multinomial i-vectors for phonotactic language recognition.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Insights into deep neural networks for speaker recognition.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Analysis of the second phase of the 2013-2014 i-vector machine learning challenge.

Jaime Hernandez-Cordero

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Content-based recommender systems for spoken documents.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Diarization resegmentation in the factor analysis subspace.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Topic Identification and Discovery on Text and Speech.

Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

Time delay deep neural network-based universal background models for speaker recognition.

David Snyder

Daniel Povey

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

Speaker diarization with PLDA i-vector scoring and unsupervised calibration.

Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Improving speaker recognition performance in the domain adaptation challenge using deep neural networks.

Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Unsupervised Clustering Approaches for Domain Adaptation in Speaker Recognition Systems.

Proceedings of the Odyssey 2014: The Speaker and Language Recognition Workshop, 2014

The NIST 2014 Speaker Recognition i-vector Machine Learning Challenge.

Proceedings of the Odyssey 2014: The Speaker and Language Recognition Workshop, 2014

Unsupervised Domain Adaptation for I-Vector Speaker Recognition.

Proceedings of the Odyssey 2014: The Speaker and Language Recognition Workshop, 2014

Summary and initial results of the 2013-2014 speaker recognition i-vector machine learning challenge.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Unsupervised idiolect discovery for speaker recognition.

Aren Jansen

Pascal Clark

Jaime Hernandez-Cordero

Proceedings of the IEEE International Conference on Acoustics, 2014

Supervised domain adaptation for I-vector based speaker recognition.

Proceedings of the IEEE International Conference on Acoustics, 2014

Generative modelling for unsupervised score calibration.

Niko Brümmer

Proceedings of the IEEE International Conference on Acoustics, 2014

2013

A Symmetric Kernel Partial Least Squares Framework for Speaker Recognition.

Balaji Vasan Srinivasan

IEEE Trans. Speech Audio Process., 2013

Subspace-constrained supervector PLDA for speaker verification.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

2012

Robust Speaker Recognition Based on Latent Variable Models.

PhD thesis, 2012

Automatic intelligibility assessment of pathologic speech in head and neck cancer based on auditory-inspired spectro-temporal modulations.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

The UMD-JHU 2011 speaker recognition system.

Garimella S. V. S. Sivaram

Xinhui Zhou

Dmitry N. Zotkin

Balaji Vasan Srinivasan

Yuancheng Luo

Sriram Ganapathy

Samuel Thomas

Sridhar Krishna Nemala

Majid Mirbagheri

Sri Harish Reddy Mallidi

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Multicondition training of Gaussian PLDA models in i-vector space for noise and reverberation robust speaker recognition.

Xinhui Zhou

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

Automatic Speech Codec Identification with Applications to Tampering Detection of Speech Recordings.

Jingting Zhou

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Kernel Partial Least Squares for Speaker Recognition.

Balaji Vasan Srinivasan

Dmitry N. Zotkin

Ramani Duraiswami

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Analysis of i-vector Length Normalization in Speaker Recognition Systems.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Linear versus mel frequency cepstral coefficients for speaker recognition.

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010

Joint Factor Analysis for Speaker Recognition Reinterpreted as Signal Coding Using Overcomplete Dictionaries.

Proceedings of the Odyssey 2010: The Speaker and Language Recognition Workshop, Brno, Czech Republic, June 28, 2010

Automatic acquisition device identification from speech recordings.

Proceedings of the IEEE International Conference on Acoustics, 2010

2008

Language and genre detection in audio content analysis.

Vikramjit Mitra

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Intersession variability in speaker recognition: a behind the scene analysis.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Language detection in audio content analysis.

Vikramjit Mitra

Proceedings of the IEEE International Conference on Acoustics, 2008

2006

Using quality measures for multilevel speaker recognition.

Comput. Speech Lang., 2006

2005

Adapted user-dependent multimodal biometric authentication exploiting general information.

Pattern Recognit. Lett., 2005

Bayesian adaptation for user-dependent multimodal biometric authentication.

Pattern Recognit., 2005

Speaker Verification Using Adapted User-Dependent Multilevel Fusion.

Proceedings of the Multiple Classifier Systems, 6th International Workshop, 2005

2004

On the use of quality measures for text-independent speaker recognition.

Joaquín González-Rodríguez

Proceedings of the Odyssey 2004: The Speaker and Language Recognition Workshop, Toledo, Spain, May 31, 2004

Exploiting general knowledge in user-dependent fusion strategies for multimodal biometric verification.

Joaquín González-Rodríguez

Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2003

Robust likelihood ratio estimation in Bayesian forensic speaker recognition.

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Support vector machine fusion of idiolectal and acoustic speaker information in Spanish conversational speech.

Joaquín González-Rodríguez

Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

U-NORM Likelihood Normalization in PIN-Based Speaker Verification Systems.

Proceedings of the Audio- and Video-Based Biometric Person Authentication, 2003

A Comparative Evaluation of Fusion Strategies for Multimodal Biometric Verification.
