Themos Stafylakis

CoRR, May, 2026

LabelBuddy: An Open Source Music and Audio Language Annotation Tagging Tool Using AI Assistance.

CoRR, March, 2026

MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Alpha Divergence Losses for Biometric Verification.

CoRR, November, 2025

Automatic Speech Recognition for Greek Medical Dictation.

Vardis Georgilas

CoRR, September, 2025

Building Open-Retrieval Conversational Question Answering Systems by Generating Synthetic Data and Decontextualizing User Questions.

Elisavet Palogiannidi

Ion Androutsopoulos

Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2025

Synthetic Speech Source Tracing using Metric Learning.

Dimitrios Koutsianos

Stavros Zacharopoulos

Yannis Panagakis

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Analysis of ABC Frontend Audio Systems for the NIST-SRE24.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

CA-MHFA: A Context-Aware Multi-Head Factorized Attentive Pooling for SSL-Based Speaker Verification.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

State-of-the-art Embeddings with Video-free Segmentation of the Source VoxCeleb Data.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

2024

DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

BUT Systems and Analyses for the ASVspoof 5 Challenge.

CoRR, 2024

Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?

Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

Challenging margin-based speaker embedding extractors by using the variational information bottleneck.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Comparing Data Augmentation Methods for End-to-End Task-Oriented Dialog Systems.

Christos Vlachos

Ion Androutsopoulos

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

KAN-AV dataset for audio-visual face and speech analysis in the wild.

Triantafyllos Kefalas

Image Vis. Comput., December, 2023

Improving Speaker Verification with Self-Pretrained Transformer Models.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Description and Analysis of ABC Submission to NIST LRE 2022.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Parameter-Efficient Transfer Learning of Pre-Trained Transformer Models for Speaker Verification Using Adapters.

Proceedings of the IEEE International Conference on Acoustics, 2023

Speech-Based Emotion Recognition with Self-Supervised Models Using Attentive Channel-Wise Correlations and Label Smoothing.

Proceedings of the IEEE International Conference on Acoustics, 2023

A Simple Baseline for Knowledge-Based Visual Question Answering.

Alexandros Xenos

Ioannis Patras

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

2022

Extracting Speaker and Emotion Information from Self-Supervised Speech Models via Channel-Wise Correlations.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

An Attention-Based Backend Allowing Efficient Fine-Tuning of Transformer Models for Speaker Verification.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

On the Use of Semantically-Aligned Speech Representations for Spoken Language Understanding.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Analyzing Speaker Verification Embedding Extractors and Back-Ends Under Language and Channel Mismatch.

Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Development of ABC Systems for the 2021 Edition of NIST Speaker Recognition Evaluation.

Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Training speaker embedding extractors using multi-speaker audio with unknown speaker boundaries.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Probabilistic Spherical Discriminant Analysis: An Alternative to PLDA for length-normalized embeddings.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Speaker Embeddings by Modeling Channel-Wise Correlations.

Johan Rohdin

Lukáš Burget

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

2020

Probabilistic Embeddings for Speaker Diarization.

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Analysis of ABC Submission to NIST SRE 2019 CMN and VAST Challenge.

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

End-to-End Architectures for ASR-Free Spoken Language Understanding.

Elisavet Palogiannidi

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Seeing wake words: Audio-visual Keyword Spotting.

Liliane Momeni

Triantafyllos Afouras

Samuel Albanie

Andrew Zisserman

Proceedings of the 31st British Machine Vision Conference 2020, 2020

2019

Speaker Recognition With Random Digit Strings Using Uncertainty Normalized HMM-Based i-Vectors.

IEEE ACM Trans. Audio Speech Lang. Process., 2019

Detecting Spoofing Attacks Using VGG and SincNet: BUT-Omilia Submission to ASVspoof 2019 Challenge.

Hossein Zeinali

Georgia Athanasopoulou

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Self-Supervised Speaker Embeddings.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Privacy-Preserving Speaker Recognition with Cohort Score Normalisation.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

How to Improve Your Speaker Embeddings Extractor in Generic Toolkits.

Proceedings of the IEEE International Conference on Acoustics, 2019

Speaker Verification Using End-to-end Adversarial Language Adaptation.

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

Pushing the boundaries of audiovisual word recognition using Residual Networks and LSTMs.

Muhammad Haris Khan

Comput. Vis. Image Underst., 2018

Audio-Visual Speech Recognition with a Hybrid CTC/Attention Architecture.

Stavros Petridis

Pingchuan Ma

Maja Pantic

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

DeepMine Speech Processing Database: Text-Dependent and Independent Speaker Verification and Speech Recognition in Persian and English.

Hossein Zeinali

Hossein Sameti

Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

Gaussian meta-embeddings for efficient scoring of a heavy-tailed PLDA model.

Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

Deep Word Embeddings for Visual Speech Recognition.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

End-to-End Audiovisual Speech Recognition.

Maja Pantic

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Zero-Shot Keyword Spotting for Visual Speech Recognition In-the-wild.

Proceedings of the Computer Vision - ECCV 2018, 2018

2017

Combining Residual Networks with LSTMs for Lipreading.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

The I4U Mega Fusion and Collaboration for NIST Speaker Recognition Evaluation 2016.

Achintya Kumar Sarkar

Fahimeh Bahmaninezhad

Sergey Isadskiy

Christian Rathgeb

Christoph Busch

Dennis Alexander Lehmann Thomsen

Pierre-Michel Bousquet

Jean-François Bonastre

Eliathamby Ambikairajah

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

2016

Speaker and Channel Factors in Text-Dependent Speaker Recognition.

IEEE ACM Trans. Audio Speech Lang. Process., 2016

Text-Dependent Speaker Recognition With Random Digit Strings.

Md. Jahangir Alam

Patrick Kenny

IEEE ACM Trans. Audio Speech Lang. Process., 2016

Compensation for phonetic nuisance variability in speaker recognition using DNNs.

Proceedings of the Odyssey 2016: The Speaker and Language Recognition Workshop, 2016

Uncertainty Modeling Without Subspace Methods For Text-Dependent Speaker Recognition.

Proceedings of the Odyssey 2016: The Speaker and Language Recognition Workshop, 2016

Deep Neural Network based Text-Dependent Speaker Verification: Preliminary Results.

Proceedings of the Odyssey 2016: The Speaker and Language Recognition Workshop, 2016

Spoofing Detection on the ASVspoof2015 Challenge Corpus Employing Deep Neural Networks.

Proceedings of the Odyssey 2016: The Speaker and Language Recognition Workshop, 2016

Towards PLDA-RBM based speaker recognition in mobile environment: Designing stacked/deep PLDA-RBM systems.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

JFA for speaker recognition with random digit strings.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

The reddots data collection for speaker recognition.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

An i-vector backend for speaker verification.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Combining amplitude and phase-based features for speaker verification with short duration utterances.

Md. Jahangir Alam

Patrick Kenny

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Development of CRIM system for the automatic speaker verification spoofing and countermeasures challenge 2015.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

JFA modeling with left-to-right structure and a new backend for text-dependent speaker recognition.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014

A Study of the Cosine Distance-Based Mean Shift for Telephone Speech Diarization.

IEEE ACM Trans. Audio Speech Lang. Process., 2014

Deep Neural Networks for extracting Baum-Welch statistics for Speaker Recognition.

Proceedings of the Odyssey 2014: The Speaker and Language Recognition Workshop, 2014

Supervised/Unsupervised Voice Activity Detectors for Text-dependent Speaker Recognition on the RSR2015 Corpus.

Proceedings of the Odyssey 2014: The Speaker and Language Recognition Workshop, 2014

Joint Factor Analysis for Text-Dependent Speaker Verification.

Proceedings of the Odyssey 2014: The Speaker and Language Recognition Workshop, 2014

In-domain versus out-of-domain training for text-dependent JFA.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

JFA-based front ends for speaker recognition.

Proceedings of the IEEE International Conference on Acoustics, 2014

I-vector-based speaker adaptation of deep neural networks for French broadcast audio transcription.

Proceedings of the IEEE International Conference on Acoustics, 2014

Unscented transform for ivector-based noisy speaker recognition.

David Martínez González

Proceedings of the IEEE International Conference on Acoustics, 2014

2013

Text-dependent speaker recognition using PLDA with uncertainty propagation.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Compensation for inter-frame correlations in speaker diarization and recognition.

Proceedings of the IEEE International Conference on Acoustics, 2013

Efficient iterative mean shift based cosine dissimilarity for multi-recording speaker clustering.

Proceedings of the IEEE International Conference on Acoustics, 2013

PLDA for speaker verification with utterances of arbitrary duration.

Proceedings of the IEEE International Conference on Acoustics, 2013

2012

Preliminary investigation of Boltzmann machine classifiers for speaker recognition.

Proceedings of the Odyssey 2012: The Speaker and Language Recognition Workshop, 2012

Mean shift algorithm for exponential families with applications to speaker clustering.

Proceedings of the Odyssey 2012: The Speaker and Language Recognition Workshop, 2012

A mean shift algorithm for manifolds of exponential families.

Proceedings of the 11th International Conference on Information Science, 2012

PLDA using Gaussian Restricted Boltzmann Machines with application to Speaker Verification.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Music tempo estimation and beat tracking by applying source separation and metrical relations.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

Developing a Scoring Algorithm for Automatic Pronunciation Assessment of Modern Greek.

Frieda Charalabopoulou

George K. Mikros

J. Quant. Linguistics, 2011

Enhancing Handwritten Word Segmentation by Employing Local Spatial Features.

Fotini Simistira

Proceedings of the 2011 International Conference on Document Analysis and Recognition, 2011

Closed-form expressions vs. BIC: A comparison for speaker clustering.

Proceedings of the IEEE International Conference on Acoustics, 2011

2010

Handwritten document image segmentation into text lines and words.

Pattern Recognit., 2010

The Segmental Bayesian Information Criterion and Its Applications to Speaker Diarization.

IEEE J. Sel. Top. Signal Process., 2010

Speaker clustering via the mean shift algorithm.

Proceedings of the Odyssey 2010: The Speaker and Language Recognition Workshop, Brno, Czech Republic, June 28, 2010

Improvements to the equal-parameter BIC for speaker diarization.

Xavier Anguera

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

A new penalty term for the BIC with respect to speaker diarization.

Proceedings of the IEEE International Conference on Acoustics, 2010

2009

Redefining the Bayesian information criterion for speaker diarisation.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

2008

Robust text-line and word segmentation for handwritten documents images.

Proceedings of the IEEE International Conference on Acoustics, 2008

PANOPTIS: A System for Intelligent Monitoring of the Hellenic Broadcast Sector.

Iason Demiros

Vassilios Antonopoulos

Spyros Raptis

Fotini Simistira

Proceedings of the 19th International Workshop on Database and Expert Systems Applications (DEXA 2008), 2008

2007

A Parametric Spectral-Based Method for Verification of Text in Videos.

Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR 2007), 2007

Efficient combination of parametric spaces, models and metrics for speaker diarization.
