Herman Kamper

CoRR, March, 2026

ZeroSyl: Simple Zero-Resource Syllable Tokenization for Spoken Language Modeling.

CoRR, February, 2026

2025

Unsupervised lexicon learning from speech is limited by representations rather than clustering.

Danel Adendorff

Simon Malan

CoRR, October, 2025

Should Top-Down Clustering Affect Boundaries in Unsupervised Word Discovery?

Simon Malan

CoRR, July, 2025

Automatically assessing oral narratives of Afrikaans and isiXhosa children.

CoRR, July, 2025

Feature-based analysis of oral narratives from Afrikaans and isiXhosa children.

CoRR, July, 2025

Towards few-shot isolated word reading assessment.

Reuben Smit

Retief Louw

CoRR, July, 2025

Analyzing and Improving Speaker Similarity Assessment for Speech Synthesis.

Hugo Seuté

Jean-Philippe Letendre

Julian Zaïdi

CoRR, July, 2025

Improved visually prompted keyword localisation in real low-resource settings.

Proceedings of the International Conference on Speech Technology and Human-Computer Dialogue, 2025

Spoken Language Modeling with Duration-Penalized Self-Supervised Units.

Nicol Visser

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

The mutual exclusivity bias of bilingual visually grounded speech models.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

LinearVC: Linear Transformations of Self-Supervised Features Through the Lens of Voice Conversion.

Julian Zaïdi

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Unsupervised Word Discovery: Boundary Detection with Clustering vs. Dynamic Programming.

Simon Malan

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Speech Recognition for Automatically Assessing Afrikaans and isiXhosa Preschool Oral Narratives.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

MARS6: A Small and Robust Hierarchical-Codec Text-to-Speech Model.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024

Visually Grounded Few-Shot Word Learning in Low-Resource Settings.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Disentanglement in a GAN for Unconditional Speech Synthesis.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Visually Grounded Speech Models Have a Mutual Exclusivity Bias.

Trans. Assoc. Comput. Linguistics, 2024

Leveraging Multilingual Transfer for Unsupervised Semantic Acoustic Word Embeddings.

IEEE Signal Process. Lett., 2024

Improved Visually Prompted Keyword Localisation in Real Low-Resource Settings.

CoRR, 2024

Revisiting speech segmentation and lexicon learning with better features.

CoRR, 2024

Translating speech with just images.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Spoken-Term Discovery using Discrete Speech Units.

Julian Zaïdi

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

2023

Infant Phonetic Learning as Perceptual Space Learning: A Crosslinguistic Evaluation of Computational Models.

Cogn. Sci., July, 2023

Word Segmentation on Discovered Phone Units With Dynamic Programming and Self-Supervised Scoring.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Rhythm Modeling for Voice Conversion.

IEEE Signal Process. Lett., 2023

Voice Conversion for Stuttered Speech, Instruments, Unseen Languages and Textually Described Voices.

CoRR, 2023

Semi-Supervised Machine Learning for Livestock Threat Classification Using GPS Data.

Urs J. De Swardt

IEEE Access, 2023

Visually grounded few-shot word acquisition with fewer shots.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Mitigating Catastrophic Forgetting for Few-Shot Spoken Word Classification Through Meta-Learning.

Ruan van der Merwe

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Towards hate speech detection in low-resource languages: Comparing ASR to acoustic word embeddings on Wolof and Swahili.

Nathanaël Carraz Rakotonirina

Everlyn Asiko Chimoto

Bruce A. Bassett

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Voice Conversion With Just Nearest Neighbors.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2022

Keyword Localisation in Untranscribed Speech Using Visually Grounded Speech Models.

IEEE J. Sel. Top. Signal Process., 2022

Feature learning for efficient ASR-free keyword spotting in low-resource languages.

Ewald van der Westhuizen

Comput. Speech Lang., 2022

YFACC: A Yorùbá Speech-Image Dataset for Cross-Lingual Keyword Localisation Through Visual Grounding.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Towards Visually Prompted Keyword Localisation for Zero-Resource Spoken Languages.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

GAN You Hear Me? Reclaiming Unconditional Speech Synthesis from Diffusion Models.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

TransFusion: Transcribing Speech with Multinomial Diffusion.

Kevin Eloff

Proceedings of the Artificial Intelligence Research - Third Southern African Conference, 2022

A Temporal Extension of Latent Dirichlet Allocation for Unsupervised Acoustic Unit Discovery.

Werner van der Merwe

Johan Adam du Preez

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Voice Conversion Can Improve ASR in Very Low-Resource Settings.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Improved Acoustic Word Embeddings for Zero-Resource Languages Using Multilingual Transfer.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

BINet: A binary inpainting network for deep patch-based image compression.

André Nortje

Willie Brink

Signal Process. Image Commun., 2021

Multilingual and unsupervised subword modeling for zero-resource languages.

Enno Hermann

Comput. Speech Lang., 2021

Towards Learning to Speak and Hear Through Multi-Agent Communication over a Continuous Acoustic Channel.

Kevin Eloff

Arnu Pretorius

Okko Räsänen

CoRR, 2021

A Comparison of Self-Supervised Speech Representations As Input Features For Unsupervised Acoustic Word Embeddings.

Lisa van Staden

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Acoustic Word Embeddings for Zero-Resource Languages Using Self-Supervised Contrastive Learning and Multilingual Adaptation.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Attention-Based Keyword Localisation in Speech Using Visual Grounding.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Direct Multimodal Few-Shot Learning of Speech and Images.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Towards Unsupervised Phone and Word Segmentation Using Self-Supervised Vector-Quantized Neural Networks.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Multilingual Transfer of Acoustic Word Embeddings Improves When Training on Languages Related to the Target Zero-Resource Language.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A phonetic model of non-native spoken word processing.

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

2020

Unsupervised Feature Learning for Speech Using Correspondence and Siamese Networks.

Petri-Johan Last

IEEE Signal Process. Lett., 2020

On the expected behaviour of noise regularised deep neural networks as Gaussian processes.

Arnu Pretorius

Steve Kroon

Pattern Recognit. Lett., 2020

If dropout limits trainable depth, does critical initialisation still matter? A large-scale statistical analysis on ReLU networks.

Pattern Recognit. Lett., 2020

Towards localisation of keywords in speech using weak supervision.

CoRR, 2020

A Correspondence Variational Autoencoder for Unsupervised Acoustic Word Embeddings.

Puyuan Peng

Shamsuddeen Hassan Muhammad

CoRR, 2020

Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages.

Solomon Oluwole Akinola

Salomon Kabongo

Salomey Osei

Freshia Sackey

Rubungo Andre Niyongabo

Masabata Mokgesi-Selinga

Idris Abdulkabir Dangana

Christopher Onyefuluchi

Chris Emezue

Bonaventure Dossou

Blessing K. Sibanda

Blessing Itoro Bassey

CoRR, 2020

Analyzing autoencoder-based acoustic word embeddings.

CoRR, 2020

Masakhane - Machine Translation For Africa.

Proceedings of the 1st AfricaNLP Workshop Proceedings, 2020

StarGAN-ZSVC: Towards Zero-Shot Voice Conversion in Low-Resource Contexts.

Proceedings of the Artificial Intelligence Research, 2020

Unsupervised vs. Transfer Learning for Multimodal One-Shot Matching of Speech and Images.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Vector-Quantized Neural Networks for Acoustic Unit Discovery in the ZeroSpeech 2020 Challenge.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Multilingual Acoustic Word Embedding Models for Processing Zero-resource Languages.

Shamsuddeen Hassan Muhammad

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Cross-Lingual Topic Prediction For Speech Using Translations.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages.

Solomon Oluwole Akinola

Salomon Kabongo Kabenamualu

Salomey Osei

Freshia Sackey

Rubungo Andre Niyongabo

Masabata Mokgesi-Selinga

Idris Abdulkabir Dangana

Christopher Onyefuluchi

Chris Chinenye Emezue

Bonaventure F. P. Dossou

Blessing K. Sibanda

Blessing Itoro Bassey

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

Evaluating computational models of infant phonetic learning across languages.

Proceedings of the 42nd Annual Meeting of the Cognitive Science Society, 2020

2019

Deep motion estimation for parallel inter-frame prediction in video compression.

André Nortje

CoRR, 2019

Classifying topics in speech when all you have is crummy translations.

CoRR, 2019

Pre-training on high-resource speech recognition improves low-resource speech-to-text translation.

Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

On the Contributions of Visual and Textual Supervision in Low-Resource Semantic Speech Retrieval.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Feature Exploration for Almost Zero-Resource ASR-Free Keyword Spotting Using a Multilingual Bottleneck Extractor and Correspondence Autoencoders.

Raghav Menon

Ewald van der Westhuizen

John A. Quinn

Thomas Niesler

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Unsupervised Acoustic Unit Discovery for Speech Synthesis Using Discrete Latent-Variable Neural Networks.

Ewald van der Westhuizen

Lisa van Staden

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Semantic Query-by-example Speech Search Using Visual Grounding.

Aristotelis Anastassiou

Proceedings of the IEEE International Conference on Acoustics, 2019

Truly Unsupervised Acoustic Word Embeddings Using Weak Top-down Constraints in Encoder-decoder Models.

Proceedings of the IEEE International Conference on Acoustics, 2019

Multimodal One-shot Learning of Speech and Images.

Ryan Eloff

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

Almost Zero-Resource ASR-free Keyword Spotting using Multilingual Bottleneck Features and Correspondence Autoencoders.

CoRR, 2018

ASR-Free CNN-DTW Keyword Spotting Using Multilingual Bottleneck Features for Almost Zero-Resource Languages.

Proceedings of the 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages, 2018

Visually Grounded Cross-Lingual Keyword Spotting in Speech.

Michael Roth

Proceedings of the 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages, 2018

Critical initialisation for deep signal propagation in noisy rectifier neural networks.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Fast ASR-free and Almost Zero-resource Keyword Spotting Using DTW and CNNs for Humanitarian Monitoring.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Low-Resource Speech-to-Text Translation.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Learning Dynamics of Linear Denoising Autoencoders.

Arnu Pretorius

Steve Kroon

Proceedings of the 35th International Conference on Machine Learning, 2018

Phoneme Based Embedded Segmental K-Means for Unsupervised Term Discovery.

Saurabhchand Bhati

K. Sri Rama Murty

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Semantic Speech Retrieval With a Visually Grounded Model of Untranscribed Speech.

Gregory Shakhnarovich

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018

2017

Unsupervised neural and Bayesian models for zero-resource speech processing.

PhD thesis, 2017

A segmental framework for fully-unsupervised large-vocabulary speech recognition.

Aren Jansen

Comput. Speech Lang., 2017

Semantic keyword spotting by learning from images and speech.

Gregory Shakhnarovich

CoRR, 2017

Unsupervised neural and Bayesian models for zero-resource speech processing.

CoRR, 2017

Query-by-Example Search with Discriminative Neural Acoustic Word Embeddings.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Visually Grounded Learning of Keyword Prediction from Untranscribed Speech.

Shane Settle

Gregory Shakhnarovich

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Weakly supervised spoken term discovery using cross-lingual side information.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Towards speech-to-text translation without speech recognition.

Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, 2017

An embedded segmental K-means model for unsupervised segmentation and clustering of speech.

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

2016

Unsupervised Word Segmentation and Lexicon Discovery Using Acoustic Word Embeddings.

Aren Jansen

IEEE ACM Trans. Audio Speech Lang. Process., 2016

Deep convolutional acoustic word embeddings using word-pair side information.

Weiran Wang

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

A comparison of neural network methods for unsupervised representation learning on the zero resource speech challenge.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Fully unsupervised small-vocabulary speech recognition using a segmental Bayesian model.

Aren Jansen

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Unsupervised neural network based feature extraction using weak top-down constraints.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014

Capitalising on North American speech resources for the development of a South African English large vocabulary speech recognition system.

Comput. Speech Lang., 2014

Unsupervised lexical clustering of speech segments using fixed-dimensional acoustic embeddings.

Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

2012

Multi-accent acoustic modelling of South African English.

Félicien Jeje Muamba Mukanya

Thomas Niesler

Speech Commun., 2012

Resource development and experiments in automatic South African broadcast news transcription.

Proceedings of the Third Workshop on Spoken Language Technologies for Under-resourced Languages, 2012

2011

Multi-Accent Speech Recognition of Afrikaans, Black and White Varieties of South African English.
