Herman Kamper

Orcid: 0000-0003-2980-3475

According to our database, Herman Kamper authored at least 88 papers between 2011 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Disentanglement in a GAN for Unconditional Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Leveraging Multilingual Transfer for Unsupervised Semantic Acoustic Word Embeddings.
IEEE Signal Process. Lett., 2024

Visually Grounded Speech Models have a Mutual Exclusivity Bias.
CoRR, 2024

Revisiting speech segmentation and lexicon learning with better features.
CoRR, 2024

2023
Infant Phonetic Learning as Perceptual Space Learning: A Crosslinguistic Evaluation of Computational Models.
Cogn. Sci., July, 2023

Word Segmentation on Discovered Phone Units With Dynamic Programming and Self-Supervised Scoring.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Rhythm Modeling for Voice Conversion.
IEEE Signal Process. Lett., 2023

Voice Conversion for Stuttered Speech, Instruments, Unseen Languages and Textually Described Voices.
CoRR, 2023

Visually grounded few-shot word learning in low-resource settings.
CoRR, 2023

Towards hate speech detection in low-resource languages: Comparing ASR to acoustic word embeddings on Wolof and Swahili.
CoRR, 2023

Voice Conversion With Just Nearest Neighbors.
CoRR, 2023

Visually grounded few-shot word acquisition with fewer shots.
CoRR, 2023

Mitigating Catastrophic Forgetting for Few-Shot Spoken Word Classification Through Meta-Learning.
CoRR, 2023

Semi-Supervised Machine Learning for Livestock Threat Classification Using GPS Data.
IEEE Access, 2023

2022
Keyword Localisation in Untranscribed Speech Using Visually Grounded Speech Models.
IEEE J. Sel. Top. Signal Process., 2022

Feature learning for efficient ASR-free keyword spotting in low-resource languages.
Comput. Speech Lang., 2022

YFACC: A Yorùbá Speech-Image Dataset for Cross-Lingual Keyword Localisation Through Visual Grounding.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Towards Visually Prompted Keyword Localisation for Zero-Resource Spoken Languages.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

GAN You Hear Me? Reclaiming Unconditional Speech Synthesis from Diffusion Models.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

TransFusion: Transcribing Speech with Multinomial Diffusion.
Proceedings of the Artificial Intelligence Research - Third Southern African Conference, 2022

A Temporal Extension of Latent Dirichlet Allocation for Unsupervised Acoustic Unit Discovery.
Proceedings of the Interspeech 2022, 2022

Voice Conversion Can Improve ASR in Very Low-Resource Settings.
Proceedings of the Interspeech 2022, 2022

A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Improved Acoustic Word Embeddings for Zero-Resource Languages Using Multilingual Transfer.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

BINet: A binary inpainting network for deep patch-based image compression.
Signal Process. Image Commun., 2021

Multilingual and unsupervised subword modeling for zero-resource languages.
Comput. Speech Lang., 2021

Towards Learning to Speak and Hear Through Multi-Agent Communication over a Continuous Acoustic Channel.
CoRR, 2021

Mava: a research framework for distributed multi-agent reinforcement learning.
CoRR, 2021

A Comparison of Self-Supervised Speech Representations As Input Features For Unsupervised Acoustic Word Embeddings.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Acoustic Word Embeddings for Zero-Resource Languages Using Self-Supervised Contrastive Learning and Multilingual Adaptation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Attention-Based Keyword Localisation in Speech Using Visual Grounding.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Direct Multimodal Few-Shot Learning of Speech and Images.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Towards Unsupervised Phone and Word Segmentation Using Self-Supervised Vector-Quantized Neural Networks.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Multilingual Transfer of Acoustic Word Embeddings Improves When Training on Languages Related to the Target Zero-Resource Language.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

A phonetic model of non-native spoken word processing.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

2020
Unsupervised Feature Learning for Speech Using Correspondence and Siamese Networks.
IEEE Signal Process. Lett., 2020

On the expected behaviour of noise regularised deep neural networks as Gaussian processes.
Pattern Recognit. Lett., 2020

If dropout limits trainable depth, does critical initialisation still matter? A large-scale statistical analysis on ReLU networks.
Pattern Recognit. Lett., 2020

Towards localisation of keywords in speech using weak supervision.
CoRR, 2020

A Correspondence Variational Autoencoder for Unsupervised Acoustic Word Embeddings.
CoRR, 2020

Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages.
CoRR, 2020

Analyzing autoencoder-based acoustic word embeddings.
CoRR, 2020


StarGAN-ZSVC: Towards Zero-Shot Voice Conversion in Low-Resource Contexts.
Proceedings of the Artificial Intelligence Research, 2020

Unsupervised vs. Transfer Learning for Multimodal One-Shot Matching of Speech and Images.
Proceedings of the Interspeech 2020, 2020

Vector-Quantized Neural Networks for Acoustic Unit Discovery in the ZeroSpeech 2020 Challenge.
Proceedings of the Interspeech 2020, 2020

Multilingual Acoustic Word Embedding Models for Processing Zero-resource Languages.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Cross-Lingual Topic Prediction For Speech Using Translations.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020


Evaluating computational models of infant phonetic learning across languages.
Proceedings of the 42nd Annual Meeting of the Cognitive Science Society, 2020

2019
Semantic Speech Retrieval With a Visually Grounded Model of Untranscribed Speech.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Deep motion estimation for parallel inter-frame prediction in video compression.
CoRR, 2019

Classifying topics in speech when all you have is crummy translations.
CoRR, 2019

Pre-training on high-resource speech recognition improves low-resource speech-to-text translation.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

On the Contributions of Visual and Textual Supervision in Low-Resource Semantic Speech Retrieval.
Proceedings of the Interspeech 2019, 2019

Feature Exploration for Almost Zero-Resource ASR-Free Keyword Spotting Using a Multilingual Bottleneck Extractor and Correspondence Autoencoders.
Proceedings of the Interspeech 2019, 2019

Unsupervised Acoustic Unit Discovery for Speech Synthesis Using Discrete Latent-Variable Neural Networks.
Proceedings of the Interspeech 2019, 2019

Semantic Query-by-example Speech Search Using Visual Grounding.
Proceedings of the IEEE International Conference on Acoustics, 2019

Truly Unsupervised Acoustic Word Embeddings Using Weak Top-down Constraints in Encoder-decoder Models.
Proceedings of the IEEE International Conference on Acoustics, 2019

Multimodal One-shot Learning of Speech and Images.
Proceedings of the IEEE International Conference on Acoustics, 2019

2018
Almost Zero-Resource ASR-free Keyword Spotting using Multilingual Bottleneck Features and Correspondence Autoencoders.
CoRR, 2018

ASR-Free CNN-DTW Keyword Spotting Using Multilingual Bottleneck Features for Almost Zero-Resource Languages.
Proceedings of the 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages, 2018

Visually Grounded Cross-Lingual Keyword Spotting in Speech.
Proceedings of the 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages, 2018

Critical initialisation for deep signal propagation in noisy rectifier neural networks.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Fast ASR-free and Almost Zero-resource Keyword Spotting Using DTW and CNNs for Humanitarian Monitoring.
Proceedings of the Interspeech 2018, 2018

Low-Resource Speech-to-Text Translation.
Proceedings of the Interspeech 2018, 2018

Learning Dynamics of Linear Denoising Autoencoders.
Proceedings of the 35th International Conference on Machine Learning, 2018

Phoneme Based Embedded Segmental K-Means for Unsupervised Term Discovery.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Unsupervised neural and Bayesian models for zero-resource speech processing.
PhD thesis, 2017

A segmental framework for fully-unsupervised large-vocabulary speech recognition.
Comput. Speech Lang., 2017

Semantic keyword spotting by learning from images and speech.
CoRR, 2017

Unsupervised neural and Bayesian models for zero-resource speech processing.
CoRR, 2017

Query-by-Example Search with Discriminative Neural Acoustic Word Embeddings.
Proceedings of the Interspeech 2017, 2017

Visually Grounded Learning of Keyword Prediction from Untranscribed Speech.
Proceedings of the Interspeech 2017, 2017

Weakly supervised spoken term discovery using cross-lingual side information.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Towards speech-to-text translation without speech recognition.
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, 2017

An embedded segmental K-means model for unsupervised segmentation and clustering of speech.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

2016
Unsupervised Word Segmentation and Lexicon Discovery Using Acoustic Word Embeddings.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Deep convolutional acoustic word embeddings using word-pair side information.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
A comparison of neural network methods for unsupervised representation learning on the zero resource speech challenge.
Proceedings of the INTERSPEECH 2015, 2015

Fully unsupervised small-vocabulary speech recognition using a segmental Bayesian model.
Proceedings of the INTERSPEECH 2015, 2015

Unsupervised neural network based feature extraction using weak top-down constraints.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014
Capitalising on North American speech resources for the development of a South African English large vocabulary speech recognition system.
Comput. Speech Lang., 2014

Unsupervised lexical clustering of speech segments using fixed-dimensional acoustic embeddings.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

2012
Multi-accent acoustic modelling of South African English.
Speech Commun., 2012

Resource development and experiments in automatic South African broadcast news transcription.
Proceedings of the Third Workshop on Spoken Language Technologies for Under-resourced Languages, 2012

2011
Multi-Accent Speech Recognition of Afrikaans, Black and White Varieties of South African English.
Proceedings of the INTERSPEECH 2011, 2011

