Takuya Higuchi

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

A Variational Framework for Improving Naturalness in Generative Spoken Language Models.

Li-Wei Chen

Zakaria Aldeneh

Alexander Rudnicky

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models.

Li-Wei Chen

He Bai

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Towards Automatic Assessment of Self-Supervised Speech Models using Rank.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector based Pseudo-Labels.

Tatiana Likhomanenko

Barry-John Theobald

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

2024

ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models.

CoRR, 2024

ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?

Barry-John Theobald

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Multichannel Voice Trigger Detection Based on Transform-Average-Concatenate.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023

Does Single-channel Speech Enhancement Improve Keyword Spotting Accuracy? A Case Study.

CoRR, 2023

2022

Improving Voice Trigger Detection with Metric Learning.

Varun Lakshminarasimhan

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Dynamic Curriculum Learning via Data Parameters for Noise Robust Keyword Spotting.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Multi-Task Learning with Cross Attention for Keyword Spotting.

Anmol Gupta

Chandra Dhir

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Stacked 1D Convolutional Networks for End-to-End Small Footprint Voice Trigger Detection.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2018

Nonnegative Matrix Factorization With Basis Clustering Using Cepstral Distance Regularization.

IEEE/ACM Trans. Audio Speech Lang. Process., 2018

Optimization of Speaker-Aware Multichannel Speech Extraction with ASR Criterion.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Frame-by-Frame Closed-Form Update for Mask-Based Adaptive MVDR Beamforming.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Dual Frequency- and Block-Permutation Alignment for Deep Learning Based Block-Online Blind Source Separation.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

2017

Online MVDR Beamformer Based on Complex Gaussian Mixture Model With Spatial Prior for Noise Robust ASR.

IEEE/ACM Trans. Audio Speech Lang. Process., 2017

Speaker-Aware Neural Network Based Beamformer for Speaker Extraction in Speech Mixtures.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Deep Clustering-Based Beamforming for Separation with Unknown Number of Sources.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Integrating DNN-based and spatial clustering-based mask estimation for robust MVDR beamforming.

Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Deep mixture density network for statistical model-based feature enhancement.

Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Unsupervised utterance-wise beamformer estimation with speech recognition-level criterion.

Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Online meeting recognition in noisy environments with time-frequency mask based MVDR beamforming.

Proceedings of the Hands-free Speech Communications and Microphone Arrays, 2017

Learning speaker representation for neural network based multichannel speaker extraction.

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Adversarial training for data-driven speech enhancement without parallel corpus.

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Multichannel Speech Enhancement Approaches to DNN-Based Far-Field Speech Recognition.

New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016

Sparseness-based multichannel nonnegative matrix factorization for blind source separation.

Takuya Yoshioka

Tomohiro Nakatani

Proceedings of the IEEE International Workshop on Acoustic Signal Enhancement, 2016

Semi-Supervised Joint Enhancement of Spectral and Cepstral Sequences of Noisy Speech.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Optimization of Speech Enhancement Front-End with Speech Recognition-Level Criterion.

Takuya Yoshioka

Tomohiro Nakatani

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Robust MVDR beamforming using time-frequency masks for online/offline ASR in noise.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Spatial correlation model based observation vector clustering and MVDR beamforming for meeting recognition.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

2015

Unified approach for audio source separation with multichannel factorial HMM and DOA mixture model.

Hirokazu Kameoka

Proceedings of the 23rd European Signal Processing Conference, 2015

The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

Joint audio source separation and dereverberation based on multichannel factorial hidden Markov model.

Hirokazu Kameoka

Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing, 2014

A unified approach for underdetermined blind signal separation and source activity detection by multichannel factorial hidden Markov models.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Underdetermined blind separation and tracking of moving sources based on DOA-HMM.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Unified approach for underdetermined BSS, VAD, dereverberation and DOA estimation with multichannel factorial HMM.
