Ahmed Hussen Abdelaziz

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

DiceHuBERT: Distilling HuBERT with a Self-Supervised Learning Objective.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

A Variational Framework for Improving Naturalness in Generative Spoken Language Models.

Li-Wei Chen

Takuya Higuchi

Zakaria Aldeneh

Alexander Rudnicky

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models.

Li-Wei Chen

Takuya Higuchi

He Bai

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector based Pseudo-Labels.

Tatiana Likhomanenko

Barry-John Theobald

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

2024

Device-Directed Speech Detection for Follow-up Conversations Using Large Language Models.

CoRR, 2024

ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models.

CoRR, 2024

Multimodal Large Language Models with Fusion Low Rank Adaptation for Device Directed Speech Detection.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Comparative Analysis of Personalized Voice Activity Detection Systems: Assessing Real-World Effectiveness.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?

Barry-John Theobald

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Modality Drop-Out for Multimodal Device Directed Speech Detection Using Verbal and Non-Verbal Features.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023

Modality Dropout for Multimodal Device Directed Speech Detection using Verbal and Non-Verbal Features.

CoRR, 2023

Less Is More: A Unified Architecture for Device-Directed Speech Detection with Multiple Invocation Types.

Sachin Kajarekar

Erik Marchi

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

2022

Device-Directed Speech Detection: Regularization via Distillation for Weakly-Supervised Models.

Vineet Garg

Ognjen Rudovic

Pranay Dighe

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Audiovisual Speech Synthesis using Tacotron2.

Proceedings of the ICMI '21: International Conference on Multimodal Interaction, 2021

On The Role of Visual Cues in Audiovisual Speech Enhancement.

Zakaria Aldeneh

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

MorphGAN: One-Shot Face Synthesis GAN for Detecting Recognition Bias.

Nataniel Ruiz

Barry-John Theobald

Anurag Ranjan

Nicholas Apostoloff

Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020

Audiovisual Speech Synthesis using Tacotron2.

CoRR, 2020

Self-supervised Learning of Visual Speech Features with Audiovisual Speech Enhancement.

Zakaria Aldeneh

CoRR, 2020

Modality Dropout for Improved Performance-driven Talking Faces.

Proceedings of the ICMI '20: International Conference on Multimodal Interaction, 2020

2019

On Neural Phone Recognition of Mixed-Source ECoG Signals.

CoRR, 2019

Speaker-Independent Speech-Driven Visual Speech Synthesis using Domain-Adapted Acoustic Models.

Proceedings of the International Conference on Multimodal Interaction, 2019

2018

Comparing Fusion Models for DNN-Based Audiovisual Continuous Speech Recognition.

IEEE ACM Trans. Audio Speech Lang. Process., 2018

2017

NTCD-TIMIT: A New Database and Baseline for Noise-Robust Audio-Visual Speech Recognition.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Turbo Decoders for Audio-Visual Continuous Speech Recognition.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Improving acoustic modeling using audio-visual speech.

Ahmed Serag Eldin Hussen Abdelaziz

Proceedings of the 2017 IEEE International Conference on Multimedia and Expo, 2017

2016

Noise-robust HMM-based pattern recognition using multimodal features and observation uncertainties.

PhD thesis, 2016

General hybrid framework for uncertainty-decoding-based automatic speech recognition systems.

Speech Commun., 2016

Introducing the Turbo-Twin-HMM for Audio-Visual Speech Enhancement.

Hendrik Meutzner

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Blind Non-Intrusive Speech Intelligibility Prediction Using Twin-HMMs.

Mahdie Karbasi

Hendrik Meutzner

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Dynamic Stream Weighting for Turbo-Decoding-Based Audiovisual ASR.

Sebastian Gergen

Robert M. Nickel

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Twin-HMM-based non-intrusive speech intelligibility prediction.

Mahdie Karbasi

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

New Insights into Turbo-Decoding-Based AVSR with Dynamic Stream Weights.

Sebastian Gergen

Proceedings of the 12th ITG Symposium on Speech Communication, 2016

2015

Learning Dynamic Stream Weights For Coupled-HMM-Based Audio-Visual Speech Recognition.

Ramón Fernandez Astudillo

IEEE ACM Trans. Audio Speech Lang. Process., 2015

Robust speech processing using observation uncertainty and uncertainty propagation: session and paper overview.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Uncertainty propagation through deep neural networks.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

2014

The Tutorbot Corpus ― A Corpus for Studying Tutoring Behaviour in Multiparty Face-to-Face Spoken Dialogue.

Maria Koutsombogera

Samer Al Moubayed

Bajibabu Bollepalli

Martin Johansson

José David Águas Lopes

Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Dynamic stream weight estimation in coupled-HMM-based audio-visual speech recognition using multilayer perceptrons.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A new EM estimation of dynamic stream weights for coupled-HMM-based audio-visual ASR.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Human-robot collaborative tutoring using multiparty multimodal spoken dialogue.

Martin Johansson

Maria Koutsombogera

José David Águas Lopes

Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, 2014

2013

Using twin-HMM-based audio-visual speech enhancement as a front-end for robust audio-visual speech recognition.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Tutoring Robots - Multiparty Multimodal Social Dialogue with an Embodied Tutor.

Samer Al Moubayed

Jonas Beskow

Bajibabu Bollepalli

Martin Johansson

Maria Koutsombogera

José David Águas Lopes

Proceedings of the Innovative and Creative Developments in Multimodal Interaction Systems, 2013

GMM-based significance decoding.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Twin-HMM-based audio-visual speech enhancement.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

2012

Decoding of Uncertain Features Using the Posterior Distribution of the Clean Data for Robust Speech Recognition.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Audio-Visual Speech Recognition for Uncertain Acoustical Observations.
