Jia Qi Yip

CoRR, September, 2025

Bona fide Cross Testing Reveals Weak Spot in Audio Deepfake Detection Systems.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Improving Synthetic Data Training for Contextual Biasing Models with a Keyword-Aware Cost Function.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Efficient Trie-based Biasing using K-step Prediction for Rare Word Recognition.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Speechless: Speech Instruction Training Without Speech for Low Resource Languages.

Warren Keng Hoong Low

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Robust Audio Deepfake Detection using Ensemble Confidence Calibration.

Duc-Tuan Truong

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Extending Whisper for Emotion Prediction Using Word-level Pseudo Labels.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Speech Enhancement Using Continuous Embeddings of Neural Audio Codec.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024

VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music.

CoRR, 2024

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks.

Fabian Ritter Gutierrez

CoRR, 2024

Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions.

CoRR, 2024

Improved Alignment for Score Combination of RNN-T and CTC Decoder for Online Decoding.

Proceedings of the Text, Speech, and Dialogue - 27th International Conference, 2024

Continual Learning With Embedding Layer Surgery and Task-Wise Beam Search Using Whisper.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs For Audio, Music, and Speech.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Continual Learning Optimizations for Auto-regressive Decoder of Multilingual ASR systems.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Towards Audio Codec-based Speech Separation.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation.

Proceedings of the IEEE International Conference on Acoustics, 2024

SPGM: Prioritizing Local Features for Enhanced Speech Separation Performance.

Proceedings of the IEEE International Conference on Acoustics, 2024

Emphasized Non-Target Speaker Knowledge in Knowledge Distillation for Automatic Speaker Verification.

Proceedings of the IEEE International Conference on Acoustics, 2024

Low Resource Language Adaptation using Two-stage Regularization for Multilingual ASR.

Proceedings of the International Conference on Asian Language Processing, 2024

Low-resource Language Adaptation with Ensemble of PEFT Approaches.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2024

Speech Separation using Neural Audio Codecs with Embedding Loss.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2024

2023

Codec Data Augmentation for Time-domain Heart Sound Classification.

Ansh Mishra