Szu-Wei Fu

Dyah A. M. G. Wisnu

Hsin-Min Wang

CoRR, April, 2026

How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation.

CoRR, March, 2026

Rethinking Training Targets, Architectures and Data Quality for Universal Speech Enhancement.

CoRR, March, 2026

Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception.

CoRR, January, 2026

Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception.

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

2025

SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models.

CoRR, October, 2025

Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations.

CoRR, October, 2025

Test-Time Scaling Strategies for Generative Retrieval in Multimodal Conversational Recommendations.

CoRR, August, 2025

Leveraging Mamba with Full-Face Vision for Audio-Visual Speech Enhancement.

CoRR, August, 2025

DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment.

CoRR, July, 2025

Linguistic Knowledge Transfer Learning for Speech Enhancement.

CoRR, March, 2025

Foundation Models for Speech Enhancement Leveraging Consistency Constraints and Contrast Stretching.

Muhammad Salman Khan

Valerio Mario Salerno

IEEE Access, 2025

VoiceNoNG: Robust High-Quality Speech Editing Model without Hallucinations.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Universal Speech Enhancement with Regression and Generative Mamba.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

HighRateMOS: Sampling-Rate Aware Modeling for Speech Quality Assessment.

Wenze Ren

Yi-Cheng Lin

Wen-Chin Huang

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

NeKo: Cross-Modality Post-Recognition Error Correction with Tasks-Guided Mixture-of-Experts Language Model.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), 2025

2024

NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts.

CoRR, 2024

Detecting the Undetectable: Assessing the Efficacy of Current Spoof Detection Methods Against Seamless Speech Edits.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction.

Wen-Chin Huang

Erica Cooper

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

An Investigation of Incorporating Mamba For Speech Enhancement.

Rong Chao

Wen-Huang Cheng

Moreno La Quatra

Chao-Han Huck Yang

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

RankUp: Boosting Semi-Supervised Regression with an Auxiliary Ranking Classifier.

Pin-Yen Huang

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Exploiting Consistency-Preserving Loss and Perceptual Contrast Stretching to Boost SSL-Based Speech Enhancement.

Proceedings of the 26th IEEE International Workshop on Multimedia Signal Processing, 2024

DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

A Study On Incorporating Whisper For Robust Speech Assessment.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model With Cross-Domain Features.

IEEE/ACM Trans. Audio Speech Lang. Process., 2023

Multi-objective Non-intrusive Hearing-aid Speech Assessment Model.

CoRR, 2023

QuAVF: Quality-aware Audio-Visual Fusion for Ego4D Talking to Me Challenge.

CoRR, 2023

Real-Time Speech Interruption Analysis: from Cloud to Client Deployment.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Study on the Correlation Between Objective Evaluations and Subjective Speech Quality and Intelligibility.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

SEOFP-NET: Compression and Acceleration of Deep Neural Networks for Speech Enhancement Using Sign-Exponent-Only Floating-Points.

IEEE/ACM Trans. Audio Speech Lang. Process., 2022

CITISEN: A Deep Learning-Based Speech Signal-Processing Mobile Application.

Yu-Wen Chen

Kuo-Hsuan Hung

You-Jin Li

Alexander Chao-Fu Kang

IEEE Access, 2022

Improving Meeting Inclusiveness using Speech Interruption Analysis.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

MTI-Net: A Multi-Target Speech Intelligibility Prediction Model.

Ryandhimas Edo Zezario

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

OSSEM: one-shot speaker adaptive speech enhancement using meta learning.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Boosting Self-Supervised Embeddings for Speech Enhancement.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Perceptual Contrast Stretching on Target Feature for Speech Enhancement.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

MetricGAN-U: Unsupervised Speech Enhancement/Dereverberation Based Only on Noisy/Reverberated Speech.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

A Study of Joint Effect on Denoising Techniques and Visual Cues to Improve Speech Intelligibility in Cochlear Implant Simulation.

IEEE Trans. Cogn. Dev. Syst., 2021

SpeechBrain: A General-Purpose Speech Toolkit.

CoRR, 2021

Improving Perceptual Quality by Phone-Fortified Perceptual Loss Using Wasserstein Distance for Speech Enhancement.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

2020

Learning With Learned Loss Function: Speech Enhancement With Quality-Net to Improve Perceptual Evaluation of Speech Quality.

Chien-Feng Liao

IEEE Signal Process. Lett., 2020

Improving Perceptual Quality by Phone-Fortified Perceptual Loss for Speech Enhancement.

CoRR, 2020

Boosting Objective Scores of Speech Enhancement Model through MetricGAN Post-Processing.

CoRR, 2020

iMetricGAN: Intelligibility Enhancement for Speech-in-Noise Using Generative Adversarial Network-Based Metric Learning.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

Boosting Objective Scores of a Speech Enhancement Model by MetricGAN Post-processing.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019

Increasing Compactness of Deep Learning Based Speech Enhancement Models With Parameter Pruning and Quantization Techniques.

IEEE Signal Process. Lett., 2019

Time-Domain Multi-modal Bone/air Conducted Speech Enhancement.

CoRR, 2019

Seeing Voices in Noise: A Study of Audiovisual-Enhanced Vocoded Speech Intelligibility in Cochlear Implant Simulation.

CoRR, 2019

Improving the Intelligibility of Electric and Acoustic Stimulation Speech Using Fully Convolutional Networks Based Speech Enhancement.

Natalie Yu-Hsien Wang

Hsiao-Lan Sharon Wang

CoRR, 2019

Multichannel Speech Enhancement by Raw Waveform-mapping using Fully Convolutional Networks.

CoRR, 2019

Noise Reduction in ECG Signals Using Fully Convolutional Denoising Autoencoders.

IEEE Access, 2019

Specialized Speech Enhancement Model Selection Based on Learned Non-Intrusive Quality Assessment Metric.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

MOSNet: Deep Learning-Based Objective Assessment for Voice Conversion.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

IA-NET: Acceleration and Compression of Speech Enhancement Using Integer-Adder Deep Neural Network.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement.

Proceedings of the 36th International Conference on Machine Learning, 2019

2018

End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks.

IEEE/ACM Trans. Audio Speech Lang. Process., 2018

A Study on Speech Enhancement Using Exponent-Only Floating Point Quantized Neural Network (EOFP-QNN).

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Quality-Net: An End-to-End Non-intrusive Speech Quality Assessment Model Based on BLSTM.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017

Joint Dictionary Learning-Based Non-Negative Matrix Factorization for Voice Conversion to Improve Speech Intelligibility After Oral Surgery.

IEEE Trans. Biomed. Eng., 2017

End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks.

CoRR, 2017

Multi-Metrics Learning for Speech Enhancement.

CoRR, 2017

Complex spectrogram enhancement by convolutional neural network with multi-metrics learning.

Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing, 2017

Raw waveform-based speech enhancement by fully convolutional networks.

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Collagen image compression using the JPEG-based predictive lossless coding scheme.

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016

Maximum Entropy Learning with Deep Belief Networks.

Entropy, 2016

SNR-Aware Convolutional Neural Network Modeling for Speech Enhancement.

Xugang Lu

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

2015

Horizontal adaptive disparity estimation scheme for stereoscopic images.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

2014

Compression for the feature points with binary descriptors.

Proceedings of the 19th International Conference on Digital Signal Processing, 2014

Image deblurring using a pyramid-based Richardson-Lucy algorithm.

Proceedings of the 19th International Conference on Digital Signal Processing, 2014

A novel compression algorithm for IMFs of Hilbert-Huang transform.

Ying-Jou Chen

Jian-Jiun Ding