Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Lessons Learned from the URGENT 2024 Speech Enhancement Challenge.

Wangyou Zhang

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Interspeech 2025 URGENT Speech Enhancement Challenge.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

The Text-to-speech in the Wild (TITW) Database.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Ranking and Selection of Bias Words for Contextual Bias Speech Recognition.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

BR-ASR: Efficient and Scalable Bias Retrieval Framework for Contextual Biasing ASR in Speech LLM.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Advanced Zero-Shot Text-to-Speech for Background Removal and Preservation with Controllable Masked Speech Prediction.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Improving Speech Enhancement with Multi-Metric Supervision from Learned Quality Assessment.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

URGENT-PK: Perceptually-Aligned Ranking Model Designed for Speech Enhancement Competition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

PURE Codec: Progressive Unfolding of Residual Entropy for Speech Codec Learning.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

VERSA-v2: A Modular and Scalable Toolkit for Speech and Audio Evaluation with Expanded Metrics, Visualization, and LLM Integration.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

Less is More: Data Curation Matters in Scaling Speech Enhancement.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

2024

ASVspoof 5: Design, Collection and Validation of Resources for Spoofing, Deepfake, and Adversarial Attack Detection Using Crowdsourced Speech.

Dataset, December, 2024

Module-Based End-to-End Distant Speech Processing: A case study of far-field automatic speech recognition [Special Issue On Model-Based and Data-Driven Audio Signal Processing].

IEEE Signal Process. Mag., November, 2024

Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling.

CoRR, 2024

SpoofCeleb: Speech Deepfake Detection and SASV In The Wild.

CoRR, 2024

Text-To-Speech Synthesis In The Wild.

CoRR, 2024

URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement.

CoRR, 2024

SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition.

CoRR, 2024

ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models.

Ahmed Hussen Abdelaziz

Shinji Watanabe

CoRR, 2024

Improving Design of Input Condition Invariant Speech Enhancement.

CoRR, 2024

Insights from Hyperparameter Scaling of Online Speech Separation.

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models.

Ahmed Hussen Abdelaziz

Shinji Watanabe

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Improving Design of Input Condition Invariant Speech Enhancement.

Wangyou Zhang

Jee-weon Jung

Yanmin Qian

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Generation-Based Target Speech Extraction with Speech Discretization and Vocoder.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Towards Robust Speech Representation Learning for Thousands of Languages.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2023

A Heterogeneous Graph to Abstract Syntax Tree Framework for Text-to-SQL.

IEEE Trans. Pattern Anal. Mach. Intell., November, 2023

Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing.

J. Open Source Softw., November, 2023

Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing (espnet-v.202310).

Dataset, October, 2023

Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation.

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023

Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition.

Wangyou Zhang

Yanmin Qian

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Overlap Aware Continuous Speech Separation without Permutation Invariant Training.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Exploring Time-Frequency Domain Target Speaker Extraction For Causal and Non-Causal Processing.

Wangyou Zhang

Lei Yang

Yanmin Qian

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Toward Universal Speech Enhancement For Diverse Input Conditions.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, And Extraction.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

End-to-End Dereverberation, Beamforming, and Speech Recognition in a Cocktail Party.

IEEE ACM Trans. Audio Speech Lang. Process., 2022

End-to-End Multi-Speaker ASR with Independent Vector Analysis.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Text-Informed Knowledge Distillation for Robust Speech Enhancement and Recognition.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Separating Long-Form Speech with Group-wise Permutation Invariant Training.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Exploring Effective Data Utilization for Low-Resource Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Text Adaptive Detection for Customizable Keyword Spotting.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

The SJTU System For Multimodal Information Based Speech Processing Challenge 2021.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Towards Low-Distortion Multi-Channel Speech Enhancement: The ESPnet-SE Submission to the L3DAS22 Challenge.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions.

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2021

ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for ASR Integration.

Chenda Li

Jing Shi

Wangyou Zhang

Aswin Shanmugam Subramanian

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Recent Developments on ESPnet Toolkit Boosted By Conformer.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020

Improving End-to-End Single-Channel Multi-Talker Speech Recognition.

IEEE ACM Trans. Audio Speech Lang. Process., 2020

The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans.

Aswin Shanmugam Subramanian

Wangyou Zhang

CoRR, 2020

Convolutive Transfer Function Invariant SDR training criteria for Multi-Channel Reverberant Speech Separation.

CoRR, 2020

End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming.

Wangyou Zhang

Aswin Shanmugam Subramanian

Xuankai Chang

Shinji Watanabe

Yanmin Qian

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Learning Contextual Language Embeddings for Monaural Multi-Talker Speech Recognition.

Wangyou Zhang

Yanmin Qian

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-To-End Multi-Speaker Speech Recognition With Transformer.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

Robust DOA Estimation Based on Convolutional Neural Network and Time-Frequency Masking.

Wangyou Zhang

Ying Zhou

Yanmin Qian

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Knowledge Distillation for End-to-End Monaural Multi-Talker ASR System.

Wangyou Zhang

Xuankai Chang

Yanmin Qian

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

End-to-End Overlapped Speech Detection and Speaker Counting with Raw Waveform.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

A Comparative Study on Transformer vs RNN in Speech Applications.

Nelson Enrique Yalta Soplin

Ryuichi Yamamoto

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Wangyou Zhang
