Xiao-Lei Zhang

Xuelong Li

Neural Networks, 2025

A deep clustering framework for underwater image recognition.

Lei Zhao

Kunde Yang

Digit. Signal Process., 2025

Co-Attention Based Multi-Channel TF-GridNet for Speech Separation with Ad-Hoc Microphone Arrays.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024

Multi-Resolution Convolutional Residual Neural Networks for Monaural Speech Dereverberation.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Interpretable Spectrum Transformation Attacks to Speaker Recognition Systems.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Learning Multi-Dimensional Speaker Localization: Axis Partitioning, Unbiased Label Distribution, and Data Augmentation.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Transformer-Based End-to-End Speech Translation With Rotary Position Embedding.

IEEE Signal Process. Lett., 2024

Graph Attention Based Multi-Channel U-Net for Speech Dereverberation With Ad-Hoc Microphone Arrays.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Exploiting A Quantum Multiple Kernel Learning Approach For Low-Resource Spoken Command Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

End-to-End Multi-Modal Speech Recognition on an Air and Bone Conducted Speech Corpus.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

LMD: A Learnable Mask Network to Detect Adversarial Examples for Speaker Verification.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Symmetric Saliency-Based Adversarial Attack to Speaker Identification.

IEEE Signal Process. Lett., 2023

Deep NMF topic modeling.

Neurocomputing, 2023

Spatial-temporal Graph Based Multi-channel Speaker Verification With Ad-hoc Microphone Arrays.

Yijiang Chen

Chengdong Liang

CoRR, 2023

Interpretable Spectrum Transformation Attacks to Speaker Recognition.

Jiadi Yao

Hong Luo

CoRR, 2023

Branch-ECAPA-TDNN: A Parallel Branch Architecture to Capture Local and Global Features for Speaker Verification.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Wekws: A Production First Small-Footprint End-to-End Keyword Spotting Toolkit.

Proceedings of the IEEE International Conference on Acoustics, 2023

Optimizing Quantum Federated Learning Based on Federated Quantum Natural Gradient Descent.

Jun Qi

Javier Tejedor

Proceedings of the IEEE International Conference on Acoustics, 2023

Fast-U2++: Fast and Accurate End-to-End Speech Recognition in Joint CTC/Attention Frames.

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

End-to-End Speaker Verification via Curriculum Bipartite Ranking Weighted Binary Cross-Entropy.

IEEE ACM Trans. Audio Speech Lang. Process., 2022

Deep ad-hoc beamforming based on speaker extraction for target-dependent speech separation.

Shanzheng Guan

Speech Commun., 2022

AUC optimization for deep learning-based voice activity detection.

EURASIP J. Audio Speech Music. Process., 2022

Deep Learning Based Two-dimensional Speaker Localization With Large Ad-hoc Microphone Arrays.

CoRR, 2022

End-to-end Two-dimensional Sound Source Localization With Ad-hoc Microphone Arrays.

Yijun Gong

Shupei Liu

CoRR, 2022

Multi-modal emotion recognition using EEG and speech signals.

Comput. Biol. Medicine, 2022

Multi-class AUC Optimization for Robust Small-footprint Keyword Spotting with Limited Training Data.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Multi-Channel Far-Field Speaker Verification with Large-Scale Ad-hoc Microphone Arrays.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

End-To-End Multi-Modal Speech Recognition with Air and Bone Conducted Speech.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Minimum-Volume Multichannel Nonnegative Matrix Factorization for Blind Audio Source Separation.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Speaker recognition based on deep learning: An overview.

Neural Networks, 2021

Deep ad-hoc beamforming.

Comput. Speech Lang., 2021

Frame-level multi-channel speaker verification with large-scale ad-hoc microphone arrays.

Chengdong Liang

Jiadi Yao

CoRR, 2021

AUC Optimization for Robust Small-footprint Keyword Spotting with Limited Training Data.

CoRR, 2021

Unsupervised Ensemble Selection for Multilayer Bootstrap Networks.

CoRR, 2021

Minimum-volume Multichannel Nonnegative matrix factorization for blind source separation.

Shanzheng Guan

CoRR, 2021

Scaling Sparsemax Based Channel Selection for Speech Recognition with ad-hoc Microphone Arrays.

Junqi Chen

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Transformer-Based End-to-End Speech Recognition with Local Dense Synthesizer Attention.

Shengqiang Li

Proceedings of the IEEE International Conference on Acoustics, 2021

Speech Enhancement Aided End-To-End Multi-Task Learning for Voice Activity Detection.

Xu Tan

Proceedings of the IEEE International Conference on Acoustics, 2021

A comparison of handcrafted, parameterized, and learnable features for speech separation.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Minimum-volume regularized ILRMA for blind audio source separation.

Shanzheng Guan

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Attention-based multi-channel speaker verification with ad-hoc microphone arrays.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Efficient conformer-based speech recognition with linear attention.

Shengqiang Li

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Conformer-based End-to-end Speech Recognition With Rotary Position Embedding.

Shengqiang Li

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Libri-adhoc40: A dataset collected from synchronized ad-hoc microphone arrays.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020

Speaker Verification by Partial AUC Optimization With Mahalanobis Distance Metric Learning.

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Cosine metric learning based speaker verification.

Speech Commun., 2020

Partial AUC Metric Learning Based Speaker Verification Back-End.

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Depthwise Separable Convolutional ResNet with Squeeze-and-Excitation Blocks for Small-Footprint Keyword Spotting.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Deep Topic Modeling by Multilayer Bootstrap Network and Lasso.

Proceedings of the 25th International Conference on Pattern Recognition, 2020

Partial AUC Optimization Based Deep Speaker Embeddings with Class-Center Learning for Text-Independent Speaker Verification.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Multi-channel Speech Separation Using Deep Embedding With Multilayer Bootstrap Networks.

Zhonghua Fu

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019

Phase-Aware Speech Enhancement Based on Deep Neural Networks.

Naijun Zheng

IEEE ACM Trans. Audio Speech Lang. Process., 2019

Multi-channel Speech Separation Using Deep Embedding Model with Multilayer Bootstrap Networks.

CoRR, 2019

Robust Sparse Multichannel Active Noise Control.

Proceedings of the IEEE International Conference on Acoustics, 2019

AUC Optimization for Deep Learning Based Voice Activity Detection.

Proceedings of the IEEE International Conference on Acoustics, 2019

Boosting Spatial Information for Deep Learning Based Multichannel Speaker-Independent Speech Separation In Reverberant Environments.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Hybrid Constant-Q Transform Based CNN Ensemble for Acoustic Scene Classification.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Domain Adaptation Neural Network for Acoustic Scene Classification in Mismatched Conditions.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018

Multilayer bootstrap networks.

Neural Networks, 2018

Linear Regression for Speaker Verification.

CoRR, 2018

Cosine Metric Learning for Speaker Verification in the I-vector Space.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

An Investigation of Speaker Clustering Algorithms in Adverse Acoustic Environments.

Meng-Zhen Li

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017

Learning the kernel matrix by resampling.

CoRR, 2017

Speech separation by cost-sensitive deep learning.

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016

A Deep Ensemble Learning Method for Monaural Speech Separation.

IEEE ACM Trans. Audio Speech Lang. Process., 2016

Boosting Contextual Information for Deep Neural Network Based Voice Activity Detection.

IEEE ACM Trans. Audio Speech Lang. Process., 2016

Universal Background Sparse Coding and Multilayer Bootstrap Network for Speaker Clustering.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

2015

Heuristic Ternary Error-Correcting Output Codes Via Weight Optimization and Layered Clustering-Based Approach.

IEEE Trans. Cybern., 2015

Convex Discriminative Multitask Clustering.

IEEE Trans. Pattern Anal. Mach. Intell., 2015

Universal Background Sparse Coding and Multilayer Bootstrap Network for Speaker Recognition.

CoRR, 2015

Multilayer bootstrap network for unsupervised speaker recognition.

CoRR, 2015

Unsupervised model compression for multilayer bootstrap networks.

CoRR, 2015

Multi-resolution stacking for speech separation based on boosted DNN.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

2014

Boosted deep neural networks and multi-resolution cochleagram features for voice activity detection.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Unsupervised domain adaptation for deep neural network based voice activity detection.

Proceedings of the IEEE International Conference on Acoustics, 2014

Nonlinear Dimensionality Reduction of Data by Deep Distributed Random Samplings.

Proceedings of the Sixth Asian Conference on Machine Learning, 2014

2013

Deep Belief Networks Based Voice Activity Detection.

IEEE Trans. Speech Audio Process., 2013

Heuristic Ternary Error-Correcting Output Codes Via Weight Optimization and Layered Clustering-Based Approach.

CoRR, 2013

Convex Discriminative Multitask Clustering.

CoRR, 2013

Transfer Learning for Voice Activity Detection: A Denoising Deep Neural Network Perspective.

CoRR, 2013

Learning Deep Representations By Distributed Random Samplings.

CoRR, 2013

Weight optimization and layered clustering-based ECOC.

Proceedings of the IEEE International Conference on Acoustics, 2013

Denoising deep neural networks based voice activity detection.

Proceedings of the IEEE International Conference on Acoustics, 2013

2012

Linearithmic Time Sparse and Convex Maximum Margin Clustering.

IEEE Trans. Syst. Man Cybern. Part B, 2012

Secure Ranking over Encrypted Documents.

IEICE Trans. Inf. Syst., 2012

Perceptual similarity between audio clips and feature selection for its measurement.

Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Optimized weighted decoding for error-correcting output codes.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

Efficient Multiple Kernel Support Vector Machine Based Voice Activity Detection.

IEEE Signal Process. Lett., 2011

Maximum Margin Clustering Based Statistical VAD With Multiple Observation Compound Feature.

IEEE Signal Process. Lett., 2011

An efficient voice activity detection algorithm by combining statistical model and energy detection.

EURASIP J. Adv. Signal Process., 2011

2010

A new VAD framework using statistical model and human knowledge based empirical rule.
