Xiao-Lei Zhang

ORCID: 0000-0001-7694-193X

Affiliations:
  • Northwestern Polytechnical University, Center for Intelligent Acoustics and Immersive Communications, CIAIC, School of Marine Science and Technology, China
  • Tsinghua University, Department of Electronic Engineering, Beijing, China
  • Ohio State University, Department of Computer Science and Engineering, Columbus, OH, USA (2013-2014)
  • Tsinghua University, Department of Information and Communication Engineering, Beijing, China (PhD 2012)


According to our database, Xiao-Lei Zhang authored at least 84 papers between 2010 and 2024.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2024
Interpretable Spectrum Transformation Attacks to Speaker Recognition Systems.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Transformer-Based End-to-End Speech Translation With Rotary Position Embedding.
IEEE Signal Process. Lett., 2024

2023
End-to-End Multi-Modal Speech Recognition on an Air and Bone Conducted Speech Corpus.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

LMD: A Learnable Mask Network to Detect Adversarial Examples for Speaker Verification.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Symmetric Saliency-Based Adversarial Attack to Speaker Identification.
IEEE Signal Process. Lett., 2023

Deep NMF topic modeling.
Neurocomputing, 2023

Spatial-temporal Graph Based Multi-channel Speaker Verification With Ad-hoc Microphone Arrays.
CoRR, 2023

Interpretable Spectrum Transformation Attacks to Speaker Recognition.
CoRR, 2023

Wekws: A Production First Small-Footprint End-to-End Keyword Spotting Toolkit.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Fast-U2++: Fast and Accurate End-to-End Speech Recognition in Joint CTC/Attention Frames.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

2022
End-to-End Speaker Verification via Curriculum Bipartite Ranking Weighted Binary Cross-Entropy.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Deep ad-hoc beamforming based on speaker extraction for target-dependent speech separation.
Speech Commun., 2022

AUC optimization for deep learning-based voice activity detection.
EURASIP J. Audio Speech Music. Process., 2022

LMD: A Learnable Mask Network to Detect Adversarial Examples for Speaker Verification.
CoRR, 2022

Symmetric Saliency-based Adversarial Attack To Speaker Identification.
CoRR, 2022

Deep Learning Based Two-dimensional Speaker Localization With Large Ad-hoc Microphone Arrays.
CoRR, 2022

End-to-end Two-dimensional Sound Source Localization With Ad-hoc Microphone Arrays.
CoRR, 2022

Multi-modal emotion recognition using EEG and speech signals.
Comput. Biol. Medicine, 2022

Multi-class AUC Optimization for Robust Small-footprint Keyword Spotting with Limited Training Data.
Proceedings of the Interspeech 2022, 2022

Multi-Channel Far-Field Speaker Verification with Large-Scale Ad-hoc Microphone Arrays.
Proceedings of the Interspeech 2022, 2022

End-To-End Multi-Modal Speech Recognition with Air and Bone Conducted Speech.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021
Minimum-Volume Multichannel Nonnegative Matrix Factorization for Blind Audio Source Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Speaker recognition based on deep learning: An overview.
Neural Networks, 2021

Deep ad-hoc beamforming.
Comput. Speech Lang., 2021

Frame-level multi-channel speaker verification with large-scale ad-hoc microphone arrays.
CoRR, 2021

AUC Optimization for Robust Small-footprint Keyword Spotting with Limited Training Data.
CoRR, 2021

Unsupervised Ensemble Selection for Multilayer Bootstrap Networks.
CoRR, 2021

Minimum-volume Multichannel Nonnegative matrix factorization for blind source separation.
CoRR, 2021

Scaling Sparsemax Based Channel Selection for Speech Recognition with ad-hoc Microphone Arrays.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Transformer-Based End-to-End Speech Recognition with Local Dense Synthesizer Attention.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Speech Enhancement Aided End-To-End Multi-Task Learning for Voice Activity Detection.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

A comparison of handcrafted, parameterized, and learnable features for speech separation.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Minimum-volume regularized ILRMA for blind audio source separation.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Attention-based multi-channel speaker verification with ad-hoc microphone arrays.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Efficient conformer-based speech recognition with linear attention.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Conformer-based End-to-end Speech Recognition With Rotary Position Embedding.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Libri-adhoc40: A dataset collected from synchronized ad-hoc microphone arrays.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020
Speaker Verification by Partial AUC Optimization With Mahalanobis Distance Metric Learning.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Cosine metric learning based speaker verification.
Speech Commun., 2020

Partial AUC Metric Learning Based Speaker Verification Back-End.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Depthwise Separable Convolutional ResNet with Squeeze-and-Excitation Blocks for Small-Footprint Keyword Spotting.
Proceedings of the Interspeech 2020, 2020

Deep Topic Modeling by Multilayer Bootstrap Network and Lasso.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

Partial AUC Optimization Based Deep Speaker Embeddings with Class-Center Learning for Text-Independent Speaker Verification.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Multi-channel Speech Separation Using Deep Embedding With Multilayer Bootstrap Networks.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019
Phase-Aware Speech Enhancement Based on Deep Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Multi-channel Speech Separation Using Deep Embedding Model with Multilayer Bootstrap Networks.
CoRR, 2019

Robust Sparse Multichannel Active Noise Control.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

AUC Optimization for Deep Learning Based Voice Activity Detection.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Boosting Spatial Information for Deep Learning Based Multichannel Speaker-Independent Speech Separation In Reverberant Environments.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Hybrid Constant-Q Transform Based CNN Ensemble for Acoustic Scene Classification.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Domain Adaptation Neural Network for Acoustic Scene Classification in Mismatched Conditions.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018
Multilayer bootstrap networks.
Neural Networks, 2018

Linear Regression for Speaker Verification.
CoRR, 2018

Cosine Metric Learning for Speaker Verification in the I-vector Space.
Proceedings of the Interspeech 2018, 2018

An Investigation of Speaker Clustering Algorithms in Adverse Acoustic Environments.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017
Learning the kernel matrix by resampling.
CoRR, 2017

Speech separation by cost-sensitive deep learning.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016
A Deep Ensemble Learning Method for Monaural Speech Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Boosting Contextual Information for Deep Neural Network Based Voice Activity Detection.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Universal Background Sparse Coding and Multilayer Bootstrap Network for Speaker Clustering.
Proceedings of the Interspeech 2016, 2016

2015
Heuristic Ternary Error-Correcting Output Codes Via Weight Optimization and Layered Clustering-Based Approach.
IEEE Trans. Cybern., 2015

Convex Discriminative Multitask Clustering.
IEEE Trans. Pattern Anal. Mach. Intell., 2015

Universal Background Sparse Coding and Multilayer Bootstrap Network for Speaker Recognition.
CoRR, 2015

Multilayer bootstrap network for unsupervised speaker recognition.
CoRR, 2015

Unsupervised model compression for multilayer bootstrap networks.
CoRR, 2015

Multi-resolution stacking for speech separation based on boosted DNN.
Proceedings of the Interspeech 2015, 2015

2014
Boosted deep neural networks and multi-resolution cochleagram features for voice activity detection.
Proceedings of the Interspeech 2014, 2014

Unsupervised domain adaptation for deep neural network based voice activity detection.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Nonlinear Dimensionality Reduction of Data by Deep Distributed Random Samplings.
Proceedings of the Sixth Asian Conference on Machine Learning, 2014

2013
Deep Belief Networks Based Voice Activity Detection.
IEEE Trans. Speech Audio Lang. Process., 2013

Heuristic Ternary Error-Correcting Output Codes Via Weight Optimization and Layered Clustering-Based Approach.
CoRR, 2013

Convex Discriminative Multitask Clustering.
CoRR, 2013

Transfer Learning for Voice Activity Detection: A Denoising Deep Neural Network Perspective.
CoRR, 2013

Learning Deep Representations By Distributed Random Samplings.
CoRR, 2013

Weight optimization and layered clustering-based ECOC.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Denoising deep neural networks based voice activity detection.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

2012
Linearithmic Time Sparse and Convex Maximum Margin Clustering.
IEEE Trans. Syst. Man Cybern. Part B, 2012

Secure Ranking over Encrypted Documents.
IEICE Trans. Inf. Syst., 2012

Perceptual similarity between audio clips and feature selection for its measurement.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Optimized weighted decoding for error-correcting output codes.
Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

2011
Efficient Multiple Kernel Support Vector Machine Based Voice Activity Detection.
IEEE Signal Process. Lett., 2011

Maximum Margin Clustering Based Statistical VAD With Multiple Observation Compound Feature.
IEEE Signal Process. Lett., 2011

An efficient voice activity detection algorithm by combining statistical model and energy detection.
EURASIP J. Adv. Signal Process., 2011

2010
A new VAD framework using statistical model and human knowledge based empirical rule.
Proceedings of the Interspeech 2010, 2010
