Yong Xu

ORCID: 0000-0003-4944-6890

Affiliations:
  • Tencent America LLC, Seattle, USA
  • University of Surrey, Centre for Vision, Speech and Signal Processing, Guildford, UK (former)
  • University of Science and Technology of China, Hefei, China (PhD 2015)


According to our database, Yong Xu authored at least 78 papers between 2014 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Labelled Non-Zero Diffusion Particle Flow SMC-PHD Filtering for Multi-Speaker Tracking.
IEEE Trans. Multim., 2024

2023
SpatialCodec: Neural Spatial Speech Coding.
CoRR, 2023

Deep Neural Mel-Subband Beamformer for in-Car Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2023

NeuralEcho: Hybrid of Full-Band and Sub-Band Recurrent Neural Network For Acoustic Echo Cancellation and Speech Enhancement.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
NeuralEcho: A Self-Attentive Recurrent Neural Network For Unified Acoustic Echo Suppression And Speech Enhancement.
CoRR, 2022

EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Audio Visual Multi-Speaker Tracking with Improved GCF and PMBM Filter.
Proceedings of the Interspeech 2022, 2022

Joint Neural AEC and Beamforming with Double-Talk Detection.
Proceedings of the Interspeech 2022, 2022

Audio-Visual Tracking of Multiple Speakers Via a PMBM Filter.
Proceedings of the IEEE International Conference on Acoustics, 2022

Visually Assisted Self-supervised Audio Speaker Localization and Tracking.
Proceedings of the 30th European Signal Processing Conference, 2022

2021
Multi-Channel Multi-Frame ADL-MVDR for Target Speech Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Joint AEC AND Beamforming with Double-Talk Detection using RNN-Transformer.
CoRR, 2021

Generalized RNN beamformer for target speech separation.
CoRR, 2021

WPD++: An Improved Neural Beamformer for Simultaneous Speech Separation and Dereverberation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Neural Mask based Multi-channel Convolutional Beamforming for Joint Dereverberation, Echo Cancellation and Denoising.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

MetricNet: Towards Improved Modeling For Non-Intrusive Speech Quality Assessment.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Generalized Spatio-Temporal RNN Beamformer for Target Speech Separation.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

MIMO Self-Attentive RNN Beamformer for Multi-Speaker Speech Separation.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

ADL-MVDR: All Deep Learning MVDR Beamformer for Target Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2021

Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
Sound Event Detection of Weakly Labelled Data With CNN-Transformer and Automatic Threshold Optimization.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Audio-Visual Speech Separation and Dereverberation With a Two-Stage Multimodal Network.
IEEE J. Sel. Top. Signal Process., 2020

Multi-Modal Multi-Channel Target Speech Separation.
IEEE J. Sel. Top. Signal Process., 2020

Audio-Visual Multi-Channel Recognition of Overlapped Speech.
Proceedings of the Interspeech 2020, 2020

Neural Spatio-Temporal Beamformer for Target Speech Separation.
Proceedings of the Interspeech 2020, 2020

Far-Field Location Guided Target Speech Extraction Using End-to-End Speech Recognition Objectives.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Enhancing End-to-End Multi-Channel Speech Separation Via Spatial Feature Learning.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Self-Supervised Learning for Audio-Visual Speaker Diarization.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Weakly Labelled AudioSet Tagging With Attention Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Sound Event Detection and Time-Frequency Segmentation from Weakly Labelled Data.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

A Unified Framework for Speech Separation.
CoRR, 2019

End-to-End Multi-Channel Speech Separation.
CoRR, 2019

Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems.
CoRR, 2019

Weakly labelled AudioSet Classification with Attention Neural Networks.
CoRR, 2019

Improved Speaker-Dependent Separation for CHiME-5 Challenge.
Proceedings of the Interspeech 2019, 2019

Neural Spatial Filter: Target Speaker Speech Separation Assisted with Directional Information.
Proceedings of the Interspeech 2019, 2019

A Comprehensive Study of Speech Separation: Spectrogram vs Waveform Separation.
Proceedings of the Interspeech 2019, 2019

Single-Channel Signal Separation and Deconvolution with Generative Adversarial Networks.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Joint Training of Complex Ratio Mask Based Beamformer and Acoustic Model for Noise Robust ASR.
Proceedings of the IEEE International Conference on Acoustics, 2019

Acoustic Scene Generation with Conditional SampleRNN.
Proceedings of the IEEE International Conference on Acoustics, 2019

Time Domain Audio Visual Speech Separation.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
Auxiliary Features from Laser-Doppler Vibrometer Sensor for Deep Neural Network Based Robust Speech Recognition.
J. Signal Process. Syst., 2018

DCASE 2018 Challenge baseline with convolutional neural networks.
CoRR, 2018

Iterative Deep Neural Networks for Speaker-Independent Binaural Blind Speech Separation.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

A Joint Separation-Classification Model for Sound Event Detection of Weakly Labelled Data.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Audio Set Classification with Attention Model: A Probabilistic Perspective.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Large-Scale Weakly Supervised Audio Classification Using Gated Convolutional Neural Network.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Improving Reverberant Speech Separation with Binaural Cues Using Temporal Context and Convolutional Neural Networks.
Proceedings of the Latent Variable Analysis and Signal Separation, 2018

Supporting Audiography: Design of a System for Sentimental Sound Recording, Classification and Playback.
Proceedings of the HCI International 2018, 2018

Capsule Routing for Sound Event Detection.
Proceedings of the 26th European Signal Processing Conference, 2018

DCASE 2018 Challenge Surrey cross-task convolutional neural network baseline.
Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, 2018

2017
Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Hierarchical deep neural network for multivariate regression.
Pattern Recognit., 2017

Surrey-cvssp system for DCASE2017 challenge task4.
CoRR, 2017

Binaural and log-power spectra features with deep neural networks for speech-noise separation.
Proceedings of the 19th IEEE International Workshop on Multimedia Signal Processing, 2017

Attention and Localization Based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio Tagging.
Proceedings of the Interspeech 2017, 2017

Convolutional gated recurrent neural network incorporating spatial features for audio tagging.
Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

A joint detection-classification model for audio tagging of weakly labelled data.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Fast tagging of natural sounds using marginal co-regularization.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Joint detection and classification convolutional neural network on weakly labelled bird audio detection.
Proceedings of the 25th European Signal Processing Conference, 2017

2016
Joint training of DNNs by incorporating an explicit dereverberation structure for distant speech recognition.
EURASIP J. Adv. Signal Process., 2016

Hierachical learning for DNN-based acoustic scene classification.
CoRR, 2016

Fully Deep Neural Networks Incorporating Unsupervised Feature Learning for Audio Tagging.
CoRR, 2016

Deep neural network for robust speech recognition with auxiliary features from laser-Doppler vibrometer sensor.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Hierarchical Learning for DNN-Based Acoustic Scene Classification.
Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, 2016

Fully DNN-Based Multi-Label Regression for Audio Tagging.
Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, 2016

2015
A Regression Approach to Speech Enhancement Based on Deep Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Multi-objective learning and mask-based post-processing for deep neural network based speech enhancement.
Proceedings of the INTERSPEECH 2015, 2015

DNN-based speech bandwidth expansion and its application to adding high-frequency missing features for automatic speech recognition of narrowband speech.
Proceedings of the INTERSPEECH 2015, 2015

Improving Deep Neural Network Based Speech Enhancement in Low SNR Environments.
Proceedings of the Latent Variable Analysis and Signal Separation, 2015

2014
An Experimental Study on Speech Enhancement Based on Deep Neural Networks.
IEEE Signal Process. Lett., 2014

Cross-language transfer learning for deep neural network based speech enhancement.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Speech separation based on improved deep neural networks with dual outputs of speech features for both target and interfering speakers.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Dynamic noise aware training for speech enhancement based on deep neural networks.
Proceedings of the INTERSPEECH 2014, 2014

Robust speech recognition with speech enhanced deep neural networks.
Proceedings of the INTERSPEECH 2014, 2014

Global variance equalization for improving deep neural network based speech enhancement.
Proceedings of the IEEE China Summit & International Conference on Signal and Information Processing, 2014

