Shi-Xiong Zhang

According to our database, Shi-Xiong Zhang authored at least 68 papers between 2007 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
SECap: Speech Emotion Captioning with Large Language Model.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Towards Unified All-Neural Beamforming for Time and Frequency Domain Speech Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

RIR-SF: Room Impulse Response Based Spatial Feature for Multi-channel Multi-talker ASR.
CoRR, 2023

M3-AUDIODEC: Multi-channel multi-speaker multi-spatial audio codec.
CoRR, 2023

3D Neural Beamforming for Multi-channel Speech Separation Against Location Uncertainty.
CoRR, 2023

MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning.
Proceedings of the IEEE International Conference on Acoustics, 2023

Deep Neural Mel-Subband Beamformer for in-Car Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Neuralecho: Hybrid of Full-Band and Sub-Band Recurrent Neural Network For Acoustic Echo Cancellation and Speech Enhancement.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
NeuralEcho: A Self-Attentive Recurrent Neural Network For Unified Acoustic Echo Suppression And Speech Enhancement.
CoRR, 2022

EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Joint Neural AEC and Beamforming with Double-Talk Detection.
Proceedings of the Interspeech 2022, 2022

Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization.
Proceedings of the IEEE International Conference on Acoustics, 2022

Consistent Training and Decoding for End-to-End Speech Recognition Using Lattice-Free MMI.
Proceedings of the IEEE International Conference on Acoustics, 2022

Multi-Channel Multi-Speaker ASR Using 3D Spatial Feature.
Proceedings of the IEEE International Conference on Acoustics, 2022

Fast-Rir: Fast Neural Diffuse Room Impulse Response Generator.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Multi-Channel Multi-Frame ADL-MVDR for Target Speech Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Audio-Visual Multi-Channel Integration and Recognition of Overlapped Speech.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Complex Neural Spatial Filter: Enhancing Multi-Channel Target Speech Separation in Complex Domain.
IEEE Signal Process. Lett., 2021

Joint AEC and Beamforming with Double-Talk Detection using RNN-Transformer.
CoRR, 2021

Generalized RNN beamformer for target speech separation.
CoRR, 2021

WPD++: An Improved Neural Beamformer for Simultaneous Speech Separation and Dereverberation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Neural Mask based Multi-channel Convolutional Beamforming for Joint Dereverberation, Echo Cancellation and Denoising.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

MetricNet: Towards Improved Modeling For Non-Intrusive Speech Quality Assessment.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Generalized Spatio-Temporal RNN Beamformer for Target Speech Separation.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

MIMO Self-Attentive RNN Beamformer for Multi-Speaker Speech Separation.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Multi-Channel Speaker Verification for Single and Multi-Talker Speech.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

ADL-MVDR: All Deep Learning MVDR Beamformer for Target Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2021

Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization.
Proceedings of the IEEE International Conference on Acoustics, 2021

3D Spatial Features for Multi-Channel Target Speech Separation.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Audio-Visual Speech Separation and Dereverberation With a Two-Stage Multimodal Network.
IEEE J. Sel. Top. Signal Process., 2020

Multi-Modal Multi-Channel Target Speech Separation.
IEEE J. Sel. Top. Signal Process., 2020

Audio-Visual Multi-Channel Recognition of Overlapped Speech.
Proceedings of the Interspeech 2020, 2020

Neural Spatio-Temporal Beamformer for Target Speech Separation.
Proceedings of the Interspeech 2020, 2020

Exploiting Cross-Domain Visual Feature Generation for Disordered Speech Recognition.
Proceedings of the Interspeech 2020, 2020

Audio-Visual Recognition of Overlapped Speech for the LRS2 Dataset.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Far-Field Location Guided Target Speech Extraction Using End-to-End Speech Recognition Objectives.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Enhancing End-to-End Multi-Channel Speech Separation Via Spatial Feature Learning.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Self-Supervised Learning for Audio-Visual Speaker Diarization.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
A Unified Framework for Speech Separation.
CoRR, 2019

End-to-End Multi-Channel Speech Separation.
CoRR, 2019

Improved Speaker-Dependent Separation for CHiME-5 Challenge.
Proceedings of the Interspeech 2019, 2019

Neural Spatial Filter: Target Speaker Speech Separation Assisted with Directional Information.
Proceedings of the Interspeech 2019, 2019

A Comprehensive Study of Speech Separation: Spectrogram vs Waveform Separation.
Proceedings of the Interspeech 2019, 2019

Encrypted Speech Recognition Using Deep Polynomial Networks.
Proceedings of the IEEE International Conference on Acoustics, 2019

Time Domain Audio Visual Speech Separation.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
Domain and Speaker Adaptation for Cortana Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Exploring Sequential Characteristics in Speaker Bottleneck Feature for Text-Dependent Speaker Verification.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Challenges in and Solutions to Deep Learning Network Acoustic Modeling in Speech Recognition Products at Microsoft.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016
End-to-End attention based text-dependent speaker verification.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Recurrent support vector machines for speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Simplifying long short-term memory acoustic models for fast training and decoding.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
Deep neural support vector machines for speech recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014
Infinite structured support vector machines for speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

2013
Structured SVMs for Automatic Speech Recognition.
IEEE Trans. Speech Audio Process., 2013

Kernelized log linear models for continuous speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

Investigation of multilingual deep neural networks for spoken term detection.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2011
Optimized Discriminative Kernel for SVM Scoring and Its Application to Speaker Verification.
IEEE Trans. Neural Networks, 2011

Structured Support Vector Machines for Noise Robust Continuous Speech Recognition.
Proceedings of the INTERSPEECH 2011, 2011

Extending noise robust structured support vector machines to larger vocabulary tasks.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010
Structured Log Linear Models for Noise Robust Speech Recognition.
IEEE Signal Process. Lett., 2010

2009
A new adaptation approach to high-level speaker-model creation in speaker verification.
Speech Commun., 2009

Optimization of discriminative kernels in SVM speaker verification.
Proceedings of the INTERSPEECH 2009, 2009

2008
High-level speaker verification via articulatory-feature based sequence kernels and SVM.
Proceedings of the INTERSPEECH 2008, 2008

2007
Speaker Verification via High-Level Feature Based Phonetic-Class Pronunciation Modeling.
IEEE Trans. Computers, 2007

A New Adaptation Method for Speaker-Model Creation in High-Level Speaker Verification.
Proceedings of the Advances in Multimedia Information Processing, 2007

High-level feature-based speaker verification via articulatory phonetic-class pronunciation modeling.
Proceedings of the INTERSPEECH 2007, 2007

