Zhuo Chen

Orcid: 0000-0003-0563-1760

Affiliations:
  • Microsoft, Redmond, WA, USA
  • Columbia University, New York, NY, USA (PhD 2017)


According to our database, Zhuo Chen authored at least 103 papers between 2015 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning.
CoRR, 2023

t-SOT FNT: Streaming Multi-talker ASR with Text-only Domain Adaptation Capability.
CoRR, 2023

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer.
CoRR, 2023

On decoder-only architecture for speech-to-text and large language model integration.
CoRR, 2023

Adapting Multi-Lingual ASR Models for Handling Multiple Talkers.
CoRR, 2023

VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation.
CoRR, 2023

Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling.
CoRR, 2023

Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers.
CoRR, 2023

Toward Scalable Image Feature Compression: A Content-Adaptive and Diffusion-Based Approach.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

BEATs: Audio Pre-Training with Acoustic Tokenizers.
Proceedings of the International Conference on Machine Learning, 2023

Simulating Realistic Speech Overlaps Improves Multi-Talker ASR.
Proceedings of the IEEE International Conference on Acoustics, 2023

Speaker Change Detection For Transformer Transducer ASR.
Proceedings of the IEEE International Conference on Acoustics, 2023

DATA2VEC-SG: Improving Self-Supervised Learning Representations for Speech Generation Tasks.
Proceedings of the IEEE International Conference on Acoustics, 2023

Target Sound Extraction with Variable Cross-Modality Clues.
Proceedings of the IEEE International Conference on Acoustics, 2023

Vararray Meets T-Sot: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Self-Supervised Learning with Bi-Label Masked Speech Prediction for Streaming Multi-Talker Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Speech Separation with Large-Scale Self-Supervised Learning.
Proceedings of the IEEE International Conference on Acoustics, 2023

On Decoder-Only Architecture For Speech-to-Text and Large Language Model Integration.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
Dual-Path Modeling With Memory Embedding Model for Continuous Speech Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.
IEEE J. Sel. Top. Signal Process., 2022

BEATs: Audio Pre-Training with Acoustic Tokenizers.
CoRR, 2022

Breaking trade-offs in speech separation with sparsely-gated mixture of experts.
CoRR, 2022

Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition.
CoRR, 2022

Speech separation with large-scale self-supervised learning.
CoRR, 2022

The Microsoft System for VoxCeleb Speaker Recognition Challenge 2022.
CoRR, 2022

Exploring WavLM on Speech Enhancement.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Separating Long-Form Speech with Group-wise Permutation Invariant Training.
Proceedings of the Interspeech 2022, 2022

Streaming Multi-Talker ASR with Token-Level Serialized Output Training.
Proceedings of the Interspeech 2022, 2022

Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.
Proceedings of the Interspeech 2022, 2022

Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
Proceedings of the Interspeech 2022, 2022

All-Neural Beamformer for Continuous Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2022

Continuous Speech Separation with Recurrent Selective Attention Network.
Proceedings of the IEEE International Conference on Acoustics, 2022

VarArray: Array-Geometry-Agnostic Continuous Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2022

One Model to Enhance Them All: Array Geometry Agnostic Multi-Channel Personalized Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2022

Continuous Streaming Multi-Talker ASR with Dual-Path Transducers.
Proceedings of the IEEE International Conference on Acoustics, 2022

Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR.
Proceedings of the IEEE International Conference on Acoustics, 2022

Personalized speech enhancement: new models and Comprehensive evaluation.
Proceedings of the IEEE International Conference on Acoustics, 2022

Unispeech-Sat: Universal Speech Representation Learning With Speaker Aware Pre-Training.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Speaker Separation Using Speaker Inventories and Estimated Speech.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.
CoRR, 2021

Exploring End-to-End Multi-Channel ASR with Bias Information for Meeting Transcription.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Integration of Speech Separation, Diarization, and Recognition for Multi-Speaker Meetings: System Description, Comparison, and Analysis.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Dual-Path RNN for Long Recording Speech Separation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for ASR Integration.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Investigation of End-to-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Investigation of Practical Aspects of Single Channel Speech Separation for ASR.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

End-to-End Speaker-Attributed ASR with Transformer.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Target-Speaker Voice Activity Detection with Improved i-Vector Estimation for Unknown Number of Speaker.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Continuous Speech Separation Using Speaker Inventory for Long Recording.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Human Listening and Live Captioning: Multi-Task Training for Speech Enhancement.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Ultra Fast Speech Separation Model with Teacher Student Learning.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020.
Proceedings of the IEEE International Conference on Acoustics, 2021

Dual-Path Modeling for Long Recording Speech Separation in Meetings.
Proceedings of the IEEE International Conference on Acoustics, 2021

Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR.
Proceedings of the IEEE International Conference on Acoustics, 2021

Continuous Speech Separation with Conformer.
Proceedings of the IEEE International Conference on Acoustics, 2021

Don't Shoot Butterfly with Rifles: Multi-Channel Continuous Speech Separation with Early Exit Transformer.
Proceedings of the IEEE International Conference on Acoustics, 2021

Continuous Speech Separation with Ad Hoc Microphone Arrays.
Proceedings of the 29th European Signal Processing Conference, 2021

A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Toward Intelligent Sensing: Intermediate Deep Feature Compression.
IEEE Trans. Image Process., 2020

Continuous Speech Separation Using Speaker Inventory for Long Multi-talker Recording.
CoRR, 2020

Rethinking the Separation Layers in Speech Separation Networks.
CoRR, 2020

Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer.
CoRR, 2020

Continuous Speech Separation with Conformer.
CoRR, 2020

Continuous speech separation: dataset and analysis.
CoRR, 2020

An End-to-End Architecture of Online Multi-Channel Speech Separation.
Proceedings of the Interspeech 2020, 2020

Neural Speech Separation Using Spatially Distributed Microphones.
Proceedings of the Interspeech 2020, 2020

Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of any Number of Speakers.
Proceedings of the Interspeech 2020, 2020

Improving Deep CNN Networks with Long Temporal Context for Text-Independent Speaker Verification.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Dual-Path RNN: Efficient Long Sequence Modeling for Time-Domain Single-Channel Speech Separation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Continuous Speech Separation: Dataset and Analysis.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
PyKaldi2: Yet another speech toolkit based on Kaldi and PyTorch.
CoRR, 2019

Meeting Transcription Using Virtual Microphone Arrays.
CoRR, 2019

Meeting Transcription Using Asynchronous Distant Microphones.
Proceedings of the Interspeech 2019, 2019

Low-latency Speaker-independent Continuous Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2019

Single-channel Speech Extraction Using Speaker Inventory and Attention Network.
Proceedings of the IEEE International Conference on Acoustics, 2019


Speech Separation Using Speaker Inventory.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
Speaker-Independent Speech Separation With Deep Attractor Network.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Speaker-Invariant Training via Adversarial Learning.
CoRR, 2018

Multi-Channel Overlapped Speech Recognition with Location Guided Speech Extraction Network.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Recognizing Overlapped Speech in Meetings: A Multichannel Separation Approach Using Neural Networks.
Proceedings of the Interspeech 2018, 2018

Multi-Microphone Neural Speech Separation for Far-Field Multi-Talker Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Speaker-Invariant Training Via Adversarial Learning.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Developing Far-Field Speaker System Via Teacher-Student Learning.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Efficient Integration of Fixed Beamformers and Speech Separation Networks for Multi-Channel Far-Field Speech Separation.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Single Channel auditory source separation with neural network.
PhD thesis, 2017

Multi-microphone speech recognition integrating beamforming, robust feature extraction, and advanced DNN/RNN backend.
Comput. Speech Lang., 2017

Improving Mask Learning Based Speech Enhancement System with Restoration Layers and Residual Connection.
Proceedings of the Interspeech 2017, 2017

Deep clustering and conventional networks for music separation: Stronger together.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Deep attractor network for single-microphone speaker separation.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Unsupervised adaptation with domain separation networks for robust speech recognition.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Cracking the cocktail party problem by multi-beam deep attractor network.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Novel Deep Architectures in Speech Processing.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016
End-to-End attention based text-dependent speaker verification.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Single-Channel Multi-Speaker Separation Using Deep Clustering.
Proceedings of the Interspeech 2016, 2016

Deep clustering: Discriminative embeddings for segmentation and separation.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
Speech enhancement and recognition using multi-task learning of long short-term memory recurrent neural networks.
Proceedings of the INTERSPEECH 2015, 2015

Robust speech recognition in unknown reverberant and noisy conditions.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

The MERL/SRI system for the 3RD CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

