Xiaofei Wang

Affiliations:

Microsoft, One Microsoft Way, Redmond, WA, USA

According to our database¹, Xiaofei Wang authored at least 46 papers between 2020 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2025

FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates.

CoRR, October, 2025

Improving Practical Aspects of End-to-End Multi-Talker Speech Recognition for Online and Offline Scenarios.

Aswin Shanmugam Subramanian

CoRR, June, 2025

Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation.

Aswin Shanmugam Subramanian

CoRR, February, 2025

Summary of the NOTSOFAR-1 challenge: Highlights and learnings.

Comput. Speech Lang., 2025

CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Towards Efficient Speech-Text Jointly Decoding within One Speech Language Model.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

SLM-S2ST: A multimodal language model for direct speech-to-speech translation.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

2024

Isochrony-Controlled Speech-to-Text Translation: A study on translating from Sino-Tibetan to Indo-European Languages.

CoRR, 2024

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like.

CoRR, 2024

Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-To-Speech.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Investigating Neural Audio Codecs For Speech Language Model-Based Speech Generation.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription.

Benjamin Martinez Elizalde

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Total-Duration-Aware Duration Modeling for Text-to-Speech Systems.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Diarist: Streaming Speech Translation with Speaker Diarization.

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer.

CoRR, 2023

Speaker Diarization for ASR Output with T-vectors: A Sequence Classification Approach.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Simulating Realistic Speech Overlaps Improves Multi-Talker ASR.

Proceedings of the IEEE International Conference on Acoustics, 2023

DATA2VEC-SG: Improving Self-Supervised Learning Representations for Speech Generation Tasks.

Proceedings of the IEEE International Conference on Acoustics, 2023

VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2023

Self-Supervised Learning with Bi-Label Masked Speech Prediction for Streaming Multi-Talker Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2023

Speech Separation with Large-Scale Self-Supervised Learning.

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

Breaking trade-offs in speech separation with sparsely-gated mixture of experts.

CoRR, 2022

Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Multi-Talker ASR with Token-Level Serialized Output Training.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

All-Neural Beamformer for Continuous Speech Separation.

Proceedings of the IEEE International Conference on Acoustics, 2022

VarArray: Array-Geometry-Agnostic Continuous Speech Separation.

Proceedings of the IEEE International Conference on Acoustics, 2022

PickNet: Real-Time Channel Selection for Ad Hoc Microphone Arrays.

Takuya Yoshioka

Xiaofei Wang

Dongmei Wang

Proceedings of the IEEE International Conference on Acoustics, 2022

Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction.

Proceedings of the IEEE International Conference on Acoustics, 2022

Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR.

Proceedings of the IEEE International Conference on Acoustics, 2022

Personalized Speech Enhancement: New Models and Comprehensive Evaluation.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Exploring End-to-End Multi-Channel ASR with Bias Information for Meeting Transcription.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Investigation of End-to-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

End-to-End Speaker-Attributed ASR with Transformer.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Human Listening and Live Captioning: Multi-Task Training for Speech Enhancement.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR.

Proceedings of the IEEE International Conference on Acoustics, 2021

Hypothesis Stitcher for End-to-End Speaker-Attributed ASR on Long-Form Multi-Talker Recordings.

Proceedings of the IEEE International Conference on Acoustics, 2021

Continuous Speech Separation with Ad Hoc Microphone Arrays.

Proceedings of the 29th European Signal Processing Conference, 2021

A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Serialized Output Training for End-to-End Overlapped Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of any Number of Speakers.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
