Di Wu

Affiliations:

Horizon Robotics, Beijing, China
WeNet Open Source Community
Mobvoi Inc., Beijing, China

According to our database, Di Wu authored at least 19 papers between 2019 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

TTS-PRISM: A Perceptual Reasoning and Interpretable Speech Model for Fine-Grained Diagnosis.

CoRR, April, 2026

Iterate to Differentiate: Enhancing Discriminability and Reliability in Zero-Shot TTS Evaluation.

CoRR, March, 2026

Borderless Long Speech Synthesis.

CoRR, March, 2026

2025

SyncVoice: Towards Video Dubbing with Vision-Augmented Pretrained TTS Model.

CoRR, December, 2025

2024

TouchASP: Elastic Automatic Speech Perception that Everyone Can Touch.

CoRR, 2024

TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch.

CoRR, 2024

U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF.

CoRR, 2024

Hydraformer: One Encoder for All Subsampling Rates.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

2023

ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

TrimTail: Low-Latency Streaming ASR with Simple But Effective Spectrogram-Level Length Penalty.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Fast-U2++: Fast and Accurate End-to-End Speech Recognition in Joint CTC/Attention Frames.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

2022

FusionFormer: Fusing Operations in Transformer for Efficient Streaming Speech Recognition.

CoRR, 2022

WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

WENETSPEECH: A 10000+ Hours Multi-Domain Mandarin Corpus for Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

U2++: Unified Two-pass Bidirectional End-to-end Model for Speech Recognition.

CoRR, 2021

WeNet: Production First and Production Ready End-to-End Speech Recognition Toolkit.

CoRR, 2021

WeNet: Production Oriented Streaming and Non-Streaming End-to-End Speech Recognition Toolkit.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

2020

Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition.

CoRR, 2020

2019

Design of Gesture Recognition System Based on Multi-Channel Myoelectricity Correlation.

Proceedings of the 2019 IEEE Global Communications Conference, 2019
