Di Wu

Affiliations:
  • Horizon Robotics, Beijing, China
  • WeNet Open Source Community
  • Mobvoi Inc., Beijing, China


According to our database, Di Wu authored at least 19 papers between 2019 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
TTS-PRISM: A Perceptual Reasoning and Interpretable Speech Model for Fine-Grained Diagnosis.
CoRR, April, 2026

Iterate to Differentiate: Enhancing Discriminability and Reliability in Zero-Shot TTS Evaluation.
CoRR, March, 2026

Borderless Long Speech Synthesis.
CoRR, March, 2026

2025
SyncVoice: Towards Video Dubbing with Vision-Augmented Pretrained TTS Model.
CoRR, December, 2025

2024
TouchASP: Elastic Automatic Speech Perception that Everyone Can Touch.
CoRR, 2024

TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch.
CoRR, 2024

U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF.
CoRR, 2024

Hydraformer: One Encoder for All Subsampling Rates.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

2023
ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

TrimTail: Low-Latency Streaming ASR with Simple But Effective Spectrogram-Level Length Penalty.
Proceedings of the IEEE International Conference on Acoustics, 2023

Fast-U2++: Fast and Accurate End-to-End Speech Recognition in Joint CTC/Attention Frames.
Proceedings of the IEEE International Conference on Acoustics, 2023

2022
FusionFormer: Fusing Operations in Transformer for Efficient Streaming Speech Recognition.
CoRR, 2022

WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

WENETSPEECH: A 10000+ Hours Multi-Domain Mandarin Corpus for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
U2++: Unified Two-pass Bidirectional End-to-end Model for Speech Recognition.
CoRR, 2021

WeNet: Production First and Production Ready End-to-End Speech Recognition Toolkit.
CoRR, 2021

WeNet: Production Oriented Streaming and Non-Streaming End-to-End Speech Recognition Toolkit.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

2020
Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition.
CoRR, 2020

2019
Design of Gesture Recognition System Based on Multi-Channel Myoelectricity Correlation.
Proceedings of the 2019 IEEE Global Communications Conference, 2019

