Lei He

Affiliations:
  • Microsoft China, Speech and Language Group, Beijing, China


According to our database, Lei He authored at least 58 papers between 2014 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers.
CoRR, 2023

AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models.
CoRR, 2023

Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling.
CoRR, 2023

FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model.
CoRR, 2023

Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers.
CoRR, 2023

LeanSpeech: The Microsoft Lightweight Speech Synthesis System for LIMMITS Challenge 2023.
Proceedings of the IEEE International Conference on Acoustics, 2023

Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Prosody-Aware SpeechT5 for Expressive Neural TTS.
Proceedings of the IEEE International Conference on Acoustics, 2023

VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech.
CoRR, 2022

BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis.
CoRR, 2022

NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality.
CoRR, 2022

SoftSpeech: Unsupervised Duration Model in FastSpeech 2.
Proceedings of the Interspeech 2022, 2022

Self-supervised Context-aware Style Representation for Expressive Speech Synthesis.
Proceedings of the Interspeech 2022, 2022

AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios.
Proceedings of the Interspeech 2022, 2022

DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders.
Proceedings of the Interspeech 2022, 2022

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge.
Proceedings of the Interspeech 2022, 2022

Exploring Machine Speech Chain For Domain Adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2022

ProsodySpeech: Towards Advanced Prosody Model for Neural Text-to-Speech.
Proceedings of the IEEE International Conference on Acoustics, 2022

Improving FastSpeech TTS with Efficient Self-Attention and Compact Feed-Forward Network.
Proceedings of the IEEE International Conference on Acoustics, 2022

InferGrad: Improving Diffusion Models for Vocoder by Considering Inference in Training.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Cycle consistent network for end-to-end style transfer TTS training.
Neural Networks, 2021

DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021.
CoRR, 2021

Exploring Machine Speech Chain for Domain Adaptation and Few-Shot Speaker Adaptation.
CoRR, 2021

Multilingual Byte2Speech Models for Scalable Low-resource Speech Synthesis.
CoRR, 2021

Conversational End-to-End TTS for Voice Agents.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Cross-Speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Improving RNN-T for Domain Scaling Using Semi-Supervised Training with Neural TTS.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Speech BERT Embedding for Improving Prosody in Neural TTS.
Proceedings of the IEEE International Conference on Acoustics, 2021

On Addressing Practical Challenges for RNN-Transducer.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
s-Transformer: Segment-Transformer for Robust Neural Speech Synthesis.
CoRR, 2020

Conversational End-to-End TTS for Voice Agent.
CoRR, 2020

On Early-stop Clustering for Speaker Diarization.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability.
Proceedings of the Interspeech 2020, 2020

An Efficient Subband Linear Prediction for LPCNet-Based Neural Synthesis.
Proceedings of the Interspeech 2020, 2020

Rapid RNN-T Adaptation Using Personalized Speech Synthesis and Neural Language Generator.
Proceedings of the Interspeech 2020, 2020

Improving Prosody with Linguistic and BERT Derived Features in Multi-Speaker Based Mandarin Chinese Neural TTS.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Adaptation of RNN Transducer with Text-To-Speech Technology for Keyword Spotting.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Using Personalized Speech Synthesis and Neural Language Generator for Rapid Speaker Adaptation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Feature reinforcement with word embedding and parsing information in neural TTS.
CoRR, 2019

Forward-Backward Decoding for Regularizing End-to-End TTS.
Proceedings of the Interspeech 2019, 2019

Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS.
Proceedings of the Interspeech 2019, 2019

Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS.
Proceedings of the Interspeech 2019, 2019

A New GAN-Based End-to-End TTS Training Algorithm.
Proceedings of the Interspeech 2019, 2019

Learning Latent Representations for Style Control and Transfer in End-to-end Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2019

2018
Modeling Multi-speaker Latent Space to Improve Neural TTS: Quick Enrolling New Speaker and Enhancing Premium Voice.
CoRR, 2018

Frame Selection in SI-DNN Phonetic Space with WaveNet Vocoder for Voice Conversion without Parallel Training Data.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

A New Glottal Neural Vocoder for Speech Synthesis.
Proceedings of the Interspeech 2018, 2018

2016
Modeling F0 trajectories in hierarchically structured deep neural networks.
Speech Commun., 2016

Learning Distributed Word Representations For Bidirectional LSTM Recurrent Neural Network.
Proceedings of the NAACL HLT 2016, 2016

Speaker and language factorization in DNN-based TTS synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Unsupervised speaker adaptation for DNN-based TTS synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
A Unified Tagging Solution: Bidirectional LSTM Recurrent Neural Network with Word Embedding.
CoRR, 2015

Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Recurrent Neural Network.
CoRR, 2015

Sequence generation error (SGE) minimization based deep neural networks training for text-to-speech synthesis.
Proceedings of the INTERSPEECH 2015, 2015

Word embedding for recurrent neural network based TTS synthesis.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Multi-speaker modeling and speaker adaptation for DNN-based TTS synthesis.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014
Modeling DCT parameterized F0 trajectory at intonation phrase level with DNN or decision tree.
Proceedings of the INTERSPEECH 2014, 2014

