Yi Zhao

Orcid: 0000-0002-3555-9408

Affiliations:
  • Kuaishou Technology, Beijing, China
  • National Institute of Informatics (NII), Tokyo, Japan
  • University of Tokyo, Graduate School of Engineering, Tokyo, Japan


According to our database, Yi Zhao authored at least 21 papers between 2013 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Zero-Shot Sing Voice Conversion: built upon clustering-based phoneme representations.
CoRR, 2024

VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing.
CoRR, 2024

MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction.
Proceedings of the IEEE International Conference on Acoustics, 2024

Enhancing Realism in 3D Facial Animation Using Conformer-Based Generation and Automated Post-Processing.
Proceedings of the IEEE International Conference on Acoustics, 2024

High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models.
Proceedings of the IEEE International Conference on Acoustics, 2024

GFMAE: Self-Supervised GNN-Free Masked Autoencoders.
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
HoloSinger: Semantics and Music Driven Motion Generation with Octahedral Holographic Projection.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

2022
Fusion of Self-supervised Learned Models for MOS Prediction.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Investigating Effective Domain Adaptation Method for Speaker Verification Task.
Proceedings of the Neural Information Processing - 29th International Conference, 2022

Melons: Generating Melody With Long-Term Structure Using Transformers And Structure Graph.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Learning Disentangled Phone and Speaker Representations in a Semi-Supervised VQ-VAE Paradigm.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
Pretraining Strategies, Waveform Model Choice, and Acoustic Configurations for Multi-Speaker End-to-End Speech Synthesis.
CoRR, 2020

Improved Prosody from Learned F0 Codebook Representations for VQ-VAE Speech Waveform Reconstruction.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Transferring Neural Speech Waveform Synthesizers to Musical Instrument Sounds Generation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

Voice Conversion Challenge 2020 -- Intra-lingual semi-parallel and cross-lingual voice conversion --.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

2019
Does the Lombard Effect Improve Emotional Communication in Noise? - Analysis of Emotional Speech Acted in Noise.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2018
Wasserstein GAN and Waveform Loss-Based Acoustic Model Training for Multi-Speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder.
IEEE Access, 2018

2016
Speaker Representations for Speaker Adaptation in Multiple Speakers' BLSTM-RNN-Based Speech Synthesis.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

The UTokyo System for Blizzard Challenge 2016.
Proceedings of the Blizzard Challenge 2016, Cupertino, CA, USA, September 16, 2016, 2016

2013
On the design of digital base-band processing unit for dPMR system.
Proceedings of the 15th IEEE International Conference on Communication Technology, 2013

