Yi Zhao

ORCID: 0000-0002-3555-9408

Affiliations:

Kuaishou Technology, Beijing, China
National Institute of Informatics (NII), Tokyo, Japan
University of Tokyo, Graduate School of Engineering, Tokyo, Japan

According to our database¹, Yi Zhao authored at least 23 papers between 2013 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2025

Simple and Effective Content Encoder for Singing Voice Conversion via SSL-Embedding Dimension Reduction.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

InvoxSVC: Any-to-any Zero-shot Singing Voice Conversion with In-Context Learning in Latent Flow Matching.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

2024

Zero-Shot Sing Voice Conversion: built upon clustering-based phoneme representations.

CoRR, 2024

VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing.

CoRR, 2024

MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Enhancing Realism in 3D Facial Animation Using Conformer-Based Generation and Automated Post-Processing.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

GFMAE: Self-Supervised GNN-Free Masked Autoencoders.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023

HoloSinger: Semantics and Music Driven Motion Generation with Octahedral Holographic Projection.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

2022

Fusion of Self-supervised Learned Models for MOS Prediction.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Investigating Effective Domain Adaptation Method for Speaker Verification Task.

Proceedings of the 29th International Conference on Neural Information Processing (ICONIP), 2022

Melons: Generating Melody With Long-Term Structure Using Transformers And Structure Graph.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

Learning Disentangled Phone and Speaker Representations in a Semi-Supervised VQ-VAE Paradigm.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020

Pretraining Strategies, Waveform Model Choice, and Acoustic Configurations for Multi-Speaker End-to-End Speech Synthesis.

CoRR, 2020

Improved Prosody from Learned F0 Codebook Representations for VQ-VAE Speech Waveform Reconstruction.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Transferring Neural Speech Waveform Synthesizers to Musical Instrument Sounds Generation.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions.

Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

Voice Conversion Challenge 2020 -- Intra-lingual semi-parallel and cross-lingual voice conversion --.

Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

2019

Does the Lombard Effect Improve Emotional Communication in Noise? - Analysis of Emotional Speech Acted in Noise.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2018

Wasserstein GAN and Waveform Loss-Based Acoustic Model Training for Multi-Speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder.

IEEE Access, 2018

2016

Speaker Representations for Speaker Adaptation in Multiple Speakers' BLSTM-RNN-Based Speech Synthesis.

Yi Zhao, Daisuke Saito, Nobuaki Minematsu

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

The UTokyo System for Blizzard Challenge 2016.

Proceedings of the Blizzard Challenge 2016, Cupertino, CA, USA, September 16, 2016

2013

On the design of digital base-band processing unit for dPMR system.

Zhen Yang, Yi Zhao, Xiaokang Lin

Proceedings of the 15th IEEE International Conference on Communication Technology, 2013