Yi Ren

Affiliations:
  • Zhejiang University, China


According to our database, Yi Ren authored at least 75 papers between 2019 and 2024.

Bibliography

2024
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis.
CoRR, 2024

Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
EnchantDance: Unveiling the Potential of Music-Driven Dance Movement.
CoRR, 2023

Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling.
CoRR, 2023

C2G2: Controllable Co-speech Gesture Generation with Latent Diffusion Model.
CoRR, 2023

Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts.
CoRR, 2023

GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech.
CoRR, 2023

Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias.
CoRR, 2023

Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis.
CoRR, 2023

Detector Guidance for Multi-Object Text-to-Image Generation.
CoRR, 2023

Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation.
CoRR, 2023

StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation.
CoRR, 2023

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation.
CoRR, 2023

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.
CoRR, 2023

Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models.
Proceedings of the International Conference on Machine Learning, 2023

GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Bag of Tricks for Unsupervised Text-to-Speech.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

MUG: A General Meeting Understanding and Generation Benchmark.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG).
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

VarietySound: Timbre-Controllable Video to Sound Generation Via Unsupervised Information Disentanglement.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-Training.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Prosody-TTS: Improving Prosody with Masked Autoencoder and Conditional Diffusion Model For Expressive Text-to-Speech.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

FastDiff 2: Revisiting and Incorporating GANs and Diffusion Models in High-Fidelity Speech Synthesis.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Diffusion Denoising Process for Perceptron Bias in Out-of-distribution Detection.
CoRR, 2022

VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement.
CoRR, 2022

SDMuse: Stochastic Differential Music Editing and Generation via Hybrid Representation.
CoRR, 2022

Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech.
CoRR, 2022

GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech Synthesis.
CoRR, 2022

MR-SVS: Singing Voice Synthesis with Multi-Reference Encoder.
CoRR, 2022

M4Singer: A Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

A Study of Syntactic Multi-Modality in Non-Autoregressive Machine Translation.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Video-Guided Curriculum Learning for Spoken Video Grounding.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

ProDiff: Progressive Fast Diffusion Model for High-Quality Text-to-Speech.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

EditSinger: Zero-Shot Text-Based Singing Voice Editing System with Diverse Prosody Modeling.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Pseudo Numerical Methods for Diffusion Models on Manifolds.
Proceedings of the Tenth International Conference on Learning Representations, 2022

HiFiDenoise: High-Fidelity Denoising Text to Speech with Adversarial Networks.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022

ProsoSpeech: Enhancing Prosody with Quantized Vector Pre-Training in Text-To-Speech.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022

Learning the Beauty in Songs: Neural Singing Voice Beautifier.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Revisiting Over-Smoothness in Text to Speech.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Parallel and High-Fidelity Text-to-Lip Generation.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Flow-Based Unconstrained Lip to Speech Generation.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation.
CoRR, 2021

High-Speed and High-Quality Text-to-Lip Generation.
CoRR, 2021

DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis.
CoRR, 2021

PortaSpeech: Portable and High-Quality Generative Text-to-Speech.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

WSRGlow: A Glow-Based Waveform Generative Model for Audio Super-Resolution.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

FedSpeech: Federated Text-to-Speech with Continual Learning.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.
Proceedings of the 9th International Conference on Learning Representations, 2021

DenoiSpeech: Denoising Text to Speech with Frame-Level Noise Modeling.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021

UWSpeech: Speech to Speech Translation for Unwritten Languages.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
PopMAG: Pop Music Accompaniment Generation.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

FastLR: Non-Autoregressive Lipreading Model with Integrate-and-Fire.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

DeepSinger: Singing Voice Synthesis with Data Mined From the Web.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

MultiSpeech: Multi-Speaker Text to Speech with Transformer.
Proceedings of the Interspeech 2020, 2020

Task-Level Curriculum Learning for Non-Autoregressive Neural Machine Translation.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

A Study of Non-autoregressive Model for Sequence Generation.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

SimulSpeech: End-to-End Simultaneous Speech to Text Translation.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019
A Study of Multilingual Neural Machine Translation.
CoRR, 2019

FastSpeech: Fast, Robust and Controllable Text to Speech.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Almost Unsupervised Text to Speech and Automatic Speech Recognition.
Proceedings of the 36th International Conference on Machine Learning, 2019

Multilingual Neural Machine Translation with Knowledge Distillation.
Proceedings of the 7th International Conference on Learning Representations, 2019

