Yi Ren

ORCID: 0000-0002-9160-3848

Affiliations:

Zhejiang University, China

According to our database, Yi Ren authored at least 89 papers between 2019 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

TransVLM: A Vision-Language Framework and Benchmark for Detecting Any Shot Transitions.

CoRR, April, 2026

Generate Your Talking Avatar from Video Reference.

CoRR, April, 2026

2025

InfinityHuman: Towards Long-Term Audio-Driven Human.

CoRR, August, 2025

Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis.

CoRR, February, 2025

HumanDiT: Pose-Guided Diffusion Transformer for Long-form Human Motion Video Generation.

CoRR, February, 2025

UniTalker: Conversational Speech-Visual Synthesis.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Chain-Talker: Chain Understanding and Rendering for Empathetic Conversational Speech Synthesis.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

Retinal vessels segmentation method based on dynamic threshold neural P systems with orientation feedback.

J. Membr. Comput., December, 2024

SDMuse: Stochastic Differential Music Editing and Generation via Hybrid Representation.

IEEE Trans. Multim., 2024

RefXVC: Cross-Lingual Voice Conversion With Enhanced Reference Leveraging.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes.

CoRR, 2024

MulliVC: Multi-lingual Voice Conversion With Cycle Consistency.

CoRR, 2024

The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition.

CoRR, 2024

MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Generative Expressive Conversational Speech Synthesis.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

EnchantDance: Unveiling the Potential of Music-Driven Dance Movement.

CoRR, 2023

C2G2: Controllable Co-speech Gesture Generation with Latent Diffusion Model.

CoRR, 2023

Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts.

CoRR, 2023

Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias.

CoRR, 2023

Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis.

CoRR, 2023

Detector Guidance for Multi-Object Text-to-Image Generation.

CoRR, 2023

Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation.

CoRR, 2023

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation.

CoRR, 2023

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.

CoRR, 2023

Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models.

Proceedings of the International Conference on Machine Learning, 2023

GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Bag of Tricks for Unsupervised Text-to-Speech.

Yi Ren

Chen Zhang

Shuicheng Yan

Proceedings of the Eleventh International Conference on Learning Representations, 2023

MUG: A General Meeting Understanding and Generation Benchmark.

Proceedings of the IEEE International Conference on Acoustics, 2023

Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG).

Proceedings of the IEEE International Conference on Acoustics, 2023

VarietySound: Timbre-Controllable Video to Sound Generation Via Unsupervised Information Disentanglement.

Proceedings of the IEEE International Conference on Acoustics, 2023

CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-Training.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Prosody-TTS: Improving Prosody with Masked Autoencoder and Conditional Diffusion Model For Expressive Text-to-Speech.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

FastDiff 2: Revisiting and Incorporating GANs and Diffusion Models in High-Fidelity Speech Synthesis.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

Diffusion Denoising Process for Perceptron Bias in Out-of-distribution Detection.

CoRR, 2022

VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement.

CoRR, 2022

GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech Synthesis.

CoRR, 2022

MR-SVS: Singing Voice Synthesis with Multi-Reference Encoder.

CoRR, 2022

M4Singer: A Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

A Study of Syntactic Multi-Modality in Non-Autoregressive Machine Translation.

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Video-Guided Curriculum Learning for Spoken Video Grounding.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

ProDiff: Progressive Fast Diffusion Model for High-Quality Text-to-Speech.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

EditSinger: Zero-Shot Text-Based Singing Voice Editing System with Diverse Prosody Modeling.

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech.

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis.

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Pseudo Numerical Methods for Diffusion Models on Manifolds.

Proceedings of the Tenth International Conference on Learning Representations, 2022

HiFiDenoise: High-Fidelity Denoising Text to Speech with Adversarial Networks.

Proceedings of the IEEE International Conference on Acoustics, 2022

Prosospeech: Enhancing Prosody with Quantized Vector Pre-Training in Text-To-Speech.

Proceedings of the IEEE International Conference on Acoustics, 2022

Learning the Beauty in Songs: Neural Singing Voice Beautifier.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Revisiting Over-Smoothness in Text to Speech.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Parallel and High-Fidelity Text-to-Lip Generation.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Flow-Based Unconstrained Lip to Speech Generation.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation.

CoRR, 2021

High-Speed and High-Quality Text-to-Lip Generation.

CoRR, 2021

DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis.

CoRR, 2021

PortaSpeech: Portable and High-Quality Generative Text-to-Speech.

Yi Ren

Jinglin Liu

Zhou Zhao

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

WSRGlow: A Glow-Based Waveform Generative Model for Audio Super-Resolution.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

FedSpeech: Federated Text-to-Speech with Continual Learning.

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.

Proceedings of the 9th International Conference on Learning Representations, 2021

Denoispeech: Denoising Text to Speech with Frame-Level Noise Modeling.

Proceedings of the IEEE International Conference on Acoustics, 2021

UWSpeech: Speech to Speech Translation for Unwritten Languages.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

PopMAG: Pop Music Accompaniment Generation.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

FastLR: Non-Autoregressive Lipreading Model with Integrate-and-Fire.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition.

Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

DeepSinger: Singing Voice Synthesis with Data Mined From the Web.

Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

MultiSpeech: Multi-Speaker Text to Speech with Transformer.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Task-Level Curriculum Learning for Non-Autoregressive Neural Machine Translation.

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

A Study of Non-autoregressive Model for Sequence Generation.

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

SimulSpeech: End-to-End Simultaneous Speech to Text Translation.

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019

A Study of Multilingual Neural Machine Translation.

CoRR, 2019

FastSpeech: Fast, Robust and Controllable Text to Speech.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Almost Unsupervised Text to Speech and Automatic Speech Recognition.

Proceedings of the 36th International Conference on Machine Learning, 2019

Multilingual Neural Machine Translation with Knowledge Distillation.

Proceedings of the 7th International Conference on Learning Representations, 2019
