Rongjie Huang

ORCID: 0000-0002-1695-9000

According to our database, Rongjie Huang authored at least 46 papers between 2017 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt.
CoRR, 2024

Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models.
CoRR, 2024

Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis.
CoRR, 2024

StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation.
CoRR, 2023

Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers.
CoRR, 2023

UniAudio: An Audio Foundation Model Toward Universal Audio Generation.
CoRR, 2023

Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer.
CoRR, 2023

Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias.
CoRR, 2023

Detector Guidance for Multi-Object Text-to-Image Generation.
CoRR, 2023

Make-A-Voice: Unified Voice Synthesis With Discrete Representation.
CoRR, 2023

Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation.
CoRR, 2023

Wav2SQL: Direct Generalizable Speech-To-SQL Parsing.
CoRR, 2023

HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec.
CoRR, 2023

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation.
CoRR, 2023

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.
CoRR, 2023

MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition.
CoRR, 2023

InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt.
CoRR, 2023

UniSinger: Unified End-to-End Singing Voice Synthesis With Cross-Modality Information Matching.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models.
Proceedings of the International Conference on Machine Learning, 2023

TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

VarietySound: Timbre-Controllable Video to Sound Generation Via Unsupervised Information Disentanglement.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-Training.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Contrastive Token-Wise Meta-Learning for Unseen Performer Visual Temporal-Aligned Translation.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Prosody-TTS: Improving Prosody with Masked Autoencoder and Conditional Diffusion Model For Expressive Text-to-Speech.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

FastDiff 2: Revisiting and Incorporating GANs and Diffusion Models in High-Fidelity Speech Synthesis.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

RMSSinger: Realistic-Music-Score based Singing Voice Synthesis.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement.
CoRR, 2022

GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech Synthesis.
CoRR, 2022

Boundary element analysis of thin structures using a dual transformation method for weakly singular boundary integrals.
Comput. Math. Appl., 2022

M4Singer: A Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

ProDiff: Progressive Fast Diffusion Model for High-Quality Text-to-Speech.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

2021
SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation.
CoRR, 2021

Bilateral Denoising Diffusion Models.
CoRR, 2021

Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

2017
Research on Dynamic Safe Loading Techniques in Android Application Protection System.
Proceedings of the Smart Computing and Communication, 2017

