Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

AudioLCM: Efficient and High-Quality Text-to-Audio Generation with Minimal Inference Steps.

Huadai Liu

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

VoiceTuner: Self-Supervised Pre-training and Efficient Fine-tuning For Voice Generation.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

UniAudio: Towards Universal Audio Generation with Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

InstructSpeech: Following Speech Editing Instructions via Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024

Wav2SQL: Direct Generalizable Speech-To-SQL Parsing.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

Robust Singing Voice Transcription Serves Synthesis.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Text-to-Song: Towards Controllable Music Generation Incorporating Vocal and Accompaniment.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation.

CoRR, 2023

Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers.

CoRR, 2023

UniAudio: An Audio Foundation Model Toward Universal Audio Generation.

CoRR, 2023

Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias.

CoRR, 2023

Detector Guidance for Multi-Object Text-to-Image Generation.

CoRR, 2023

Make-A-Voice: Unified Voice Synthesis With Discrete Representation.

CoRR, 2023

Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation.

CoRR, 2023

HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec.

CoRR, 2023

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation.

CoRR, 2023

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.

CoRR, 2023

MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition.

CoRR, 2023

InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt.

CoRR, 2023

UniSinger: Unified End-to-End Singing Voice Synthesis With Cross-Modality Information Matching.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models.

Proceedings of the International Conference on Machine Learning, 2023

TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

VarietySound: Timbre-Controllable Video to Sound Generation Via Unsupervised Information Disentanglement.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-Training.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Contrastive Token-Wise Meta-Learning for Unseen Performer Visual Temporal-Aligned Translation.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Prosody-TTS: Improving Prosody with Masked Autoencoder and Conditional Diffusion Model For Expressive Text-to-Speech.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

FastDiff 2: Revisiting and Incorporating GANs and Diffusion Models in High-Fidelity Speech Synthesis.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

RMSSinger: Realistic-Music-Score based Singing Voice Synthesis.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement.

CoRR, 2022

GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech Synthesis.

CoRR, 2022

M4Singer: A Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

ProDiff: Progressive Fast Diffusion Model for High-Quality Text-to-Speech.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis.

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

2021

SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation.

CoRR, 2021

Bilateral Denoising Diffusion Models.

CoRR, 2021

Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Rongjie Huang
