Yuancheng Wang

Orcid: 0000-0003-2382-3424

According to our database, Yuancheng Wang authored at least 29 papers between 2017 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.



Bibliography

2025
NVSpeech: An Integrated and Scalable Pipeline for Human-Like Speech Modeling with Paralinguistic Vocalizations.
CoRR, August, 2025

DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation.
CoRR, May, 2025

Metis: A Foundation Speech Generation Model with Masked Generative Pre-training.
CoRR, February, 2025

Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation.
CoRR, January, 2025

Overview of the Amphion Toolkit (v0.2).
CoRR, January, 2025

AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement.
CoRR, January, 2025

MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Trustworthy multi-phase liver tumor segmentation via evidence-based uncertainty.
Eng. Appl. Artif. Intell., 2024

Noro: A Noise-Robust One-shot Voice Conversion System with Hidden Speaker Representation Capabilities.
CoRR, 2024

MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer.
CoRR, 2024

FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds.
CoRR, 2024

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis.
CoRR, 2024

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.
CoRR, 2024

Amphion: an Open-Source Audio, Music, and Speech Generation Toolkit.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Emilia: An Extensive, Multilingual, and Diverse Speech Dataset For Large-Scale Speech Generation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Cross-Lingual Alzheimer's Disease Detection Based on Scale Criteria.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit.
CoRR, 2023

AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022
Pipaset preview: A multimodal dataset dedicated to Chinese music instrument Pipa.
Dataset, April, 2022

PipaSet and TEAS: A Multimodal Dataset and Annotation Platform for Automatic Music Transcription and Expressive Analysis Dedicated to Chinese Traditional Plucked String Instrument Pipa.
IEEE Access, 2022

Automated testing of image captioning systems.
Proceedings of the ISSTA '22: 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, Virtual Event, South Korea, July 18, 2022

Mining Assignment Submission Time to Detect At-Risk Students with Peer Information.
Proceedings of the 15th International Conference on Educational Data Mining, 2022

2019
Adversarial Training for Video Disentangled Representation.
Proceedings of the MultiMedia Modeling - 25th International Conference, 2019

2018
An Attention-Based Approach for Single Image Super Resolution.
Proceedings of the 24th International Conference on Pattern Recognition, 2018

2017
Calibration of a two-state pitch-wise HMM method for note segmentation in Automatic Music Transcription systems.
CoRR, 2017

Improving Note Segmentation in Automatic Piano Music Transcription Systems with a Two-State Pitch-Wise HMM Method.
Proceedings of the 18th International Society for Music Information Retrieval Conference, 2017

