Yuancheng Wang

Orcid: 0000-0003-2382-3424

According to our database, Yuancheng Wang authored at least 40 papers between 2017 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
MimicLM: Zero-Shot Voice Imitation through Autoregressive Modeling of Pseudo-Parallel Speech Corpora.
CoRR, April, 2026

Scaling Speech Tokenizers with Diffusion Autoencoders.
CoRR, February, 2026

FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds.
Int. J. Comput. Vis., January, 2026

FlexiVoice: Enabling Flexible Style Control in Zero-Shot TTS with Natural Language Instructions.
CoRR, January, 2026

Multi-Metric Preference Alignment for Generative Speech Restoration.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
Innovative IoT device identification method based on residual-connected Capsule Network.
J. Cloud Comput., December, 2025

SpeechJudge: Towards Human-Level Judgment for Speech Naturalness.
CoRR, November, 2025

TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling.
CoRR, August, 2025

Vevo2: Bridging Controllable Speech and Singing Voice Generation via Unified Prosody Learning.
CoRR, August, 2025

NVSpeech: An Integrated and Scalable Pipeline for Human-Like Speech Modeling with Paralinguistic Vocalizations.
CoRR, August, 2025

Metis: A Foundation Speech Generation Model with Masked Generative Pre-training.
CoRR, February, 2025

Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation.
CoRR, January, 2025

Overview of the Amphion Toolkit (v0.2).
CoRR, January, 2025

AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement.
CoRR, January, 2025

Research on Single Image Super-Resolution Enhancement Based on Coordinate Attention and Multi-Domain Loss Function.
Proceedings of the Machine Learning and Artificial Intelligence, 2025

DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Noro: Noise-Robust One-Shot Voice Conversion with Hidden Speaker Representation Learning.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2025

Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Trustworthy multi-phase liver tumor segmentation via evidence-based uncertainty.
Eng. Appl. Artif. Intell., 2024

Noro: A Noise-Robust One-shot Voice Conversion System with Hidden Speaker Representation Capabilities.
CoRR, 2024

MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer.
CoRR, 2024

FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds.
CoRR, 2024

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis.
CoRR, 2024

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.
CoRR, 2024

Amphion: an Open-Source Audio, Music, and Speech Generation Toolkit.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Emilia: An Extensive, Multilingual, and Diverse Speech Dataset For Large-Scale Speech Generation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Cross-Lingual Alzheimer's Disease Detection Based on Scale Criteria.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit.
CoRR, 2023

AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022
Pipaset preview: A multimodal dataset dedicated to Chinese music instrument Pipa.
Dataset, April, 2022

PipaSet and TEAS: A Multimodal Dataset and Annotation Platform for Automatic Music Transcription and Expressive Analysis Dedicated to Chinese Traditional Plucked String Instrument Pipa.
IEEE Access, 2022

Automated testing of image captioning systems.
Proceedings of the ISSTA '22: 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, Virtual Event, South Korea, July 18, 2022

Mining Assignment Submission Time to Detect At-Risk Students with Peer Information.
Proceedings of the 15th International Conference on Educational Data Mining, 2022

2019
Adversarial Training for Video Disentangled Representation.
Proceedings of the MultiMedia Modeling - 25th International Conference, 2019

2018
An Attention-Based Approach for Single Image Super Resolution.
Proceedings of the 24th International Conference on Pattern Recognition, 2018

2017
Calibration of a two-state pitch-wise HMM method for note segmentation in Automatic Music Transcription systems.
CoRR, 2017

Improving Note Segmentation in Automatic Piano Music Transcription Systems with a Two-State Pitch-Wise HMM Method.
Proceedings of the 18th International Society for Music Information Retrieval Conference, 2017

