We stand with Ukraine

Yuancheng Wang

Orcid: 0000-0003-2382-3424

According to our database, Yuancheng Wang authored at least 40 papers between 2017 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

MimicLM: Zero-Shot Voice Imitation through Autoregressive Modeling of Pseudo-Parallel Speech Corpora.

CoRR, April, 2026

Scaling Speech Tokenizers with Diffusion Autoencoders.

Arthur Hinsvark

CoRR, February, 2026

FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds.

Int. J. Comput. Vis., January, 2026

FlexiVoice: Enabling Flexible Style Control in Zero-Shot TTS with Natural Language Instructions.

CoRR, January, 2026

Multi-Metric Preference Alignment for Generative Speech Restoration.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Innovative IoT device identification method based on residual-connected Capsule Network.

J. Cloud Comput., December, 2025

SpeechJudge: Towards Human-Level Judgment for Speech Naturalness.

CoRR, November, 2025

Vevo2: Bridging Controllable Speech and Singing Voice Generation via Unified Prosody Learning.

CoRR, August, 2025

NVSpeech: An Integrated and Scalable Pipeline for Human-Like Speech Modeling with Paralinguistic Vocalizations.

CoRR, August, 2025

Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation.

Zengqiang Shang

CoRR, January, 2025

Overview of the Amphion Toolkit (v0.2).

CoRR, January, 2025

AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement.

CoRR, January, 2025

Metis: A Foundation Speech Generation Model with Masked Generative Pre-training.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Research on Single Image Super-Resolution Enhancement Based on Coordinate Attention and Multi-Domain Loss Function.

Proceedings of the Machine Learning and Artificial Intelligence, 2025

DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Noro: Noise-Robust One-Shot Voice Conversion with Hidden Speaker Representation Learning.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2025

Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

Trustworthy multi-phase liver tumor segmentation via evidence-based uncertainty.

Eng. Appl. Artif. Intell., 2024

Noro: A Noise-Robust One-shot Voice Conversion System with Hidden Speaker Representation Capabilities.

CoRR, 2024

MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer.

CoRR, 2024

FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds.

CoRR, 2024

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis.

Shinnosuke Takamichi, Hiroshi Saruwatari

CoRR, 2024

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.

CoRR, 2024

Amphion: an Open-Source Audio, Music, and Speech Generation Toolkit.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Emilia: An Extensive, Multilingual, and Diverse Speech Dataset For Large-Scale Speech Generation.

Zengqiang Shang

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Cross-Lingual Alzheimer's Disease Detection Based on Scale Criteria.

Wei-Qiang Zhang

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023

Amphion: An Open-Source Audio, Music and Speech Generation Toolkit.

CoRR, 2023

AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022

Pipaset preview: A multimodal dataset dedicated to Chinese music instrument Pipa.

Dataset, April, 2022

PipaSet and TEAS: A Multimodal Dataset and Annotation Platform for Automatic Music Transcription and Expressive Analysis Dedicated to Chinese Traditional Plucked String Instrument Pipa.

IEEE Access, 2022

Automated testing of image captioning systems.

Proceedings of the ISSTA '22: 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, Virtual Event, South Korea, July 18, 2022

Mining Assignment Submission Time to Detect At-Risk Students with Peer Information.

Proceedings of the 15th International Conference on Educational Data Mining, 2022

2019

Adversarial Training for Video Disentangled Representation.

Proceedings of the MultiMedia Modeling - 25th International Conference, 2019

2018

An Attention-Based Approach for Single Image Super Resolution.

Proceedings of the 24th International Conference on Pattern Recognition, 2018

2017

Calibration of a two-state pitch-wise HMM method for note segmentation in Automatic Music Transcription systems.

CoRR, 2017

Improving Note Segmentation in Automatic Piano Music Transcription Systems with a Two-State Pitch-Wise HMM Method.

Proceedings of the 18th International Society for Music Information Retrieval Conference, 2017
