Yuping Wang

Affiliations:

ByteDance, Shanghai, China

According to our database, Yuping Wang authored at least 41 papers between 2020 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

DiSTAR: Diffusion over a Scalable Token Autoregressive Representation for Speech Generation.

CoRR, October, 2025

Heptapod: Language Modeling on Visual Signals.

CoRR, October, 2025

MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation.

CoRR, June, 2025

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix.

CoRR, May, 2025

Audio-CoT: Exploring Chain-of-Thought Reasoning in Large Audio Language Model.

CoRR, January, 2025

Sounding that Object: Interactive Object-Aware Image to Audio Generation.

Gopala Anumanchipalli

Yuxuan Wang

Proceedings of the Forty-second International Conference on Machine Learning, 2025

DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Sound-VECaps: Improving Audio Generation with Visually Enhanced Captions.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Towards Reliable Large Audio Language Model.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

Language Model Can Listen While Speaking.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

Multi-Level Temporal-Channel Speaker Retrieval for Zero-Shot Voice Conversion.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

AudioLDM 2: Learning Holistic Audio Generation With Self-Supervised Pretraining.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

U-Style: Cascading U-Nets With Multi-Level Speaker and Style Modeling for Zero-Shot Voice Cloning.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Joint Multiscale Cross-Lingual Speaking Style Transfer With Bidirectional Attention Mechanism for Automatic Dubbing.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

StreamVoice+: Evolving Into End-to-End Streaming Zero-Shot Voice Conversion.

IEEE Signal Process. Lett., 2024

Seed-Music: A Unified Framework for High Quality and Controlled Music Generation.

CoRR, 2024

Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition.

CoRR, 2024

Improving Audio Generation with Visual Enhanced Caption.

CoRR, 2024

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models.

CoRR, 2024

VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing.

CoRR, 2024

PolyVoice: Language Models for Speech to Speech Translation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

MSM-VC: High-Fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-Scale Style Modeling.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

DiCLET-TTS: Diffusion Model Based Cross-Lingual Emotion Transfer for Text-to-Speech - A Study Between English and Mandarin.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

LM-VC: Zero-Shot Voice Conversion via Speech Generation Based on Language Models.

IEEE Signal Process. Lett., 2023

PolyVoice: Language Models for Speech to Speech Translation.

CoRR, 2023

A Unified Front-End Framework for English Text-to-Speech Synthesis.

CoRR, 2023

Multi-level Temporal-channel Speaker Retrieval for Robust Zero-shot Voice Conversion.

CoRR, 2023

Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing.

CoRR, 2023

Efficient Neural Music Generation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Zero-Shot Accent Conversion using Pseudo Siamese Disentanglement Network.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Delivering Speaking Style in Low-Resource Voice Conversion with Multi-Factor Constraints.

Proceedings of the IEEE International Conference on Acoustics, 2023

Streaming Voice Conversion via Intermediate Bottleneck Features and Non-Streaming Teacher Guidance.

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech.

CoRR, 2022

Inferring Speaking Styles from Multi-modal Conversational Context by Multi-scale Relational Graph Convolutional Networks.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Neufa: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism.

Proceedings of the IEEE International Conference on Acoustics, 2022

Cloning One's Voice Using Very Limited Data in the Wild.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Neural Dubber: Dubbing for Silent Videos According to Scripts.

CoRR, 2021

Neural Dubber: Dubbing for Videos According to Scripts.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

2020

Noise Robust TTS for Low Resource Speakers using Pre-trained Model and Speech Enhancement.

CoRR, 2020

Xiaomingbot: A Multilingual Robot News Reporter.

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2020
