Yuping Wang

Affiliations:
  • ByteDance, Shanghai, China


According to our database1, Yuping Wang authored at least 39 papers between 2020 and 2025.

Collaborative distances:
  • Dijkstra number2 of four.
  • Erdős number3 of four.

Timeline

Legend:

Book 
In proceedings 
Article 
PhD thesis 
Dataset
Other 

Links

On csauthors.net:

Bibliography

2025
Sounding that Object: Interactive Object-Aware Image to Audio Generation.
CoRR, June, 2025

MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation.
CoRR, June, 2025

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix.
CoRR, May, 2025

DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation.
CoRR, February, 2025

Audio-CoT: Exploring Chain-of-Thought Reasoning in Large Audio Language Model.
CoRR, January, 2025

Sound-VECaps: Improving Audio Generation with Visually Enhanced Captions.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Towards Reliable Large Audio Language Model.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

Language Model Can Listen While Speaking.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
Multi-Level Temporal-Channel Speaker Retrieval for Zero-Shot Voice Conversion.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

AudioLDM 2: Learning Holistic Audio Generation With Self-Supervised Pretraining.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

U-Style: Cascading U-Nets With Multi-Level Speaker and Style Modeling for Zero-Shot Voice Cloning.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Joint Multiscale Cross-Lingual Speaking Style Transfer With Bidirectional Attention Mechanism for Automatic Dubbing.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

StreamVoice+: Evolving Into End-to-End Streaming Zero-Shot Voice Conversion.
IEEE Signal Process. Lett., 2024

Seed-Music: A Unified Framework for High Quality and Controlled Music Generation.
CoRR, 2024

Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition.
CoRR, 2024

Improving Audio Generation with Visual Enhanced Caption.
CoRR, 2024

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models.
CoRR, 2024

VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing.
CoRR, 2024

PolyVoice: Language Models for Speech to Speech Translation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
MSM-VC: High-Fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-Scale Style Modeling.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

DiCLET-TTS: Diffusion Model Based Cross-Lingual Emotion Transfer for Text-to-Speech - A Study Between English and Mandarin.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

LM-VC: Zero-Shot Voice Conversion via Speech Generation Based on Language Models.
IEEE Signal Process. Lett., 2023

PolyVoice: Language Models for Speech to Speech Translation.
CoRR, 2023

a unified front-end framework for english text-to-speech synthesis.
CoRR, 2023

Multi-level Temporal-channel Speaker Retrieval for Robust Zero-shot Voice Conversion.
CoRR, 2023

Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing.
CoRR, 2023

Efficient Neural Music Generation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Zero-Shot Accent Conversion using Pseudo Siamese Disentanglement Network.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Delivering Speaking Style in Low-Resource Voice Conversion with Multi-Factor Constraints.
Proceedings of the IEEE International Conference on Acoustics, 2023

Streaming Voice Conversion via Intermediate Bottleneck Features and Non-Streaming Teacher Guidance.
Proceedings of the IEEE International Conference on Acoustics, 2023

2022
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech.
CoRR, 2022

Inferring Speaking Styles from Multi-modal Conversational Context by Multi-scale Relational Graph Convolutional Networks.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Neufa: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism.
Proceedings of the IEEE International Conference on Acoustics, 2022

Cloning One's Voice Using Very Limited Data in the Wild.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Neural Dubber: Dubbing for Silent Videos According to Scripts.
CoRR, 2021

Neural Dubber: Dubbing for Videos According to Scripts.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

2020
Noise Robust TTS for Low Resource Speakers using Pre-trained Model and Speech Enhancement.
CoRR, 2020

Xiaomingbot: A Multilingual Robot News Reporter.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2020


  Loading...