Yaoxun Xu

ORCID: 0009-0002-7063-7317

According to our database, Yaoxun Xu authored at least 16 papers between 2023 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
LeVo: High-Quality Song Generation with Multi-Preference Alignment.
CoRR, June, 2025

VoiceMark: Zero-Shot Voice Cloning-Resistant Watermarking Approach Leveraging Speaker-Specific Latents.
CoRR, May, 2025

LeVo: High-Quality Song Generation with Multi-Preference Alignment.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

MuCodec: Ultra Low-Bitrate Music Codec for Music Generation.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

WAKE: Watermarking Audio with Key Enrichment.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

VoiceMark: Zero-Shot Voice Cloning-Resistant Watermarking Approach Leveraging Speaker-Specific Latents.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
MuCodec: Ultra Low-Bitrate Music Codec.
CoRR, 2024

Advancing Multi-Talker ASR Performance With Large Language Models.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Multimodal Emotion Captioning Using Large Language Model with Prompt Engineering.
Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 2024

Robust Representation Learning for Multimodal Emotion Recognition with Contrastive Learning and Mixup.
Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 2024

Comparing Discrete and Continuous Space LLMs for Speech Recognition.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Hydraformer: One Encoder for All Subsampling Rates.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

SECap: Speech Emotion Captioning with Large Language Model.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

CB-Conformer: Contextual Biasing Conformer for Biased Word Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

