Yaoxun Xu

ORCID: 0009-0002-7063-7317

According to our database, Yaoxun Xu authored at least 14 papers between 2023 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

LeVo: High-Quality Song Generation with Multi-Preference Alignment.

CoRR, June, 2025

VoiceMark: Zero-Shot Voice Cloning-Resistant Watermarking Approach Leveraging Speaker-Specific Latents.

CoRR, May, 2025

WAKE: Watermarking Audio with Key Enrichment.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

VoiceMark: Zero-Shot Voice Cloning-Resistant Watermarking Approach Leveraging Speaker-Specific Latents.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

MuCodec: Ultra Low-Bitrate Music Codec.

CoRR, 2024

Advancing Multi-Talker ASR Performance With Large Language Models.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Multimodal Emotion Captioning Using Large Language Model with Prompt Engineering.

Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 2024

Robust Representation Learning for Multimodal Emotion Recognition with Contrastive Learning and Mixup.

Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 2024

Comparing Discrete and Continuous Space LLMs for Speech Recognition.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Hydraformer: One Encoder for All Subsampling Rates.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

SECap: Speech Emotion Captioning with Large Language Model.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

CB-Conformer: Contextual Biasing Conformer for Biased Word Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2023
