Xiaoda Yang
Orcid: 0009-0002-7297-4536
According to our database1,
Xiaoda Yang
authored at least 21 papers
between 2024 and 2025.
Collaborative distances:
Collaborative distances:
Timeline
Legend:
Book In proceedings Article PhD thesis Dataset OtherLinks
On csauthors.net:
Bibliography
2025
CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation.
CoRR, June, 2025
CoRR, June, 2025
EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model.
CoRR, April, 2025
CoRR, March, 2025
Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis.
CoRR, February, 2025
OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios.
CoRR, January, 2025
EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration.
Proceedings of the ACM on Web Conference 2025, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the 31st International Conference on Computational Linguistics, 2025
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
CART: A Generative Cross-Modal Retrieval Framework With Coarse-To-Fine Semantic Modeling.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling.
CoRR, 2024
ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling.
CoRR, 2024
SyncTalklip: Highly Synchronized Lip-Readable Speaker Generation with Multi-Task Learning.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Boosting Speech Recognition Robustness to Modality-Distortion with Contrast-Augmented Prompts.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024