Xiaoda Yang
ORCID: 0009-0002-7297-4536
According to our database, Xiaoda Yang authored at least 36 papers between 2024 and 2026.
Online presence:
- on orcid.org
Bibliography
2026
From Perception to Planning: Evolving Ego-Centric Task-Oriented Spatiotemporal Reasoning via Curriculum Learning.
CoRR, April, 2026
A Progressive Training Strategy for Vision-Language Models to Counteract Spatio-Temporal Hallucinations in Embodied Reasoning.
CoRR, April, 2026
ImVideoEdit: Image-learning Video Editing via 2D Spatial Difference Attention Blocks.
CoRR, April, 2026
SpatialReward: Verifiable Spatial Reward Modeling for Fine-Grained Spatial Consistency in Text-to-Image Generation.
CoRR, March, 2026
SpatialLogic-Bench: A Diagnostic Benchmark for Task-Oriented Spatiotemporal Reasoning.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026
Towards Efficient and Robust Manipulation via Multi-Frame Vision-Language-Action Modeling.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026
2025
CoRR, November, 2025
VividAnimator: An End-to-End Audio and Pose-driven Half-Body Human Animation Framework.
CoRR, October, 2025
CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation.
CoRR, June, 2025
CoRR, March, 2025
Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis.
CoRR, February, 2025
OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios.
CoRR, January, 2025
EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration.
Proceedings of the ACM on Web Conference 2025, 2025
EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025
Choose Your Expert: Uncertainty-Guided Expert Selection for Continual Deepfake Detection.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025
Speech Token Prediction via Compressed-to-fine Language Modeling for Speech Generation.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025
Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.2, 2025
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025
Proceedings of the 31st International Conference on Computational Linguistics, 2025
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
CART: A Generative Cross-Modal Retrieval Framework With Coarse-To-Fine Semantic Modeling.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025
2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling.
CoRR, 2024
ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling.
CoRR, 2024
SyncTalklip: Highly Synchronized Lip-Readable Speaker Generation with Multi-Task Learning.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Boosting Speech Recognition Robustness to Modality-Distortion with Contrast-Augmented Prompts.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024