Xiaoda Yang

ORCID: 0009-0002-7297-4536

According to our database¹, Xiaoda Yang authored at least 38 papers between 2024 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

DocRetriever: A Plug-and-Play Framework for Multimodal Document Retrieval with Comprehensive Benchmark.

CoRR, May, 2026

TMD-Bench: A Multi-Level Evaluation Paradigm for Music-Dance Co-Generation.

CoRR, May, 2026

From Perception to Planning: Evolving Ego-Centric Task-Oriented Spatiotemporal Reasoning via Curriculum Learning.

CoRR, April, 2026

A Progressive Training Strategy for Vision-Language Models to Counteract Spatio-Temporal Hallucinations in Embodied Reasoning.

CoRR, April, 2026

ImVideoEdit: Image-learning Video Editing via 2D Spatial Difference Attention Blocks.

CoRR, April, 2026

SpatialReward: Verifiable Spatial Reward Modeling for Fine-Grained Spatial Consistency in Text-to-Image Generation.

CoRR, March, 2026

VividAnimator: An End-to-End Audio and Pose-driven Half-Body Human Animation Framework.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2026

SpatialLogic-Bench: A Diagnostic Benchmark for Task-Oriented Spatiotemporal Reasoning.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Towards Efficient and Robust Manipulation via Multi-Frame Vision-Language-Action Modeling.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer.

CoRR, November, 2025

CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation.

CoRR, June, 2025

OmniCam: Unified Multimodal Video Generation via Camera Control.

CoRR, April, 2025

Astrea: A MOE-based Visual Understanding Model with Progressive Alignment.

CoRR, March, 2025

Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis.

CoRR, February, 2025

OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios.

CoRR, January, 2025

EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration.

Proceedings of the ACM on Web Conference 2025, 2025

EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Choose Your Expert: Uncertainty-Guided Expert Selection for Continual Deepfake Detection.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Speech Token Prediction via Compressed-to-fine Language Modeling for Speech Generation.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Multimodal Conditional Retrieval with High Controllability.

Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.2, 2025

MelRe: Vision-Based Mel-Spectrogram Restoration.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Vela: Scalable Embeddings with Voice Large Language Models for Multimodal Retrieval.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

GTA: Towards Generative Text-To-Audio Retrieval via Multi-Scale Tokenizer.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

VoxDialogue: Can Spoken Dialogue Systems Understand Information Beyond Words?

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

PACHAT: Persona-Aware Speech Assistant for Multi-party Dialogue.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

BrainLoc: Brain Signal-Based Object Detection with Multi-modal Alignment.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

VoxpopuliTTS: a large-scale multilingual TTS corpus for zero-shot speech generation.

Proceedings of the 31st International Conference on Computational Linguistics, 2025

Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

CART: A Generative Cross-Modal Retrieval Framework With Coarse-To-Fine Semantic Modeling.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

WavChat: A Survey of Spoken Dialogue Models.

CoRR, 2024

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling.

CoRR, 2024

ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling.

CoRR, 2024

SyncTalklip: Highly Synchronized Lip-Readable Speaker Generation with Multi-Task Learning.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Boosting Speech Recognition Robustness to Modality-Distortion with Contrast-Augmented Prompts.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

AudioVSR: Enhancing Video Speech Recognition with Audio Data.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
