Yan Xia

Orcid: 0000-0003-4631-741X

Affiliations:

Zhejiang University, Hangzhou, China

According to our database, Yan Xia authored at least 26 papers between 2022 and 2025.

Collaborative distances:

Dijkstra number of five.
Erdős number of four.

Bibliography

2025

DSI-Bench: A Benchmark for Dynamic Spatial Intelligence.

CoRR, October, 2025

RecBase: Generative Foundation Model Pretraining for Zero-Shot Recommendation.

CoRR, September, 2025

Open-set Cross Modal Generalization via Multimodal Unified Representation.

CoRR, July, 2025

Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations.

CoRR, July, 2025

APO: Enhancing Reasoning Ability of MLLMs via Asymmetric Policy Optimization.

CoRR, June, 2025

Continual Cross-Modal Generalization.

CoRR, April, 2025

CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation.

CoRR, March, 2025

EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration.

Proceedings of the ACM on Web Conference 2025, 2025

Overcoming both Domain Shift and Label Shift for Referring Video Segmentation.

Hai Huang, Sashuai Zhou, Yan Xia

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

Vela: Scalable Embeddings with Voice Large Language Models for Multimodal Retrieval.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Enhancing Multi-modal Models with Heterogeneous MoE Adapters for Fine-tuning.

Sashuai Zhou, Yan Xia, Hai Huang

Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

Semantic Residual for Multimodal Unified Discrete Representation.

Hai Huang, Shulei Wang, Yan Xia

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Enhancing Multimodal Unified Representations for Cross Modal Generalization.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

CART: A Generative Cross-Modal Retrieval Framework With Coarse-To-Fine Semantic Modeling.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

Multi-Granularity Relational Attention Network for Audio-Visual Question Answering.

IEEE Trans. Circuits Syst. Video Technol., August, 2024

ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling.

CoRR, 2024

EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration.

CoRR, 2024

Unlocking the Potential of Multimodal Unified Discrete Representation through Training-Free Codebook Optimization and Hierarchical Alignment.

CoRR, 2024

EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration.

Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024

StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Multi-Modal Domain Adaptation Across Video Scenes for Temporal Video Grounding.

CoRR, 2023

Achieving Cross Modal Generalization with Multimodal Unified Representation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Scene-robust Natural Language Video Localization via Learning Domain-invariant Representations.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

Video-Guided Curriculum Learning for Spoken Video Grounding.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Cross-modal Background Suppression for Audio-Visual Event Localization.

Yan Xia, Zhou Zhao

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
