Yan Xia

Orcid: 0000-0003-4631-741X

Affiliations:
  • Zhejiang University, Hangzhou, China


According to our database, Yan Xia authored at least 24 papers between 2022 and 2025.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2025
Open-set Cross Modal Generalization via Multimodal Unified Representation.
CoRR, July, 2025

Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations.
CoRR, July, 2025

APO: Enhancing Reasoning Ability of MLLMs via Asymmetric Policy Optimization.
CoRR, June, 2025

Vela: Scalable Embeddings with Voice Large Language Models for Multimodal Retrieval.
CoRR, June, 2025

Continual Cross-Modal Generalization.
CoRR, April, 2025

Enhancing Multi-modal Models with Heterogeneous MoE Adapters for Fine-tuning.
CoRR, March, 2025

CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation.
CoRR, March, 2025

EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration.
Proceedings of the ACM on Web Conference 2025, 2025

Overcoming both Domain Shift and Label Shift for Referring Video Segmentation.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

Semantic Residual for Multimodal Unified Discrete Representation.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Enhancing Multimodal Unified Representations for Cross Modal Generalization.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

CART: A Generative Cross-Modal Retrieval Framework With Coarse-To-Fine Semantic Modeling.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Multi-Granularity Relational Attention Network for Audio-Visual Question Answering.
IEEE Trans. Circuits Syst. Video Technol., August, 2024

ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling.
CoRR, 2024

EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration.
CoRR, 2024

Unlocking the Potential of Multimodal Unified Discrete Representation through Training-Free Codebook Optimization and Hierarchical Alignment.
CoRR, 2024

EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration.
Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024

StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Multi-Modal Domain Adaptation Across Video Scenes for Temporal Video Grounding.
CoRR, 2023

Achieving Cross Modal Generalization with Multimodal Unified Representation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Scene-robust Natural Language Video Localization via Learning Domain-invariant Representations.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Video-Guided Curriculum Learning for Spoken Video Grounding.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Cross-modal Background Suppression for Audio-Visual Event Localization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

