Xun Yang

ORCID: 0000-0003-0201-1638

Affiliations:
  • University of Science and Technology of China, Department of Electronic Engineering and Information Science, China


According to our database, Xun Yang authored at least 131 papers between 2015 and 2025.

Bibliography

2025
Alleviating Confirmation Bias in Learning with Noisy Labels via Two-Network Collaboration.
ACM Trans. Intell. Syst. Technol., August, 2025

Unified Static and Dynamic Network: Efficient Temporal Filtering for Video Grounding.
IEEE Trans. Pattern Anal. Mach. Intell., August, 2025

Audio Does Matter: Importance-Aware Multi-Granularity Fusion for Video Moment Retrieval.
CoRR, August, 2025

MoCA: Identity-Preserving Text-to-Video Generation via Mixture of Cross Attention.
CoRR, August, 2025

Benchmarking and Bridging Emotion Conflicts for Multimodal Emotion Reasoning.
CoRR, August, 2025

Fuzzy Multivariate Variational Mode Decomposition With Applications in EEG Analysis.
IEEE Trans. Fuzzy Syst., June, 2025

Video Corpus Moment Retrieval With Query-Specific Context Learning and Progressive Localization.
IEEE Trans. Circuits Syst. Video Technol., June, 2025

Advancing crowd counting accuracy in diverse environments via comprehensive domain alignment strategies.
Multim. Syst., June, 2025

MUPA: Towards Multi-Path Agentic Reasoning for Grounded Video Question Answering.
CoRR, June, 2025

Dual-State Personalized Knowledge Tracing With Emotional Incorporation.
IEEE Trans. Knowl. Data Eng., May, 2025

Hyper-3DG: Text-to-3D Gaussian Generation via Hypergraph.
Int. J. Comput. Vis., May, 2025

Adversarially Robust AI-Generated Image Detection for Free: An Information Theoretic Perspective.
CoRR, May, 2025

PsycoLLM: Enhancing LLM for Psychological Understanding and Evaluation.
IEEE Trans. Comput. Soc. Syst., April, 2025

Towards Efficient Partially Relevant Video Retrieval with Active Moment Discovering.
CoRR, April, 2025

A Survey on fMRI-based Brain Decoding for Reconstructing Multimodal Stimuli.
CoRR, March, 2025

EgoBlind: Towards Egocentric Visual Assistance for the Blind People.
CoRR, March, 2025

Toward Complex-query Referring Image Segmentation: A Novel Benchmark.
ACM Trans. Multim. Comput. Commun. Appl., January, 2025

Tuning-Free Long Video Generation via Global-Local Collaborative Diffusion.
CoRR, January, 2025

TrAME: Trajectory-Anchored Multi-View Editing for Text-Guided 3D Gaussian Manipulation.
IEEE Trans. Multim., 2025

Repetitive Action Counting With Hybrid Temporal Relation Modeling.
IEEE Trans. Multim., 2025

Exploring Invariance Matters for Domain Generalization.
IEEE Trans. Image Process., 2025

Customized Transformer Adapter With Frequency Masking for Deepfake Detection.
IEEE Trans. Inf. Forensics Secur., 2025

Learning states enhanced Knowledge Tracing: Simulating the diversity in real-world learning process.
Expert Syst. Appl., 2025

AlphaFuse: Learn ID Embeddings for Sequential Recommendation in Null Space of Language Embeddings.
Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025

Precise Localization of Memories: A Fine-grained Neuron-level Knowledge Editing Technique for LLMs.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Improving Open-vocabulary Video Visual Relation Detection with Decomposed Prompt Learning and Relation Adjustment.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

CMMLoc: Advancing Text-to-PointCloud Localization with Cauchy-Mixture-Model Based Framework.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

GRPose: Learning Graph Relations for Human Image Generation with Pose Priors.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

AugRefer: Advancing 3D Visual Grounding via Cross-Modal Augmentation and Spatial Relation-based Referring.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
Disentangled Cascaded Graph Convolution Networks for Multi-Behavior Recommendation.
Trans. Recomm. Syst., December, 2024

FedGAMMA: Federated Learning With Global Sharpness-Aware Minimization.
IEEE Trans. Neural Networks Learn. Syst., December, 2024

Efficiently Gluing Pre-Trained Language and Vision Models for Image Captioning.
ACM Trans. Intell. Syst. Technol., December, 2024

Exploring and exploiting model uncertainty for robust visual question answering.
Multim. Syst., December, 2024

Mutual-weighted feature disentanglement for unsupervised domain adaptation.
Multim. Syst., December, 2024

Depth Matters: Spatial Proximity-Based Gaze Cone Generation for Gaze Following in Wild.
ACM Trans. Multim. Comput. Commun. Appl., November, 2024

Cross-Lingual Cross-Modal Retrieval With Noise-Robust Fine-Tuning.
IEEE Trans. Knowl. Data Eng., November, 2024

Learning Hierarchical Visual Transformation for Domain Generalizable Visual Matching and Recognition.
Int. J. Comput. Vis., November, 2024

Mitigating Hidden Confounding Effects for Causal Recommendation.
IEEE Trans. Knowl. Data Eng., September, 2024

Dual-Path TokenLearner for Remote Photoplethysmography-Based Physiological Measurement With Facial Videos.
IEEE Trans. Comput. Soc. Syst., June, 2024

Graph Pooling Inference Network for Text-based VQA.
ACM Trans. Multim. Comput. Commun. Appl., April, 2024

Visual-linguistic-stylistic Triple Reward for Cross-lingual Image Captioning.
ACM Trans. Multim. Comput. Commun. Appl., April, 2024

FaSRnet: a feature and semantics refinement network for human pose estimation.
Frontiers Inf. Technol. Electron. Eng., March, 2024

Decoupled domain-specific and domain-conditional representation learning for cross-domain recommendation.
Inf. Process. Manag., March, 2024

Video Compressed Sensing Reconstruction via an Untrained Network with Low-Rank Regularization.
IEEE Trans. Multim., 2024

Frame-Padded Multiscale Transformer for Monocular 3D Human Pose Estimation.
IEEE Trans. Multim., 2024

Efficient Cross-Modal Video Retrieval With Meta-Optimized Frames.
IEEE Trans. Multim., 2024

Emotional Video Captioning With Vision-Based Emotion Interpretation Network.
IEEE Trans. Image Process., 2024

Complex Power Quality Disturbance Recognition Research Based on Deep Complementary Fusion of 2-D Coding Transition.
IEEE Trans. Instrum. Meas., 2024

Equity in Unsupervised Domain Adaptation by Nuclear Norm Maximization.
IEEE Trans. Circuits Syst. Video Technol., 2024

Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models.
CoRR, 2024

TV-3DG: Mastering Text-to-3D Customized Generation with Visual Prompt.
CoRR, 2024

FaceVid-1K: A Large-Scale High-Quality Multiracial Human Face Video Dataset.
CoRR, 2024

Grounding is All You Need? Dual Temporal Grounding for Video Dialog.
CoRR, 2024

Scene-Text Grounding for Text-Based Video Question Answering.
CoRR, 2024

GRPose: Learning Graph Relations for Human Image Generation with Pose Priors.
CoRR, 2024

Vulnerabilities in AI-generated Image Detection: The Challenge of Adversarial Attacks.
CoRR, 2024

Towards Scale-Aware Full Surround Monodepth with Transformers.
CoRR, 2024

TrAME: Trajectory-Anchored Multi-View Editing for Text-Guided 3D Gaussian Splatting Manipulation.
CoRR, 2024

Gradually Vanishing Gap in Prototypical Network for Unsupervised Domain Adaptation.
CoRR, 2024

Personalized Forgetting Mechanism with Concept-Driven Knowledge Tracing.
CoRR, 2024

Robust video question answering via contrastive cross-modality representation learning.
Sci. China Inf. Sci., 2024

Temporal Sentence Grounding with Relevance Feedback in Videos.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

AMTN: Attention-Enhanced Multimodal Temporal Network for Humor Detection.
Proceedings of the 5th on Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor, 2024

Informative Point cloud Dataset Extraction for Classification via Gradient-based Points Moving.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Reverse2Complete: Unpaired Multimodal Point Cloud Completion via Guided Diffusion.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Maskable Retentive Network for Video Moment Retrieval.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

FedCAFE: Federated Cross-Modal Hashing with Adaptive Feature Enhancement.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Advancing Prompt Learning through an External Layer.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Dual-stream Feature Augmentation for Domain Generalization.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Enhancing One-Shot Federated Learning Through Data and Ensemble Co-Boosting.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Rethinking Human Motion Prediction with Symplectic Integral.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Finding and Editing Multi-Modal Neurons in Pre-Trained Transformers.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Boosting Neural Cognitive Diagnosis with Student's Affective State Modeling.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Causality-Inspired Invariant Representation Learning for Text-Based Person Retrieval.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

KPA-Tracker: Towards Robust and Real-Time Category-Level Articulated Object 6D Pose Tracking.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Transformer-Based Visual Grounding with Cross-Modality Interaction.
ACM Trans. Multim. Comput. Commun. Appl., November, 2023

Progressive Localization Networks for Language-Based Moment Localization.
ACM Trans. Multim. Comput. Commun. Appl., 2023

Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA.
IEEE Trans. Image Process., 2023

Finding and Editing Multi-Modal Neurons in Pre-Trained Transformer.
CoRR, 2023

Towards Complex-query Referring Image Segmentation: A Novel Benchmark.
CoRR, 2023

From Region to Patch: Attribute-Aware Foreground-Background Contrastive Learning for Fine-Grained Fashion Retrieval.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

InstanT: Semi-supervised Learning with Instance-dependent Thresholds.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Self-Distillation Dual-Memory Online Hashing with Hash Centers for Streaming Data Retrieval.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Emotion-Prior Awareness Network for Emotional Video Captioning.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Semantics-Enriched Cross-Modal Alignment for Complex-Query Video Moment Retrieval.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Category-Level Articulated Object 9D Pose Estimation via Reinforcement Learning.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Modeling Multi-Relational Connectivity for Personalized Fashion Matching.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Learning Style-Invariant Robust Representation for Generalizable Visual Instance Retrieval.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Disentangled Representation Learning with Causality for Unsupervised Domain Adaptation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Redundancy-aware Transformer for Video Question Answering.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Domain Generalized Stereo Matching via Hierarchical Visual Transformation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Self-Supervised Graph Learning for Long-Tailed Cognitive Diagnosis.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Introduction to the Special Section on Learning Representations, Similarity, and Associations in Dynamic Multimedia Environments.
ACM Trans. Multim. Comput. Commun. Appl., 2022

Topic-Guided Conversational Recommender in Multiple Domains.
IEEE Trans. Knowl. Data Eng., 2022

Video Moment Retrieval With Cross-Modal Neural Architecture Search.
IEEE Trans. Image Process., 2022

Dual Encoding for Video Retrieval by Text.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Partially Relevant Video Retrieval.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Modeling Field-Level Factor Interactions for Fashion Recommendation.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2022

2021
Fine-Grained Fashion Similarity Prediction by Attribute-Specific Embedding Learning.
IEEE Trans. Image Process., 2021

Semantic manifold modularization-based ranking for image recommendation.
Pattern Recognit., 2021

Progressive Localization Networks for Language-based Moment Localization.
CoRR, 2021

Deconfounded Video Moment Retrieval with Causal Intervention.
Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

Selective Dependency Aggregation for Action Classification.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

ADVM'21: 1st International Workshop on Adversarial Learning for Multimedia.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Interventional Video Relation Detection.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Reproducibility Companion Paper: Knowledge Enhanced Neural Fashion Trend Forecasting.
Proceedings of the ICMR '21: International Conference on Multimedia Retrieval, 2021

2020
Deep Neighborhood Component Analysis for Visual Similarity Modeling.
ACM Trans. Intell. Syst. Technol., 2020

Introduction to the Special Section on Contextual Object Analysis in Complex Scenes.
IEEE Trans. Circuits Syst. Video Technol., 2020

Tree-Augmented Cross-Modal Encoding for Complex-Query Video Retrieval.
Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, 2020

Weakly-Supervised Video Object Grounding by Exploring Spatio-Temporal Contexts.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Knowledge Enhanced Neural Fashion Trend Forecasting.
Proceedings of the 2020 on International Conference on Multimedia Retrieval, 2020

Visual Relation Grounding in Videos.
Proceedings of the Computer Vision - ECCV 2020, 2020

Learning to Match on Graph for Fashion Compatibility Modeling.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Person Reidentification via Structural Deep Metric Learning.
IEEE Trans. Neural Networks Learn. Syst., 2019

Deep Conversational Recommender in Travel.
CoRR, 2019

Interpretable Fashion Matching with Rich Attributes.
Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019

Learning Using Privileged Information for Food Recognition.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Who, Where, and What to Wear?: Extracting Fashion Knowledge from Social Media.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Annotating Objects and Relations in User-Generated Videos.
Proceedings of the 2019 on International Conference on Multimedia Retrieval, 2019

Cross-modal Collaborative Manifold Propagation for Image Recommendation.
Proceedings of the 2019 on International Conference on Multimedia Retrieval, 2019

Progressive Image Enhancement under Aesthetic Guidance.
Proceedings of the 2019 on International Conference on Multimedia Retrieval, 2019

Multiple Hypothesis Video Relation Detection.
Proceedings of the Fifth IEEE International Conference on Multimedia Big Data, 2019

TransNFCM: Translation-Based Neural Fashion Compatibility Modeling.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Person Re-Identification With Metric Learning Using Privileged Information.
IEEE Trans. Image Process., 2018

2017
Saliency Detection on Light Field: A Multi-Cue Approach.
ACM Trans. Multim. Comput. Commun. Appl., 2017

Enhancing Person Re-identification in a Self-Trained Subspace.
ACM Trans. Multim. Comput. Commun. Appl., 2017

2016
An Efficient Tracking System by Orthogonalized Templates.
IEEE Trans. Ind. Electron., 2016

Empirical Risk Minimization for Metric Learning Using Privileged Information.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

2015
Robust visual tracking via multi-graph ranking.
Neurocomputing, 2015

