Jitao Sang

ACM Trans. Multim. Comput. Commun. Appl., March, 2026

HulluEdit: Single-Pass Evidence-Consistent Subspace Editing for Mitigating Hallucinations in Large Vision-Language Models.

CoRR, February, 2026

ITDR: An Instruction Tuning Dataset for Enhancing Large Language Models in Recommendations.

Zekun Liu

Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1, 2026

Membership Inference Attack Against Large Language Model-Based Recommendation Systems: A New Distillation-Based Paradigm.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Look before Transcription: End-to-End SlideASR with Visually-Anchored Policy Optimization.

CoRR, October, 2025

Towards Robust Recommendation: A Review and an Adversarial Robustness Evaluation Library.

IEEE Trans. Knowl. Data Eng., September, 2025

Backdoor for Debias: Mitigating Model Bias With Backdoor Attack-Based Artificial Bias.

IEEE Trans. Circuits Syst. Video Technol., August, 2025

Debiased Prompt Tuning in Vision-Language Model without Annotations.

CoRR, March, 2025

Mobile-Agent-V: Learning Mobile Device Operation Through Video-Guided Multi-Agent Collaboration.

CoRR, February, 2025

MF-CLIP: Leveraging CLIP as Surrogate Models for No-Box Adversarial Attacks.

IEEE Trans. Inf. Forensics Secur., 2025

A Disguised Wolf Is More Harmful Than a Toothless Tiger: Adaptive and Malicious Code Injection Backdoor Attack Leveraging User Behavior as Triggers.

Shangxi Wu

Jinlin Xiao

IEEE Trans. Inf. Forensics Secur., 2025

Prescribing the right remedy: Mitigating hallucinations in large vision-language models via targeted instruction tuning.

Inf. Sci., 2025

Exploring the Privacy Protection Capabilities of Chinese Large Language Models.

Yuqi Yang

IEEE Multim., 2025

A LLM-based Controllable, Scalable, Human-Involved User Simulator Framework for Conversational Recommender Systems.

Lixi Zhu

Proceedings of the ACM on Web Conference 2025, 2025

SILLM4Rec: Self-Improving with Chain of Thought Enhanced Preference Optimization for Multimodal Recommendation.

Proceedings of the 7th ACM International Conference on Multimedia in Asia, 2025

Debiased Prompt Tuning for Vision-Language Models without Annotations.

Proceedings of the International Joint Conference on Neural Networks, 2025

Anyattack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

ODE: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models.

Yahan Tu

Rui Hu

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Investigating and Enhancing Vision-Audio Capability in Omnimodal Large Language Models.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

Adaptive Adversarial Logits Pairing.

ACM Trans. Multim. Comput. Commun. Appl., February, 2024

TIF: Threshold Interception and Fusion for Compact and Fine-Grained Visual Attribution.

Guanhua Zheng

IEEE Trans. Multim., 2024

Debiasing Vision-Language Models with Text-Only Training.

CoRR, 2024

AnyAttack: Towards Large-scale Self-supervised Generation of Targeted Adversarial Examples for Vision-Language Models.

CoRR, 2024

KG-FPQ: Evaluating Factuality Hallucination in LLMs with Knowledge Graph-based False Premise Questions.

CoRR, 2024

DenoiseReID: Denoising Model for Representation Learning of Person Re-Identification.

CoRR, 2024

Semantically consistent Video-to-Audio Generation using Multimodal Language Large Model.

CoRR, 2024

Prescribing the Right Remedy: Mitigating Hallucinations in Large Vision-Language Models via Targeted Instruction Tuning.

Rui Hu

Yahan Tu

CoRR, 2024

Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception.

CoRR, 2024

How Reliable is Your Simulator? Analysis on the Limitations of Current LLM-based User Simulators for Conversational Recommendation.

Lixi Zhu

Proceedings of the Companion Proceedings of the ACM on Web Conference 2024, 2024

Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Staying in the Cat-and-Mouse Game: Towards Black-box Adversarial Example Detection.

Proceedings of the 2nd International Workshop on Deep Multimodal Generation and Retrieval, 2024

Poisoning for Debiasing: Fair Recognition via Eliminating Bias Uncovered in Data Poisoning.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

AIGCs Confuse AI Too: Investigating and Explaining Synthetic Image-induced Hallucinations in Large Vision-Language Models.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Adversarial Prompt Tuning for Vision-Language Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

2023

Low-mid adversarial perturbation against unauthorized face recognition system.

Inf. Sci., November, 2023

Knowledge Graph-Enhanced Sampling for Conversational Recommendation System.

IEEE Trans. Knowl. Data Eng., October, 2023

Attention, Please! Adversarial Defense via Activation Rectification and Preservation.

ACM Trans. Multim. Comput. Commun. Appl., 2023

Towards a multimodal human activity dataset for healthcare.

Multim. Syst., 2023

Debiasing backdoor attack: A benign application of backdoor attack in eliminating data bias.

Inf. Sci., 2023

Language-assisted Vision Model Debugger: A Sample-Free Approach to Finding Bugs.

CoRR, 2023

An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation.

CoRR, 2023

Evaluation and Analysis of Hallucination in Large Vision-Language Models.

CoRR, 2023

CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility.

CoRR, 2023

Introducing Foundation Models as Surrogate Models: Advancing Towards More Practical Adversarial Attacks.

CoRR, 2023

Backdoor for Debias: Mitigating Model Bias with Backdoor Attack-based Artificial Bias.

CoRR, 2023

Benign Shortcut for Debiasing: Fair Visual Recognition via Intervention with Shortcut Features.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

mPLUG-Octopus: The Versatile Assistant Empowered by A Modularized End-to-End Multimodal LLM.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Echoes: Unsupervised Debiasing via Pseudo-bias Labeling in an Echo Chamber.

Rui Hu

Yahan Tu

Proceedings of the 31st ACM International Conference on Multimedia, 2023

From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Improved Visual Fine-tuning with Natural Language Supervision.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Unlearnable Clusters: Towards Label-Agnostic Unlearnable Examples.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Towards Alleviating the Object Bias in Prompt Tuning-based Factual Knowledge Extraction.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

ImageNet Pre-training Also Transfers Non-robustness.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Image-Based Personality Questionnaire Design.

ACM Trans. Multim. Comput. Commun. Appl., 2022

Learning to Learn a Cold-start Sequential Recommender.

ACM Trans. Inf. Syst., 2022

Zero-shot Image Captioning by Anchor-augmented Vision-Language Space Alignment.

CoRR, 2022

Fair Visual Recognition via Intervention with Proxy Features.

Junyang Wang

CoRR, 2022

FairCLIP: Social Bias Elimination based on Attribute Prototype Learning and Representation Neutralization.

Junyang Wang

CoRR, 2022

JPEG Compression-Resistant Low-Mid Adversarial Perturbation against Unauthorized Face Recognition System.

Jiaming Zhang

Qi Yi

CoRR, 2022

Debiasing Backdoor Attack: A Benign Application of Backdoor Attack in Eliminating Data Bias.

CoRR, 2022

Towards Adversarial Attack on Vision-Language Pre-training Models.

Jiaming Zhang

Qi Yi

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Counterfactually Measuring and Eliminating Social Bias in Vision-Language Pre-training Models.

Junyang Wang

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Benign Adversarial Attack: Tricking Models for Goodness.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Non-generative Generalized Zero-shot Learning via Task-correlated Disentanglement and Controllable Samples Synthesis.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Knowledge-driven Egocentric Multimodal Activity Recognition.

ACM Trans. Multim. Comput. Commun. Appl., 2021

Robust CAPTCHAs Towards Malicious OCR.

IEEE Trans. Multim., 2021

Metadata Connector: Exploiting Hashtag and Tag for Cross-OSN Event Search.

IEEE Trans. Multim., 2021

Knowledge Graph-enhanced Sampling for Conversational Recommender System.

CoRR, 2021

Benign Adversarial Attack: Tricking Algorithm for Goodness.

CoRR, 2021

Pre-training also Transfers Non-Robustness.

CoRR, 2021

Trustworthy Multimedia Analysis.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

2020

Meta-path Augmented Sequential Recommendation with Contextual Co-attention Network.

ACM Trans. Multim. Comput. Commun. Appl., 2020

Locality-constrained discrete graph hashing.

Wenjie Ying

Jian Yu

Neurocomputing, 2020

Beyond Literal Visual Modeling: Understanding Image Metaphor Based on Literal-Implied Concept Mapping.

Proceedings of the MultiMedia Modeling - 26th International Conference, 2020

Adversarial Privacy-preserving Filter.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Towards Accuracy-Fairness Paradox: Adversarial Example-based Data Augmentation for Visual Debiasing.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

2019

A Generalization Theory based on Independent and Task-Identically Distributed Assumption.

CoRR, 2019

Blessing in Disguise: Designing Robust Turing Test by Employing Algorithm Unrobustness.

CoRR, 2019

Multimodal Attribute and Feature Embedding for Activity Recognition.

Proceedings of the MMAsia '19: ACM Multimedia Asia, Beijing, China, December 16-18, 2019, 2019

Multi-source User Attribute Inference based on Hierarchical Auto-encoder.

Proceedings of the MMAsia '19: ACM Multimedia Asia, Beijing, China, December 16-18, 2019, 2019

Comprehensive Event Storyline Generation from Microblogs.

Proceedings of the MMAsia '19: ACM Multimedia Asia, Beijing, China, December 16-18, 2019, 2019

Explainable Interaction-driven User Modeling over Knowledge Graph for Sequential Recommendation.

Proceedings of the 27th ACM International Conference on Multimedia, 2019

2018

Understanding Dynamic Cross-OSN Associations for Cold-Start Recommendation.

Ming Yan

IEEE Trans. Multim., 2018

Bundled Local Features for Image Representation.

IEEE Trans. Circuits Syst. Video Technol., 2018

Social Relationship Labeling Based on Multimodal Behaviors and Social Interactions.

IEEE Multim., 2018

Attention, Please! Adversarial Defense via Attention Rectification and Preservation.

CoRR, 2018

CSAN: Contextual Self-Attention Network for User Sequential Recommendation.

Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

2017

Who Are Your "Real" Friends: Analyzing and Distinguishing Between Offline and Online Friendships From Social Multimedia Data.

IEEE Trans. Multim., 2017

A Demo for Image-Based Personality Test.

Proceedings of the MultiMedia Modeling - 23rd International Conference, 2017

Towards SMP Challenge: Stacking of Diverse Models for Social Image Popularity Prediction.

Proceedings of the 2017 ACM on Multimedia Conference, 2017

Hashtag-centric Immersive Search on Social Media.

Proceedings of the 2017 ACM on Multimedia Conference, 2017

2016

A Unified Video Recommendation by Cross-Network User Modeling.

ACM Trans. Multim. Comput. Commun. Appl., 2016

Folksonomy-Based Visual Ontology Construction and Its Applications.

IEEE Trans. Multim., 2016

基于关联规则挖掘的跨网络知识关联及协同应用 (Association Rules Mining Based Cross-network Knowledge Association and Collaborative Applications).

计算机科学 (Computer Science), 2016

2015

Learning Feature Hierarchies: A Layer-Wise Tag-Embedded Approach.

IEEE Trans. Multim., 2015

YouTube Video Promotion by Cross-Network Association: @Britney to Advertise Gangnam Style.

IEEE Trans. Multim., 2015

Word-of-Mouth Understanding: Entity-Centric Multimodal Aspect-Opinion Mining in Social Media.

IEEE Trans. Multim., 2015

Relational User Attribute Inference in Social Media.

IEEE Trans. Multim., 2015

Activity Sensor: Check-In Usage Mining for Local Recommendation.

Tao Mei

ACM Trans. Intell. Syst. Technol., 2015

Unified YouTube Video Recommendation via Cross-network Collaboration.

Ming Yan

Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015

2014

Twitter is Faster: Personalized Time-Aware Video Recommendation from Twitter to YouTube.

ACM Trans. Multim. Comput. Commun. Appl., 2014

A Unified Framework of Latent Feature Learning in Social Media.

IEEE Trans. Multim., 2014

Mining Cross-network Association for YouTube Video Promotion.

Ming Yan

Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

2013

Interaction Design for Mobile Visual Search.

IEEE Trans. Multim., 2013

Latent feature learning in social media network.

Proceedings of the ACM Multimedia Conference, 2013

Tag-aware image classification via Nested Deep Belief nets.

Zhaoquan Yuan