Wei Ji

ORCID: 0000-0002-8106-9768

Affiliations:

Nanjing University, School of Intelligence Science and Technology, Nanjing, China (since 2024)
National University of Singapore, School of Computing, Singapore
Zhejiang University, College of Computer Science, Hangzhou, China (PhD 2020)

According to our database, Wei Ji authored at least 104 papers between 2018 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

Immuno-VLM: Immunizing Large Vision-Language Models via Generative Semantic Antibodies for Open-World Trustworthiness.

Xiang Fang

Wanlong Fang

Wei Ji

CoRR, May, 2026

RADAR: Redundancy-Aware Diffusion for Multi-Agent Communication Structure Generation.

CoRR, May, 2026

Prompt-Aware Adapter: Learning Adaptive Visual Tokens for Multimodal Large Language Models.

IEEE Trans. Artif. Intell., March, 2026

UniM: A Unified Any-to-Any Interleaved Multimodal Benchmark.

CoRR, March, 2026

Interp3D: Correspondence-aware Interpolation for Generative Textured 3D Morphing.

CoRR, January, 2026

Evolving Generalist Virtual Agents with Generative and Associative Memory.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Towards Unified Vision-Language Models with Incomplete Multi-Modal Inputs.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Arcadia: Toward a Full-Lifecycle Framework for Embodied Lifelong Learning.

CoRR, December, 2025

Visuo-Tactile Class-Incremental Learning.

ACM Trans. Multim. Comput. Commun. Appl., November, 2025

Introduction to the Special Issue on Deep Multimodal Generation and Retrieval.

ACM Trans. Multim. Comput. Commun. Appl., November, 2025

Transformer-Empowered Invariant Grounding for Video Question Answering.

IEEE Trans. Pattern Anal. Mach. Intell., November, 2025

VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models.

CoRR, August, 2025

Discrete Tokenization for Multimodal LLMs: A Comprehensive Survey.

CoRR, July, 2025

What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities.

CoRR, June, 2025

DriveDiTFit: Fine-tuning Diffusion Transformers for Autonomous Driving Data Generation.

ACM Trans. Multim. Comput. Commun. Appl., March, 2025

TAIL: Text-Audio Incremental Learning.

CoRR, March, 2025

Backpropagation-Free Multi-modal On-Device Model Adaptation via Cloud-Device Collaboration.

ACM Trans. Multim. Comput. Commun. Appl., February, 2025

Toward Complex-query Referring Image Segmentation: A Novel Benchmark.

ACM Trans. Multim. Comput. Commun. Appl., January, 2025

WisdomBot: Tuning Large Language Models with Artificial Intelligence Knowledge.

CoRR, January, 2025

HOVER: Hyperbolic Video-Text Retrieval.

IEEE Trans. Image Process., 2025

The 1st NIP@IR Workshop on New Interaction Paradigms for Information Retrieval in the Era of Generative AI.

Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025

EvolvedGRPO: Unlocking Reasoning in LVLMs via Progressive Instruction Evolution.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

On Efficiency-Effectiveness Trade-off of Diffusion-based Recommenders.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Counterfactual Evolution of Multimodal Datasets via Visual Programming.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Turing Patterns for Multimedia: Reaction-Diffusion Multi-Modal Fusion for Language-Guided Video Moment Retrieval.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

FormFactory: An Interactive Benchmarking Suite for Multimodal Form-Filling Agents.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Coarse-to-Fine Cross-Modality Generation for Enhancing Vehicle Re-Identification with High-Fidelity Synthetic Data.

Proceedings of the IEEE International Conference on Robotics and Automation, 2025

What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Generalized Video Moment Retrieval.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Few-Shot Incremental Learning via Foreground Aggregation and Knowledge Transfer for Audio-Visual Semantic Segmentation.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

In Defense of Clip-Based Video Relation Detection.

IEEE Trans. Image Process., 2024

MoTe: Learning Motion-Text Diffusion Model for Multiple Generation Tasks.

CoRR, 2024

Grounding is All You Need? Dual Temporal Grounding for Video Dialog.

CoRR, 2024

DriveDiTFit: Fine-tuning Diffusion Transformers for Autonomous Driving.

CoRR, 2024

Described Spatial-Temporal Video Detection.

CoRR, 2024

Backpropagation-Free Multi-modal On-Device Model Adaptation via Cloud-Device Collaboration.

CoRR, 2024

Weakly Supervised Video Moment Retrieval via Location-irrelevant Proposal Learning.

Proceedings of the Companion Proceedings of the ACM on Web Conference 2024, 2024

I3: Intent-Introspective Retrieval Conditioned on Instructions.

Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

NLPCC2024 Shared Task 3 Technical Report.

Yingfei Sun

Wei Ji

Proceedings of the Natural Language Processing and Chinese Computing, 2024

The 2nd International Workshop on Deep Multi-modal Generation and Retrieval.

Proceedings of the 2nd International Workshop on Deep Multimodal Generation and Retrieval, 2024

Hierarchical Debiasing and Noisy Correction for Cross-domain Video Tube Retrieval.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Semantic Alignment for Multimodal Large Language Models.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

SpeechEE: A Novel Benchmark for Speech Event Extraction.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Towards Small Object Editing: A Benchmark Dataset and A Training-Free Approach.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

De-fine: Decomposing and Refining Visual Programs with Auto-Feedback.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

NExT-Chat: An LMM for Chat, Detection and Segmentation.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

NExT-GPT: Any-to-Any Multimodal LLM.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Domain-Wise Invariant Learning for Panoptic Scene Graph Generation.

Proceedings of the IEEE International Conference on Acoustics, 2024

MRTNet: Multi-Resolution Temporal Network for Video Sentence Grounding.

Proceedings of the IEEE International Conference on Acoustics, 2024

Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching.

Proceedings of the Computer Vision - ECCV 2024, 2024

Dysen-VDM: Empowering Dynamics-Aware Text-to-Video Diffusion with LLMs.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Panoptic Scene Graph Generation with Semantics-Prototype Learning.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

De-fine: Decomposing and Refining Visual Programs with Auto-Feedback.

CoRR, 2023

Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatially Relation Matching.

CoRR, 2023

NExT-Chat: An LMM for Chat, Detection and Segmentation.

Ao Zhang

Wei Ji

Tat-Seng Chua

CoRR, 2023

Towards Complex-query Referring Image Segmentation: A Novel Benchmark.

CoRR, 2023

Empowering Dynamics-aware Text-to-Video Diffusion with Large Language Models.

CoRR, 2023

ControlRetriever: Harnessing the Power of Instructions for Controllable Retrieval.

CoRR, 2023

Empowering Vision-Language Models to Follow Interleaved Vision-Language Instructions.

CoRR, 2023

Transfer Visual Prompt Generator across LLMs.

CoRR, 2023

Multi-queue Momentum Contrast for Microvideo-Product Retrieval.

Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, 2023

VPGTrans: Transfer Visual Prompt Generator across LLMs.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Biased-Predicate Annotation Identification via Unbiased Visual Predicate Representation.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Online Distillation-enhanced Multi-modal Transformer for Sequential Recommendation.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Learning Style-Invariant Robust Representation for Generalizable Visual Instance Retrieval.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Deep Multimodal Learning for Information Retrieval.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Partial Annotation-based Video Moment Retrieval via Iterative Learning.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

ART: rule bAsed futuRe-inference deducTion.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

WINNER: Weakly-supervised hIerarchical decompositioN and aligNment for spatio-tEmporal video gRounding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-based Active Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Generating Visual Spatial Description via Holistic 3D Scene Understanding.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Cross2StrA: Unpaired Cross-lingual Image Captioning with Cross-lingual Cross-modal Structure-pivoted Alignment.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Two Heads Are Better Than One: Improving Fake News Video Detection by Correlating with Neighbors.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

FakeSV: A Multimodal Benchmark with Rich Social Context for Fake News Detection on Short Video Platforms.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Video-Audio Domain Generalization via Confounder Disentanglement.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Conditional Hyper-Network for Blind Super-Resolution With Multiple Degradations.

IEEE Trans. Image Process., 2022

Deep Learning for Weakly-Supervised Object Detection and Localization: A Survey.

Neurocomputing, 2022

MRTNet: Multi-Resolution Temporal Network for Video Sentence Grounding.

CoRR, 2022

MetaComp: Learning to Adapt for Online Depth Completion.

CoRR, 2022

3D Magic Mirror: Clothing Reconstruction from a Single Image via a Causal Perspective.

CoRR, 2022

Structured and Natural Responses Co-generation for Conversational Search.

Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

Video Question Answering: Datasets, Algorithms and Challenges.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Fine-Grained Scene Graph Generation with Data Transfer.

Proceedings of the Computer Vision - ECCV 2022, 2022

Invariant Grounding for Video Question Answering.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Content-Variant Reference Image Quality Assessment via Knowledge Distillation.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Video as Conditional Graph Hierarchy for Multi-Granular Question Answering.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Rethinking the Two-Stage Framework for Grounded Situation Recognition.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

Deep Learning for Weakly-Supervised Object Detection and Object Localization: A Survey.

CoRR, 2021

Deconfounded Video Moment Retrieval with Causal Intervention.

Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

Video Visual Relation Detection via Iterative Inference.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

VidVRD 2021: The Third Grand Challenge on Video Relation Detection.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Boundary Proposal Network for Two-stage Natural Language Video Localization.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Context-Aware Graph Label Propagation Network for Saliency Detection.

IEEE Trans. Image Process., 2020

Context-Aware Deep Spatiotemporal Network for Hand Pose Estimation From Depth Images.

IEEE Trans. Cybern., 2020

Human-Centric Clothing Segmentation via Deformable Semantic Locality-Preserving Network.

IEEE Trans. Circuits Syst. Video Technol., 2020

2019

Multi-Task Structure-Aware Context Modeling for Robust Keypoint-Based Object Tracking.

IEEE Trans. Pattern Anal. Mach. Intell., 2019

2018

Context-Aware Deep Spatio-Temporal Network for Hand Pose Estimation from Depth Images.

CoRR, 2018

Semantic Locality-Aware Deformable Network for Clothing Segmentation.

Wei Ji

Xi Li

Yueting Zhuang

Omar El Farouk Bourahla

Yixin Ji

Shihao Li

Jiabao Cui

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018
