Jiayi Ji

Orcid: 0000-0002-9956-6308

According to our database, Jiayi Ji authored at least 113 papers between 2019 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Evaluating and Mitigating Relationship Hallucinations in Large Vision-Language Models.
IEEE Trans. Pattern Anal. Mach. Intell., June, 2026

Boosting Filter Optimization and Prompt-Guided Decoding for Mixed Degradation Image Restoration.
Int. J. Comput. Vis., June, 2026

An Extensive Benchmark for Single-Round and Multi-Round Instruction-Based Image Editing.
Int. J. Comput. Vis., May, 2026

Plan Before Search: Search Agents Need Plan.
CoRR, May, 2026

Look on Demand: A Cognitive Scheduling Framework for Visual Evidence Acquisition in Multimodal Reasoning.
CoRR, May, 2026

Evading Visual Aphasia: Contrastive Adaptive Semantic Token Pruning for Vision-Language Models.
CoRR, May, 2026

Boosting Multi-Modal Large Language Model With Enhanced Visual Features.
IEEE Trans. Pattern Anal. Mach. Intell., April, 2026

CoP: Chain of Perception for Referring 3D Instance Segmentation.
Int. J. Comput. Vis., April, 2026

PixDLM: A Dual-Path Multimodal Language Model for UAV Reasoning Segmentation.
CoRR, April, 2026

HieraVid: Hierarchical Token Pruning for Fast Video Large Language Models.
CoRR, April, 2026

Persistent Story World Simulation with Continuous Character Customization.
CoRR, March, 2026

Wavelet-based Frame Selection by Detecting Semantic Boundary for Long Video Understanding.
CoRR, March, 2026

Test-Time Computing for Referring Multimodal Large Language Models.
CoRR, February, 2026

MICON-Bench: Benchmarking and Enhancing Multi-Image Context Image Generation in Unified Multimodal Models.
CoRR, February, 2026

SafeNeuron: Neuron-Level Safety Alignment for Large Language Models.
CoRR, February, 2026

MVGGT: Multimodal Visual Geometry Grounded Transformer for Multiview 3D Referring Expression Segmentation.
CoRR, January, 2026

CSMCIR: CoT-Enhanced Symmetric Alignment with Memory Bank for Composed Image Retrieval.
CoRR, January, 2026

3D-STMN++: Leveraging semantic proxies to enhance superpoint-text matching for 3D Referring Expression Segmentation.
Pattern Recognit., 2026

Wavelet-based learning and optimized sampling for image deraining.
Pattern Recognit., 2026

SFIR: Optimizing spatial and frequency domains for image restoration.
Pattern Recognit., 2026

QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

FIND: A Simple Yet Effective Baseline for Diffusion-Generated Image Detection.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

3D-DRES: Detailed 3D Referring Expression Segmentation.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
TraDiffusion: Trajectory-Based Training-Free Image Generation.
Int. J. Comput. Vis., December, 2025

Evolving, Not Training: Zero-Shot Reasoning Segmentation via Evolutionary Prompting.
CoRR, December, 2025

M4-BLIP: Advancing Multi-Modal Media Manipulation Detection through Face-Enhanced Local Analysis.
CoRR, December, 2025

Creating High-Quality 3D Content by Bridging the Gap between Text-to-2D and Text-to-3D Generation.
ACM Trans. Multim. Comput. Commun. Appl., November, 2025

NICE: Improving Panoptic Narrative Detection and Segmentation With Cascading Collaborative Learning.
IEEE Trans. Pattern Anal. Mach. Intell., October, 2025

MDReID: Modality-Decoupled Learning for Any-to-Any Multi-Modal Object Re-Identification.
CoRR, October, 2025

Understanding What Is Not Said: Referring Remote Sensing Image Segmentation with Scarce Expressions.
CoRR, October, 2025

MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites.
CoRR, October, 2025

CIR-CoT: Towards Interpretable Composed Image Retrieval via End-to-End Chain-of-Thought Reasoning.
CoRR, October, 2025

Training-Free Multimodal Large Language Model Orchestration.
CoRR, August, 2025

A Survey on 3D Gaussian Splatting Applications: Segmentation, Editing, and Generation.
CoRR, August, 2025

Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive Segmentation.
CoRR, July, 2025

SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence.
CoRR, June, 2025

RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning.
CoRR, May, 2025

Image Captioning via Dynamic Path Customization.
IEEE Trans. Neural Networks Learn. Syst., April, 2025

JM3D & JM3D-LLM: Elevating 3D Representation With Joint Multi-Modal Cues.
IEEE Trans. Pattern Anal. Mach. Intell., April, 2025

Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach.
CoRR, April, 2025

An Efficient and Mixed Heterogeneous Model for Image Restoration.
CoRR, April, 2025

JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization.
CoRR, March, 2025

MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning.
CoRR, March, 2025

ComfyGPT: A Self-Optimizing Multi-Agent System for Comprehensive ComfyUI Workflow Generation.
CoRR, March, 2025

QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension.
CoRR, March, 2025

MlyPredCSED: based on extreme point deviation compensated clustering combined with cross-scale convolutional neural networks to predict multiple lysine sites in human.
Briefings Bioinform., March, 2025

Towards General Visual-Linguistic Face Forgery Detection(V2).
CoRR, February, 2025

ME-FAS: Multimodal Text Enhancement for Cross-Domain Face Anti-Spoofing.
IEEE Trans. Inf. Forensics Secur., 2025

M3ixup: A multi-modal data augmentation approach for image captioning.
Pattern Recognit., 2025

Optical remote sensing image salient object detection via bidirectional cross-attention and attention restoration.
Pattern Recognit., 2025

The Evolution of E-commerce Leadership: Traits, Innovation, and Performance Across Time.
Proceedings of the E-Business. Generative Artificial Intelligence and Management Transformation, 2025

Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

GSAlign: Geometric and Semantic Alignment Network for Aerial-Ground Person Re-Identification.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

HRSeg: High-Resolution Visual Perception and Enhancement for Reasoning Segmentation.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

MIHBench: Benchmarking and Mitigating Multi-Image Hallucinations in Multimodal Large Language Models.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Multi-Modal Object Re-identification via Sparse Mixture-of-Experts.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

Towards Semantic Equivalence of Tokenization in Multimodal LLM.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Aigi-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

ACL: Activating Capability of Linear Attention for Image Restoration.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Towards General Visual-Linguistic Face Forgery Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
Mixed Degradation Image Restoration via Local Dynamic Optimization and Conditional Embedding.
CoRR, 2024

Any-to-3D Generation via Hybrid Diffusion Supervision.
CoRR, 2024

Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension.
CoRR, 2024

TraDiffusion: Trajectory-Based Training-Free Image Generation.
CoRR, 2024

ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models.
CoRR, 2024

INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model.
CoRR, 2024

HRSAM: Efficiently Segment Anything in High-Resolution Images.
CoRR, 2024

Evaluating and Analyzing Relationship Hallucinations in LVLMs.
CoRR, 2024

Synergistic Dual Spatial-aware Generation of Image-to-text and Text-to-image.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

3D-GRES: Generalized 3D Referring Expression Segmentation.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model.
Proceedings of the Computer Vision - ECCV 2024, 2024

Multi-branch Collaborative Learning Network for 3D Visual Grounding.
Proceedings of the Computer Vision - ECCV 2024, 2024

APL: Anchor-Based Prompt Learning for One-Stage Weakly Supervised Referring Expression Comprehension.
Proceedings of the Computer Vision - ECCV 2024, 2024

Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MMAPS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Toward Open-Set Human Object Interaction Detection.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

X-RefSeg3D: Enhancing Referring 3D Instance Segmentation via Structured Cross-Modal Graph Neural Networks.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Improving Panoptic Narrative Grounding by Harnessing Semantic Relationships and Visual Confirmation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Towards local visual modeling for image captioning.
Pattern Recognit., June, 2023

Knowing What it is: Semantic-Enhanced Dual Attention Transformer.
IEEE Trans. Multim., 2023

Multi-Branch Distance-Sensitive Self-Attention Network for Image Captioning.
IEEE Trans. Multim., 2023

X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation.
CoRR, 2023

NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative Learning.
CoRR, 2023

M3PS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization in E-commerce.
CoRR, 2023

Semi-Supervised Panoptic Narrative Grounding.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Beyond First Impressions: Integrating Joint Multi-modal Cues for Comprehensive 3D Representation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Beat: Bi-directional One-to-Many Embedding Alignment for Text-based Person Retrieval.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Towards Real-Time Panoptic Narrative Grounding by an End-to-End Grounding Network.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
CIMTx: An R Package for Causal Inference with Multiple Treatments using Observational Data.
R J., December, 2022

Knowing What to Learn: A Metric-Oriented Focal Mechanism for Image Captioning.
IEEE Trans. Image Process., 2022

Spatiotemporal Evolution of the Carbon Fluxes from Bamboo Forests and their Response to Climate Change Based on a BEPS Model in China.
Remote. Sens., 2022

2021
Remote Sensing Estimation of Bamboo Forest Aboveground Biomass Based on Geographically Weighted Regression.
Remote. Sens., 2021

Multiscale leaf area index assimilation for Moso bamboo forest based on Sentinel-2 and MODIS data.
Int. J. Appl. Earth Obs. Geoinformation, 2021

RSTNet: Captioning With Adaptive Attention on Visual and Non-Visual Words.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Dual-level Collaborative Transformer for Image Captioning.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Attacking Image Captioning Towards Accuracy-Preserving Target Words Removal.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

2019
Semantic-aware Image Deblurring.
CoRR, 2019

Variational Structured Semantic Inference for Diverse Image Captioning.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

