Jiayi Ji

Orcid: 0000-0002-9956-6308

According to our database, Jiayi Ji authored at least 76 papers between 2019 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
SFIR: Optimizing spatial and frequency domains for image restoration.
Pattern Recognit., 2026

2025
Training-Free Multimodal Large Language Model Orchestration.
CoRR, August, 2025

A Survey on 3D Gaussian Splatting Applications: Segmentation, Editing, and Generation.
CoRR, August, 2025

MIHBench: Benchmarking and Mitigating Multi-Image Hallucinations in Multimodal Large Language Models.
CoRR, August, 2025

HRSeg: High-Resolution Visual Perception and Enhancement for Reasoning Segmentation.
CoRR, July, 2025

Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive.
CoRR, July, 2025

AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models.
CoRR, July, 2025

SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence.
CoRR, June, 2025

RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning.
CoRR, May, 2025

Image Captioning via Dynamic Path Customization.
IEEE Trans. Neural Networks Learn. Syst., April, 2025

JM3D & JM3D-LLM: Elevating 3D Representation With Joint Multi-Modal Cues.
IEEE Trans. Pattern Anal. Mach. Intell., April, 2025

Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach.
CoRR, April, 2025

An Efficient and Mixed Heterogeneous Model for Image Restoration.
CoRR, April, 2025

JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization.
CoRR, March, 2025

MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning.
CoRR, March, 2025

ComfyGPT: A Self-Optimizing Multi-Agent System for Comprehensive ComfyUI Workflow Generation.
CoRR, March, 2025

QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension.
CoRR, March, 2025

Towards General Visual-Linguistic Face Forgery Detection (V2).
CoRR, February, 2025

ME-FAS: Multimodal Text Enhancement for Cross-Domain Face Anti-Spoofing.
IEEE Trans. Inf. Forensics Secur., 2025

M3ixup: A multi-modal data augmentation approach for image captioning.
Pattern Recognit., 2025

Optical remote sensing image salient object detection via bidirectional cross-attention and attention restoration.
Pattern Recognit., 2025

The Evolution of E-commerce Leadership: Traits, Innovation, and Performance Across Time.
Proceedings of the E-Business. Generative Artificial Intelligence and Management Transformation, 2025

Towards Semantic Equivalence of Tokenization in Multimodal LLM.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

ACL: Activating Capability of Linear Attention for Image Restoration.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Towards General Visual-Linguistic Face Forgery Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
Mixed Degradation Image Restoration via Local Dynamic Optimization and Conditional Embedding.
CoRR, 2024

Any-to-3D Generation via Hybrid Diffusion Supervision.
CoRR, 2024

Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension.
CoRR, 2024

TraDiffusion: Trajectory-Based Training-Free Image Generation.
CoRR, 2024

ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models.
CoRR, 2024

INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model.
CoRR, 2024

HRSAM: Efficiently Segment Anything in High-Resolution Images.
CoRR, 2024

Evaluating and Analyzing Relationship Hallucinations in LVLMs.
CoRR, 2024

MlyPredCSED: based on extreme point deviation compensated clustering combined with cross-scale convolutional neural networks to predict multiple lysine sites in human.
Briefings Bioinform., 2024

Synergistic Dual Spatial-aware Generation of Image-to-text and Text-to-image.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

3D-GRES: Generalized 3D Referring Expression Segmentation.
Proceedings of the 32nd ACM International Conference on Multimedia, 2024

SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model.
Proceedings of the European Conference on Computer Vision (ECCV), 2024

Multi-branch Collaborative Learning Network for 3D Visual Grounding.
Proceedings of the European Conference on Computer Vision (ECCV), 2024

APL: Anchor-Based Prompt Learning for One-Stage Weakly Supervised Referring Expression Comprehension.
Proceedings of the European Conference on Computer Vision (ECCV), 2024

Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MMAPS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Toward Open-Set Human Object Interaction Detection.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

X-RefSeg3D: Enhancing Referring 3D Instance Segmentation via Structured Cross-Modal Graph Neural Networks.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Improving Panoptic Narrative Grounding by Harnessing Semantic Relationships and Visual Confirmation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Towards local visual modeling for image captioning.
Pattern Recognit., June, 2023

Knowing What it is: Semantic-Enhanced Dual Attention Transformer.
IEEE Trans. Multim., 2023

Multi-Branch Distance-Sensitive Self-Attention Network for Image Captioning.
IEEE Trans. Multim., 2023

X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation.
CoRR, 2023

NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative Learning.
CoRR, 2023

M3PS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization in E-commerce.
CoRR, 2023

Semi-Supervised Panoptic Narrative Grounding.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Beyond First Impressions: Integrating Joint Multi-modal Cues for Comprehensive 3D Representation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Beat: Bi-directional One-to-Many Embedding Alignment for Text-based Person Retrieval.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Towards Real-Time Panoptic Narrative Grounding by an End-to-End Grounding Network.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
CIMTx: An R Package for Causal Inference with Multiple Treatments using Observational Data.
R J., December, 2022

Knowing What to Learn: A Metric-Oriented Focal Mechanism for Image Captioning.
IEEE Trans. Image Process., 2022

Spatiotemporal Evolution of the Carbon Fluxes from Bamboo Forests and their Response to Climate Change Based on a BEPS Model in China.
Remote. Sens., 2022

2021
Remote Sensing Estimation of Bamboo Forest Aboveground Biomass Based on Geographically Weighted Regression.
Remote. Sens., 2021

Multiscale leaf area index assimilation for Moso bamboo forest based on Sentinel-2 and MODIS data.
Int. J. Appl. Earth Obs. Geoinformation, 2021

RSTNet: Captioning With Adaptive Attention on Visual and Non-Visual Words.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021

Dual-level Collaborative Transformer for Image Captioning.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Attacking Image Captioning Towards Accuracy-Preserving Target Words Removal.
Proceedings of the 28th ACM International Conference on Multimedia, 2020

2019
Semantic-aware Image Deblurring.
CoRR, 2019

Variational Structured Semantic Inference for Diverse Image Captioning.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019
