Juncheng Li

ORCID: 0000-0003-2258-1291

Affiliations:

Zhejiang University, Hangzhou, China

According to our database, Juncheng Li authored at least 82 papers between 2019 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

Towards Physically Executable 3D Gaussian for Embodied Navigation.

CoRR, October, 2025

Fast Thinking for Large Language Models.

CoRR, September, 2025

Towards Meta-Cognitive Knowledge Editing for Multimodal LLMs.

CoRR, September, 2025

DBA: Efficient Transformer With Dynamic Bilinear Low-Rank Attention.

IEEE Trans. Neural Networks Learn. Syst., August, 2025

HarmonyGuard: Toward Safety and Utility in Web Agents via Adaptive Policy Enhancement and Dual-Objective Optimization.

CoRR, August, 2025

SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law.

CoRR, July, 2025

Consistent and Invariant Generalization Learning for Short-video Misinformation Detection.

CoRR, July, 2025

Decoupled Seg Tokens Make Stronger Reasoning Video Segmenter and Grounder.

CoRR, June, 2025

What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities.

CoRR, June, 2025

MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models.

CoRR, June, 2025

FocusDiff: Advancing Fine-Grained Text-Image Alignment for Autoregressive Visual Generation through RL.

CoRR, June, 2025

MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query.

CoRR, June, 2025

FormFactory: An Interactive Benchmarking Suite for Multimodal Form-Filling Agents.

CoRR, June, 2025

Unlocking Aha Moments via Reinforcement Learning: Advancing Collaborative Visual Comprehension and Generation.

CoRR, June, 2025

EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model.

CoRR, April, 2025

Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program.

CoRR, April, 2025

SOYO: A Tuning-Free Approach for Video Style Morphing via Style-Adaptive Interpolation in Diffusion Models.

CoRR, March, 2025

The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation.

CoRR, March, 2025

Chart-HQA: A Benchmark for Hypothetical Question Answering in Charts.

CoRR, March, 2025

AEIA-MN: Evaluating the Robustness of Multimodal LLM-Powered Mobile Agents Against Active Environmental Injection Attacks.

CoRR, February, 2025

Toward Complex-query Referring Image Segmentation: A Novel Benchmark.

ACM Trans. Multim. Comput. Commun. Appl., January, 2025

Boosting Virtual Agent Learning and Reasoning: A Step-Wise, Multi-Dimensional, and Generalist Reward Model with Benchmark.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

On Path to Multimodal Generalist: General-Level and General-Bench.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

ITERATE: Image-Text Enhancement, Retrieval, and Alignment for Transmodal Evolution with LLMs.

Proceedings of the 31st International Conference on Computational Linguistics, 2025

Choice is what matters after Attention.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2025

TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

RustGraph: Robust Anomaly Detection in Dynamic Graphs by Jointly Learning Structural-Temporal Dependency.

IEEE Trans. Knowl. Data Eng., July, 2024

MAKIMA: Tuning-free Multi-Attribute Open-domain Video Editing via Mask-Guided Attention Modulation.

CoRR, 2024

Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework.

CoRR, 2024

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining.

CoRR, 2024

Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness.

CoRR, 2024

HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing.

CoRR, 2024

Generalist Virtual Agents: A Survey on Autonomous Agents Across Digital Platforms.

CoRR, 2024

RADAR: Robust Two-stage Modality-incomplete Industrial Anomaly Detection.

CoRR, 2024

Align²LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation.

CoRR, 2024

TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition.

CoRR, 2024

LASER: Tuning-Free LLM-Driven Attention Control for Efficient Text-conditioned Image-to-Animation.

CoRR, 2024

HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models.

CoRR, 2024

I3: Intent-Introspective Retrieval Conditioned on Instructions.

Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Unified Generative and Discriminative Training for Multi-modal Large Language Models.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

The 2nd International Workshop on Deep Multi-modal Generation and Retrieval.

Proceedings of the 2nd International Workshop on Deep Multimodal Generation and Retrieval, 2024

WorldGPT: Empowering LLM as Multimodal World Model.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

DEMON24: ACM MM24 Demonstrative Instruction Following Challenge.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Fact: Teaching MLLMs with Faithful, Concise and Transferable Rationales.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

De-fine: Decomposing and Refining Visual Programs with Auto-Feedback.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Auto-Encoding Morph-Tokens for Multimodal LLM.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

InstructVid2Vid: Controllable Video Editing with Natural Language Instructions.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DIEM: Decomposition-Integration Enhancing Multimodal Insights.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding.

IEEE Trans. Pattern Anal. Mach. Intell., October, 2023

Revisiting the Domain Shift and Sample Uncertainty in Multi-source Active Domain Transfer.

CoRR, 2023

De-fine: Decomposing and Refining Visual Programs with Auto-Feedback.

CoRR, 2023

ControlRetriever: Harnessing the Power of Instructions for Controllable Retrieval.

CoRR, 2023

Empowering Vision-Language Models to Follow Interleaved Vision-Language Instructions.

CoRR, 2023

Interactive Data Synthesis for Systematic Vision Adaptation via LLMs-AIGCs Collaboration.

CoRR, 2023

Meta-augmented Prompt Tuning for Better Few-shot Learning.

CoRR, 2023

Unsupervised Domain Adaptation for Video Object Grounding with Cascaded Debiasing Learning.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Self-supervised Meta-Prompt Learning with Meta-Gradient Regularization for Few-shot Generalization.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Reasoning Makes Good Annotators: An Automatic Task-specific Rules Distilling Framework for Low-resource Relation Extraction.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-based Active Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval.

CoRR, 2022

Fine-Grained Semantically Aligned Vision-Language Pre-Training.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Weakly-supervised Disentanglement Network for Video Fingerspelling Detection.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-Based Image Captioning.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

Adaptive Hierarchical Graph Reasoning with Semantic Coherence for Video-and-Language Inference.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020

Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Walking with MIND: Mental Imagery eNhanceD Embodied QA.

Proceedings of the 27th ACM International Conference on Multimedia, 2019