Juncheng Li

Orcid: 0000-0003-2258-1291

Affiliations:
  • Zhejiang University, Hangzhou, China


According to our database, Juncheng Li authored at least 78 papers between 2019 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
DBA: Efficient Transformer With Dynamic Bilinear Low-Rank Attention.
IEEE Trans. Neural Networks Learn. Syst., August, 2025

HarmonyGuard: Toward Safety and Utility in Web Agents via Adaptive Policy Enhancement and Dual-Objective Optimization.
CoRR, August, 2025

SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law.
CoRR, July, 2025

Consistent and Invariant Generalization Learning for Short-video Misinformation Detection.
CoRR, July, 2025

Decoupled Seg Tokens Make Stronger Reasoning Video Segmenter and Grounder.
CoRR, June, 2025

What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities.
CoRR, June, 2025

MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models.
CoRR, June, 2025

FocusDiff: Advancing Fine-Grained Text-Image Alignment for Autoregressive Visual Generation through RL.
CoRR, June, 2025

MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query.
CoRR, June, 2025

FormFactory: An Interactive Benchmarking Suite for Multimodal Form-Filling Agents.
CoRR, June, 2025

Unlocking Aha Moments via Reinforcement Learning: Advancing Collaborative Visual Comprehension and Generation.
CoRR, June, 2025

On Path to Multimodal Generalist: General-Level and General-Bench.
CoRR, May, 2025

EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model.
CoRR, April, 2025

Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program.
CoRR, April, 2025

Boosting Virtual Agent Learning and Reasoning: A Step-wise, Multi-dimensional, and Generalist Reward Model with Benchmark.
CoRR, March, 2025

SOYO: A Tuning-Free Approach for Video Style Morphing via Style-Adaptive Interpolation in Diffusion Models.
CoRR, March, 2025

The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation.
CoRR, March, 2025

Chart-HQA: A Benchmark for Hypothetical Question Answering in Charts.
CoRR, March, 2025

AEIA-MN: Evaluating the Robustness of Multimodal LLM-Powered Mobile Agents Against Active Environmental Injection Attacks.
CoRR, February, 2025

Toward Complex-query Referring Image Segmentation: A Novel Benchmark.
ACM Trans. Multim. Comput. Commun. Appl., January, 2025

AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

ITERATE: Image-Text Enhancement, Retrieval, and Alignment for Transmodal Evolution with LLMs.
Proceedings of the 31st International Conference on Computational Linguistics, 2025

Choice is what matters after Attention.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2025

TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
RustGraph: Robust Anomaly Detection in Dynamic Graphs by Jointly Learning Structural-Temporal Dependency.
IEEE Trans. Knowl. Data Eng., July, 2024

MAKIMA: Tuning-free Multi-Attribute Open-domain Video Editing via Mask-Guided Attention Modulation.
CoRR, 2024

Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework.
CoRR, 2024

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining.
CoRR, 2024

Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness.
CoRR, 2024

HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing.
CoRR, 2024

Generalist Virtual Agents: A Survey on Autonomous Agents Across Digital Platforms.
CoRR, 2024

RADAR: Robust Two-stage Modality-incomplete Industrial Anomaly Detection.
CoRR, 2024

Align²LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation.
CoRR, 2024

TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition.
CoRR, 2024

LASER: Tuning-Free LLM-Driven Attention Control for Efficient Text-conditioned Image-to-Animation.
CoRR, 2024

HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models.
CoRR, 2024

I3: Intent-Introspective Retrieval Conditioned on Instructions.
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Unified Generative and Discriminative Training for Multi-modal Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

The 2nd International Workshop on Deep Multi-modal Generation and Retrieval.
Proceedings of the 2nd International Workshop on Deep Multimodal Generation and Retrieval, 2024

WorldGPT: Empowering LLM as Multimodal World Model.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

DEMON24: ACM MM24 Demonstrative Instruction Following Challenge.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Fact: Teaching MLLMs with Faithful, Concise and Transferable Rationales.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

De-fine: Decomposing and Refining Visual Programs with Auto-Feedback.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Auto-Encoding Morph-Tokens for Multimodal LLM.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

InstructVid2Vid: Controllable Video Editing with Natural Language Instructions.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DIEM: Decomposition-Integration Enhancing Multimodal Insights.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding.
IEEE Trans. Pattern Anal. Mach. Intell., October, 2023

Revisiting the Domain Shift and Sample Uncertainty in Multi-source Active Domain Transfer.
CoRR, 2023

De-fine: Decomposing and Refining Visual Programs with Auto-Feedback.
CoRR, 2023

ControlRetriever: Harnessing the Power of Instructions for Controllable Retrieval.
CoRR, 2023

Empowering Vision-Language Models to Follow Interleaved Vision-Language Instructions.
CoRR, 2023

Interactive Data Synthesis for Systematic Vision Adaptation via LLMs-AIGCs Collaboration.
CoRR, 2023

Meta-augmented Prompt Tuning for Better Few-shot Learning.
CoRR, 2023

Unsupervised Domain Adaptation for Video Object Grounding with Cascaded Debiasing Learning.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Self-supervised Meta-Prompt Learning with Meta-Gradient Regularization for Few-shot Generalization.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Reasoning Makes Good Annotators: An Automatic Task-specific Rules Distilling Framework for Low-resource Relation Extraction.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-based Active Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval.
CoRR, 2022

Fine-Grained Semantically Aligned Vision-Language Pre-Training.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Weakly-supervised Disentanglement Network for Video Fingerspelling Detection.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-Based Image Captioning.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Adaptive Hierarchical Graph Reasoning with Semantic Coherence for Video-and-Language Inference.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020
Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Walking with MIND: Mental Imagery eNhanceD Embodied QA.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

