Kaihang Pan

ORCID: 0009-0001-2967-4573

According to our database, Kaihang Pan authored at least 22 papers between 2023 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities.
CoRR, June, 2025

FocusDiff: Advancing Fine-Grained Text-Image Alignment for Autoregressive Visual Generation through RL.
CoRR, June, 2025

Unlocking Aha Moments via Reinforcement Learning: Advancing Collaborative Visual Comprehension and Generation.
CoRR, June, 2025

Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning.
CoRR, May, 2025

On Path to Multimodal Generalist: General-Level and General-Bench.
CoRR, May, 2025

Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning.
CoRR, April, 2025

Improving Vision Anomaly Detection With the Guidance of Language Modality.
IEEE Trans. Multim., 2025

AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
RustGraph: Robust Anomaly Detection in Dynamic Graphs by Jointly Learning Structural-Temporal Dependency.
IEEE Trans. Knowl. Data Eng., July, 2024

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining.
CoRR, 2024

I3: Intent-Introspective Retrieval Conditioned on Instructions.
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Unified Generative and Discriminative Training for Multi-modal Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Auto-Encoding Morph-Tokens for Multimodal LLM.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Improving Vision Anomaly Detection with the Guidance of Language Modality.
CoRR, 2023

ControlRetriever: Harnessing the Power of Instructions for Controllable Retrieval.
CoRR, 2023

Empowering Vision-Language Models to Follow Interleaved Vision-Language Instructions.
CoRR, 2023

Meta-augmented Prompt Tuning for Better Few-shot Learning.
CoRR, 2023

Self-supervised Meta-Prompt Learning with Meta-Gradient Regularization for Few-shot Generalization.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

