Kaihang Pan

Orcid: 0009-0001-2967-4573

According to our database, Kaihang Pan authored at least 22 papers between 2023 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities.

CoRR, June, 2025

FocusDiff: Advancing Fine-Grained Text-Image Alignment for Autoregressive Visual Generation through RL.

CoRR, June, 2025

Unlocking Aha Moments via Reinforcement Learning: Advancing Collaborative Visual Comprehension and Generation.

CoRR, June, 2025

Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning.

CoRR, May, 2025

On Path to Multimodal Generalist: General-Level and General-Bench.

CoRR, May, 2025

Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning.

CoRR, April, 2025

Improving Vision Anomaly Detection With the Guidance of Language Modality.

IEEE Trans. Multim., 2025

AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

RustGraph: Robust Anomaly Detection in Dynamic Graphs by Jointly Learning Structural-Temporal Dependency.

IEEE Trans. Knowl. Data Eng., July, 2024

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining.

CoRR, 2024

I3: Intent-Introspective Retrieval Conditioned on Instructions.

Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Unified Generative and Discriminative Training for Multi-modal Large Language Models.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Auto-Encoding Morph-Tokens for Multimodal LLM.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

Improving Vision Anomaly Detection with the Guidance of Language Modality.

CoRR, 2023

ControlRetriever: Harnessing the Power of Instructions for Controllable Retrieval.

CoRR, 2023

Empowering Vision-Language Models to Follow Interleaved Vision-Language Instructions.

CoRR, 2023

Meta-augmented Prompt Tuning for Better Few-shot Learning.

CoRR, 2023

Self-supervised Meta-Prompt Learning with Meta-Gradient Regularization for Few-shot Generalization.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023
