Peng Jin

Orcid: 0000-0001-9287-6410

Affiliations:
  • Peking University, School of Electronic and Computer Engineering, Shenzhen, China


According to our database, Peng Jin authored at least 44 papers between 2022 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Disentangled Concept Matching for Text-video Retrieval through Perception Imitation.
ACM Trans. Multim. Comput. Commun. Appl., April, 2026

Next Patch Prediction for AutoRegressive Visual Generation.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward.
CoRR, November, 2025

Seeing Before Reasoning: A Unified Framework for Generalizable and Explainable Fake Image Detection.
CoRR, September, 2025

Hierarchical Banzhaf Interaction for General Video-Language Representation Learning.
IEEE Trans. Pattern Anal. Mach. Intell., March, 2025

MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation.
CoRR, March, 2025

WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation.
CoRR, March, 2025

Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

MoH: Multi-Head Attention as Mixture-of-Head Attention.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

MUSE: Mamba Is Efficient Multi-scale Learner for Text-video Retrieval.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

Aligning Instance Brownian Bridge with Texts for Open-Vocabulary Video Instance Segmentation.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning.
CoRR, 2024

Next Patch Prediction for Autoregressive Visual Generation.
CoRR, 2024

Effort: Efficient Orthogonal Modeling for Generalizable AI-Generated Image Detection.
CoRR, 2024

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step.
CoRR, 2024

Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation.
CoRR, 2024

LLMBind: A Unified Modality-Task Integration Framework.
CoRR, 2024

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models.
CoRR, 2024

Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation.
CoRR, 2024

LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Repaint123: Fast and High-Quality One Image to 3D Generation with Progressive Controllable Repainting.
Proceedings of the Computer Vision - ECCV 2024, 2024

FreestyleRet: Retrieving Images from Style-Diversified Queries.
Proceedings of the Computer Vision - ECCV 2024, 2024

Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation.
Proceedings of the Computer Vision - ECCV 2024, 2024

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Parallel Vertex Diffusion for Unified Visual Grounding.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Weakly-Supervised 3D Spatial Reasoning for Text-Based Visual Question Answering.
IEEE Trans. Image Process., 2023

Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting.
CoRR, 2023

FreestyleRet: Retrieving Images from Style-Diversified Queries.
CoRR, 2023

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding.
CoRR, 2023

Act As You Wish: Fine-Grained Control of Motion Diffusion Model with Hierarchical Semantic Graphs.
CoRR, 2023

Parallel Vertex Diffusion for Unified Visual Grounding.
CoRR, 2023

Act As You Wish: Fine-Grained Control of Motion Diffusion Model with Hierarchical Semantic Graphs.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

TG-VQA: Ternary Game of Video Question Answering.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

WiCo: Win-win Cooperation of Bottom-up and Top-down Referring Image Segmentation.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Multi-granularity Interaction Simulation for Unsupervised Interactive Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DiffusionRet: Generative Text-Video Retrieval with Diffusion Model.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering.
CoRR, 2022

Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

