Alex Jinpeng Wang

Orcid: 0000-0001-6127-9146

Affiliations:
  • National University of Singapore, Show Lab, Singapore


According to our database, Alex Jinpeng Wang authored at least 23 papers between 2021 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Negation-Aware Test-Time Adaptation for Vision-Language Models.
CoRR, July, 2025

Unlearning the Noisy Correspondence Makes CLIP More Robust.
CoRR, July, 2025

Seeing is Not Reasoning: MVPBench for Graph-based Evaluation of Multi-path Visual Physical CoT.
CoRR, May, 2025

Visual Thoughts: A Unified Perspective of Understanding Multimodal Chain-of-Thought.
CoRR, May, 2025

V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models.
CoRR, April, 2025

Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models.
CoRR, March, 2025

TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation.
CoRR, February, 2025

Vision-centric Token Compression in Large Language Model.
CoRR, February, 2025

2024
Enhancing Visual Grounding in Vision-Language Pre-Training With Position-Guided Text Prompts.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2024

COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training.
CoRR, 2024

Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Parrot Captions Teach CLIP to Spot Text.
Proceedings of the Computer Vision - ECCV 2024, 2024

2023
Too Large; Data Reduction for Vision-Language Pre-Training.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

UniVTG: Towards Unified Video-Language Temporal Grounding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022
Position-guided Text Prompt for Vision-Language Pre-training.
CoRR, 2022

Egocentric Video-Language Pretraining @ Ego4D Challenge 2022.
CoRR, 2022

Egocentric Video-Language Pretraining @ EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022.
CoRR, 2022

Egocentric Video-Language Pretraining.
CoRR, 2022

MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval.
CoRR, 2022

Revitalize Region Feature for Democratizing Video-Language Pre-training.
CoRR, 2022

All in One: Exploring Unified Video-Language Pre-training.
CoRR, 2022

Object-aware Video-language Pre-training for Retrieval.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Video-Text Pre-training with Learned Regions.
CoRR, 2021

