Weiyun Wang
Orcid: 0000-0002-7116-6425
According to our database, Weiyun Wang authored at least 32 papers between 2022 and 2025.
Bibliography
2025
CoRR, July, 2025
AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning.
CoRR, July, 2025
Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models.
CoRR, July, 2025
CoRR, June, 2025
CoRR, May, 2025
IEEE Trans. Pattern Anal. Mach. Intell., April, 2025
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models.
CoRR, April, 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
CoRR, April, 2025
CoRR, March, 2025
CoRR, February, 2025
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
2024
Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance.
Vis. Intell., 2024
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling.
CoRR, 2024
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization.
CoRR, 2024
Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance.
CoRR, 2024
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.
CoRR, 2024
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer.
CoRR, 2024
MMInstruct: a high-quality multi-modal instruction tuning dataset with extensive diversity.
Sci. China Inf. Sci., 2024
How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites.
Sci. China Inf. Sci., 2024
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Computer Vision - ECCV 2024, 2024
2023
InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language.
CoRR, 2023
Digital Making for Inheritance and Enlivening Intangible Cultural Heritage: A Case of Hairy Monkey Handicrafts.
Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 2023
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023
Everyday Space as an Interface for Health Data Engagement: Designing Tangible Displays of Stress Data.
Proceedings of the 2023 ACM Designing Interactive Systems Conference, 2023
2022