Weiyun Wang

ORCID: 0009-0000-1174-9103

According to our database, Weiyun Wang authored at least 43 papers between 2022 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

ACE-Brain-0: Spatial Intelligence as a Shared Scaffold for Universal Embodiments.

CoRR, March, 2026

"I Want to Keep My Phone Away From the Bed": Designing a Smart Pillow for Sleep Onset.

Weiyun Wang, Ilyena Hirskyj-Douglas, Kejin Yu, Sharon Xianghua Ding

Proceedings of the Twentieth International Conference on Tangible, 2026

EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

From Information to Experience: Exploring Users' Engagement with Different Stress Displays.

Weiyun Wang, Scot Gilmour, Naral Chalermchaikosol, Ilyena Hirskyj-Douglas, Jiaqi Wang, Sharon Xianghua Ding

Proceedings of the 2026 Designing Interactive Systems Conference, 2026

2025

ViCO: A Training Strategy towards Semantic Aware Dynamic High-Resolution.

CoRR, October, 2025

MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites.

CoRR, October, 2025

Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning.

CoRR, October, 2025

MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization.

CoRR, October, 2025

Sequential Diffusion Language Models.

CoRR, September, 2025

ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data.

CoRR, September, 2025

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency.

CoRR, August, 2025

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents.

CoRR, July, 2025

AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning.

CoRR, July, 2025

Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models.

CoRR, July, 2025

Demystify Transformers & Convolutions in Modern Image Deep Networks.

IEEE Trans. Pattern Anal. Mach. Intell., April, 2025

VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models.

CoRR, April, 2025

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.

CoRR, April, 2025

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning.

CoRR, March, 2025

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference.

CoRR, February, 2025

Visual Thoughts: A Unified Perspective of Understanding Multimodal Chain-of-Thought.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.

et al.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Docopilot: Improving Multimodal Models for Document-Level Understanding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance.

Vis. Intell., 2024

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling.

CoRR, 2024

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization.

CoRR, 2024

Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance.

CoRR, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.

CoRR, 2024

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer.

CoRR, 2024

MMInstruct: a high-quality multi-modal instruction tuning dataset with extensive diversity.

Sci. China Inf. Sci., 2024

How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites.

Sci. China Inf. Sci., 2024

Needle In A Multimodal Haystack.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World.

Proceedings of the Computer Vision - ECCV 2024, 2024

2023

InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language.

CoRR, 2023

Digital Making for Inheritance and Enlivening Intangible Cultural Heritage: A Case of Hairy Monkey Handicrafts.

Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 2023

CLIPText: A New Paradigm for Zero-shot Text Classification.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Everyday Space as an Interface for Health Data Engagement: Designing Tangible Displays of Stress Data.

Weiyun Wang, Sharon Xianghua Ding, Ilyena Hirskyj-Douglas

Proceedings of the 2023 ACM Designing Interactive Systems Conference, 2023

2022

Sensor Mathematical Model Data Fusion Biobjective Optimization.

Maowen Hou, Weiyun Wang

J. Sensors, 2022

Demystify Transformers & Convolutions in Modern Image Deep Networks.

CoRR, 2022
