Weijie Su

ORCID: 0000-0001-8630-6059

Affiliations:

University of Science and Technology of China (USTC), Hefei, China

According to our database, Weijie Su authored at least 17 papers between 2020 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning.

CoRR, October, 2025

NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints.

CoRR, October, 2025

ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data.

CoRR, September, 2025

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency.

CoRR, August, 2025

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents.

CoRR, July, 2025

ZeroGUI: Automating Online GUI Learning at Zero Human Cost.

CoRR, May, 2025

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.

CoRR, April, 2025

CoMemo: LVLMs Need Image Context with Image Memory.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.

CoRR, 2023

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory.

CoRR, 2023

Siamese Image Modeling for Self-Supervised Vision Representation Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2021

Deformable DETR: Deformable Transformers for End-to-End Object Detection.

Proceedings of the 9th International Conference on Learning Representations, 2021

2020

VL-BERT: Pre-training of Generic Visual-Linguistic Representations.

Proceedings of the 8th International Conference on Learning Representations, 2020
