Weijie Su

Orcid: 0000-0001-8630-6059

Affiliations:
  • University of Science and Technology of China (USTC), Hefei, China


According to our database, Weijie Su authored at least 13 papers between 2020 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents.
CoRR, July, 2025

CoMemo: LVLMs Need Image Context with Image Memory.
CoRR, June, 2025

ZeroGUI: Automating Online GUI Learning at Zero Human Cost.
CoRR, May, 2025

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
CoRR, April, 2025

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.
CoRR, 2023

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory.
CoRR, 2023

Siamese Image Modeling for Self-Supervised Vision Representation Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2021
Deformable DETR: Deformable Transformers for End-to-End Object Detection.
Proceedings of the 9th International Conference on Learning Representations, 2021

2020
VL-BERT: Pre-training of Generic Visual-Linguistic Representations.
Proceedings of the 8th International Conference on Learning Representations, 2020

