Shilin Xu

ORCID: 0009-0008-7178-5358

Affiliations:
  • Peking University, School of Intelligence Science and Technology, National Key Laboratory of General Artificial Intelligence, Beijing, China


According to our database, Shilin Xu authored at least 21 papers between 2022 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World.
CoRR, June, 2025

MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query.
CoRR, June, 2025

DST-Det: Open-Vocabulary Object Detection via Dynamic Self-Training.
IEEE Trans. Circuits Syst. Video Technol., May, 2025

Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models.
CoRR, May, 2025

On Path to Multimodal Generalist: General-Level and General-Bench.
CoRR, May, 2025

An Empirical Study of GPT-4o Image Generation Capabilities.
CoRR, April, 2025

4th PVUW MeViS 3rd Place Report: Sa2VA.
CoRR, April, 2025

Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs.
CoRR, January, 2025

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos.
CoRR, January, 2025

RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Panoptic-PartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

Towards Open Vocabulary Learning: A Survey.
IEEE Trans. Pattern Anal. Mach. Intell., 2024

RLRF4Rec: Reinforcement Learning from Recsys Feedback for Enhanced Recommendation Reranking.
CoRR, 2024

LLAVADI: What Matters For Multimodal Large Language Models Distillation.
CoRR, 2024

RAP-SAM: Towards Real-Time All-Purpose Segment Anything.
CoRR, 2024

An Open and Comprehensive Pipeline for Unified Object Grounding and Detection.
CoRR, 2024

2023
DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection.
CoRR, 2023

PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation.
CoRR, 2023

2022
Query Learning of Both Thing and Stuff for Panoptic Segmentation.
Proceedings of the 2022 IEEE International Conference on Image Processing, 2022

Fashionformer: A Simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition.
Proceedings of the Computer Vision - ECCV 2022, 2022

Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation.
Proceedings of the Computer Vision - ECCV 2022, 2022

