Shilong Liu

ORCID: 0009-0003-5796-0627

Affiliations:

International Digital Economy Academy (IDEA), Shenzhen, China

According to our database, Shilong Liu authored at least 56 papers between 2021 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features.

CoRR, September, 2025

ED-Pose++: Enhanced Explicit Box Detection for Conventional and Interactive Multi-Object Keypoint Detection.

IEEE Trans. Pattern Anal. Mach. Intell., July, 2025

A Mutual Supervision Framework for Referring Expression Segmentation and Generation.

Int. J. Comput. Vis., June, 2025

A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models.

CoRR, February, 2025

Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models.

CoRR, January, 2025

Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

A Unified Interactive Model Evaluation for Classification, Object Detection, and Instance Segmentation in Computer Vision.

IEEE Trans. Vis. Comput. Graph., January, 2024

TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video.

CoRR, 2024

DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding.

CoRR, 2024

Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective.

CoRR, 2024

TAPTRv2: Attention-based Position Update Improves Tracking Any Point.

CoRR, 2024

MMedAgent: Learning to Use Medical Tools with Multi-modal Agent.

CoRR, 2024

Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection.

CoRR, 2024

Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models.

CoRR, 2024

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks.

CoRR, 2024

Interfacing Foundation Models' Embeddings.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

TAPTRv2: Attention-based Position Update Improves Tracking Any Point.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

TOSS: High-quality Text-guided Novel View Synthesis from a Single Image.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

InstructPix2NeRF: Instructed 3D Portrait Editing from a Single Image.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

MMedAgent: Learning to Use Medical Tools with Multi-modal Agent.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

Grounding DINO: Marrying DINO with Grounded Pre-training for Open-Set Object Detection.

Proceedings of the Computer Vision - ECCV 2024, 2024

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents.

Proceedings of the Computer Vision - ECCV 2024, 2024

Segment and Recognize Anything at Any Granularity.

Proceedings of the Computer Vision - ECCV 2024, 2024

TAPTR: Tracking Any Point with Transformers as Detection.

Proceedings of the Computer Vision - ECCV 2024, 2024

T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy.

Proceedings of the Computer Vision - ECCV 2024, 2024

Recognize Anything: A Strong Image Tagging Model.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Visual in-Context Prompting.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Interfacing Foundation Models' Embeddings.

CoRR, 2023

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models.

CoRR, 2023

T-Rex: Counting by Visual Prompting.

CoRR, 2023

Semantic-SAM: Segment and Recognize Anything at Any Granularity.

CoRR, 2023

detrex: Benchmarking Detection Transformers.

CoRR, 2023

A Strong and Reproducible Object Detector with Only Public Datasets.

CoRR, 2023

A Simple Framework for Open-Vocabulary Segmentation and Detection.

CoRR, 2023

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection.

CoRR, 2023

DA-BEV: Depth Aware BEV Transformer for 3D Object Detection.

CoRR, 2023

Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

A Simple Framework for Open-Vocabulary Segmentation and Detection.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Neural Interactive Keypoint Detection.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Detection Transformer with Stable Matching.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MP-Former: Mask-Piloted Transformer for Image Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

PREIM3D: 3D Consistent Precise Image Attribute Editing from a Single Image.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Towards generalizable detection of face forgery via self-guided model-agnostic learning.

Pattern Recognit. Lett., 2022

A Unified Mutual Supervision Framework for Referring Expression Segmentation and Generation.

CoRR, 2022

Vision-Language Intelligence: Tasks, Representation Learning, and Large Models.

CoRR, 2022

DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR.

Proceedings of the Tenth International Conference on Learning Representations, 2022

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Unrestricted Adversarial Attacks on ImageNet Competition.

CoRR, 2021

Query2Label: A Simple Transformer Way to Multi-Label Classification.

CoRR, 2021

Unsupervised Part Segmentation Through Disentangling Appearance and Shape.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
