Shilong Liu

Orcid: 0009-0003-5796-0627

According to our database, Shilong Liu authored at least 56 papers between 2021 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features.
CoRR, September, 2025

ED-Pose++: Enhanced Explicit Box Detection for Conventional and Interactive Multi-Object Keypoint Detection.
IEEE Trans. Pattern Anal. Mach. Intell., July, 2025

A Mutual Supervision Framework for Referring Expression Segmentation and Generation.
Int. J. Comput. Vis., June, 2025

A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models.
CoRR, February, 2025

Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models.
CoRR, January, 2025

Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
A Unified Interactive Model Evaluation for Classification, Object Detection, and Instance Segmentation in Computer Vision.
IEEE Trans. Vis. Comput. Graph., January, 2024

TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video.
CoRR, 2024

DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding.
CoRR, 2024

Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective.
CoRR, 2024

TAPTRv2: Attention-based Position Update Improves Tracking Any Point.
CoRR, 2024

MMedAgent: Learning to Use Medical Tools with Multi-modal Agent.
CoRR, 2024

Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection.
CoRR, 2024

Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models.
CoRR, 2024

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks.
CoRR, 2024

Interfacing Foundation Models' Embeddings.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

TAPTRv2: Attention-based Position Update Improves Tracking Any Point.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

TOSS: High-quality Text-guided Novel View Synthesis from a Single Image.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

InstructPix2NeRF: Instructed 3D Portrait Editing from a Single Image.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

MMedAgent: Learning to Use Medical Tools with Multi-modal Agent.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

Grounding DINO: Marrying DINO with Grounded Pre-training for Open-Set Object Detection.
Proceedings of the Computer Vision - ECCV 2024, 2024

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents.
Proceedings of the Computer Vision - ECCV 2024, 2024

Segment and Recognize Anything at Any Granularity.
Proceedings of the Computer Vision - ECCV 2024, 2024

TAPTR: Tracking Any Point with Transformers as Detection.
Proceedings of the Computer Vision - ECCV 2024, 2024

T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy.
Proceedings of the Computer Vision - ECCV 2024, 2024

Recognize Anything: A Strong Image Tagging Model.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Visual in-Context Prompting.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Interfacing Foundation Models' Embeddings.
CoRR, 2023

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models.
CoRR, 2023

T-Rex: Counting by Visual Prompting.
CoRR, 2023

Semantic-SAM: Segment and Recognize Anything at Any Granularity.
CoRR, 2023

detrex: Benchmarking Detection Transformers.
CoRR, 2023

A Strong and Reproducible Object Detector with Only Public Datasets.
CoRR, 2023

A Simple Framework for Open-Vocabulary Segmentation and Detection.
CoRR, 2023

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection.
CoRR, 2023

DA-BEV: Depth Aware BEV Transformer for 3D Object Detection.
CoRR, 2023

Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

A Simple Framework for Open-Vocabulary Segmentation and Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Neural Interactive Keypoint Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Detection Transformer with Stable Matching.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MP-Former: Mask-Piloted Transformer for Image Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Lite DETR : An Interleaved Multi-Scale Encoder for Efficient DETR.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

PREIM3D: 3D Consistent Precise Image Attribute Editing from a Single Image.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Towards generalizable detection of face forgery via self-guided model-agnostic learning.
Pattern Recognit. Lett., 2022

A Unified Mutual Supervision Framework for Referring Expression Segmentation and Generation.
CoRR, 2022

Vision-Language Intelligence: Tasks, Representation Learning, and Large Models.
CoRR, 2022

DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR.
Proceedings of the Tenth International Conference on Learning Representations, 2022

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Unrestricted Adversarial Attacks on ImageNet Competition.
CoRR, 2021

Query2Label: A Simple Transformer Way to Multi-Label Classification.
CoRR, 2021

Unsupervised Part Segmentation Through Disentangling Appearance and Shape.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
