Xiangtai Li

ORCID: 0000-0002-0550-8247

According to our database, Xiangtai Li authored at least 173 papers between 2019 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

EMOv2: Pushing 5M Vision Model Frontier.

IEEE Trans. Pattern Anal. Mach. Intell., November, 2025

Visual Spatial Tuning.

CoRR, November, 2025

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark.

CoRR, October, 2025

PairUni: Pairwise Training for Unified Multimodal Language Models.

CoRR, October, 2025

From Masks to Worlds: A Hitchhiker's Guide to World Models.

CoRR, October, 2025

Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence.

CoRR, October, 2025

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs.

CoRR, October, 2025

DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training.

CoRR, October, 2025

LSVOS 2025 Challenge Report: Recent Advances in Complex Video Object Segmentation.

CoRR, October, 2025

D²GS: Depth-and-Density Guided Gaussian Splatting for Stable and Accurate Sparse-View Reconstruction.

CoRR, October, 2025

The 1st Solution for 7th LSVOS RVOS Track: SaSaSa2VA.

CoRR, September, 2025

One Flight Over the Gap: A Survey from Perspective to Panoramic Vision.

CoRR, September, 2025

Rethinking Evaluation Metrics of Open-Vocabulary Segmentation.

IEEE Trans. Pattern Anal. Mach. Intell., August, 2025

PointDGRWKV: Generalizing RWKV-like Architecture to Unseen Domains for Point Cloud Classification.

CoRR, August, 2025

VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models.

CoRR, August, 2025

Human-in-Context: Unified Cross-Domain 3D Human Motion Modeling via In-Context Learning.

CoRR, August, 2025

Bridge Feature Matching and Cross-Modal Alignment with Mutual-filtering for Zero-shot Anomaly Detection.

CoRR, July, 2025

Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology.

CoRR, July, 2025

Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning.

CoRR, July, 2025

DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World.

CoRR, June, 2025

Towards Explainable Bilingual Multimodal Misinformation Detection and Localization.

CoRR, June, 2025

Dense360: Dense Understanding from Omnidirectional Panoramas.

CoRR, June, 2025

UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions.

CoRR, June, 2025

AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding.

CoRR, June, 2025

CyberV: Cybernetics for Test-time Scaling in Video Understanding.

CoRR, June, 2025

MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query.

CoRR, June, 2025

DST-Det: Open-Vocabulary Object Detection via Dynamic Self-Training.

IEEE Trans. Circuits Syst. Video Technol., May, 2025

Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models.

CoRR, May, 2025

PixelThink: Towards Efficient Chain-of-Pixel Reasoning.

CoRR, May, 2025

Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model.

CoRR, May, 2025

DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers.

CoRR, May, 2025

So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection.

CoRR, May, 2025

Conditional Panoramic Image Generation via Masked Autoregressive Modeling.

CoRR, May, 2025

BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation.

CoRR, May, 2025

Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook.

CoRR, May, 2025

MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation.

Int. J. Comput. Vis., April, 2025

NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results.

Mohammad Aminul Islam

CoRR, April, 2025

DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency.

CoRR, April, 2025

PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild.

CoRR, April, 2025

Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding.

CoRR, April, 2025

The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer.

CoRR, April, 2025

An Empirical Study of GPT-4o Image Generation Capabilities.

CoRR, April, 2025

Generative Classifier for Domain Generalization.

CoRR, April, 2025

4th PVUW MeViS 3rd Place Report: Sa2VA.

CoRR, April, 2025

Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer.

CoRR, March, 2025

UMC: Unified Resilient Controller for Legged Robots with Joint Malfunctions.

CoRR, February, 2025

A Masked Reference Token Supervision-Based Iterative Visual-Language Framework for Robust Visual Grounding.

IEEE Trans. Circuits Syst. Video Technol., January, 2025

Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs.

CoRR, January, 2025

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos.

CoRR, January, 2025

Exploring plain ViT features for multi-class unsupervised visual anomaly detection.

Comput. Vis. Image Underst., 2025

OmniAudio: Generating Spatial Audio from 360-Degree Video.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Three-Dimensional Trajectory Prediction with 3DMoTraj Dataset.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

On Path to Multimodal Generalist: General-Level and General-Bench.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Towards Semantic Equivalence of Tokenization in Multimodal LLM.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Unified Dense Prediction of Video Diffusion.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

DreamRelation: Bridging Customization and Relation Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results.

Mohammad Aminul Islam

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025

Point Cloud Mamba: Point Cloud Learning via State Space Model.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

PointDGMamba: Domain Generalization of Point Cloud Classification via Generalized State Space Model.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

Explore In-Context Segmentation via Latent Diffusion Models.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model.

Proceedings of the International Conference on 3D Vision, 2025

2024

Pair Then Relation: Pair-Net for Panoptic Scene Graph Generation.

IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

Panoptic-PartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation.

IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

Transformer-Based Visual Segmentation: A Survey.

IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation.

IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm.

Int. J. Comput. Vis., September, 2024

Change Detection Methods for Remote Sensing in the Last Decade: A Comprehensive Review.

Remote. Sens., July, 2024

Multi-Task Learning With Multi-Query Transformer for Dense Prediction.

IEEE Trans. Circuits Syst. Video Technol., February, 2024

SFNet: Faster and Accurate Semantic Segmentation via Semantic Flow.

Int. J. Comput. Vis., February, 2024

Toward Robust Referring Image Segmentation.

IEEE Trans. Image Process., 2024

Towards Open Vocabulary Learning: A Survey.

IEEE Trans. Pattern Anal. Mach. Intell., 2024

OV-VG: A benchmark for open-vocabulary visual grounding.

Neurocomputing, 2024

ModelNet-O: A large-scale synthetic dataset for occlusion-aware point cloud classification.

Comput. Vis. Image Underst., 2024

EMOv2: Pushing 5M Vision Model Frontier.

CoRR, 2024

HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing.

CoRR, 2024

DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation.

CoRR, 2024

RelationBooth: Towards Relation-Aware Customized Object Generation.

CoRR, 2024

PredFormer: Transformers Are Effective Spatial-Temporal Predictive Learners.

CoRR, 2024

MIMAFace: Face Animation via Motion-Identity Modulated Appearance Feature Learning.

CoRR, 2024

LLAVADI: What Matters For Multimodal Large Language Models Distillation.

CoRR, 2024

Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model.

CoRR, 2024

MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning.

CoRR, 2024

BACON: Bayesian Optimal Condensation Framework for Dataset Distillation.

CoRR, 2024

Adversarial Attacks on Both Face Recognition and Face Anti-spoofing Models.

CoRR, 2024

Point-In-Context: Understanding Point Cloud via In-Context Learning.

CoRR, 2024

Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark.

CoRR, 2024

DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries.

CoRR, 2024

Point Cloud Mamba: Point Cloud Learning via State Space Model.

CoRR, 2024

Generalizable Entity Grounding via Assistance of Large Language Model.

CoRR, 2024

OMG-Seg: Is One Model Good Enough For All Segmentation?

CoRR, 2024

RAP-SAM: Towards Real-Time All-Purpose Segment Anything.

CoRR, 2024

An Open and Comprehensive Pipeline for Unified Object Grounding and Detection.

CoRR, 2024

A Generalist FaceX via Learning Unified Facial Representation.

CoRR, 2024

Synergistic Dual Spatial-aware Generation of Image-to-text and Text-to-image.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

MotionBooth: Motion-Aware Customized Text-to-Video Generation.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

DGMamba: Domain Generalization via Generalized State Space Model.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

From Multimodal LLM to Human-level AI: Modality, Instruction, Reasoning and Beyond.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

VG4D: Vision-Language Model Goes 4D Video Recognition.

Proceedings of the IEEE International Conference on Robotics and Automation, 2024

CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Improving Video Segmentation via Dynamic Anchor Queries.

Proceedings of the Computer Vision - ECCV 2024, 2024

Open-Vocabulary SAM: Segment and Recognize Twenty-Thousand Classes Interactively.

Proceedings of the Computer Vision - ECCV 2024, 2024

GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning.

Proceedings of the Computer Vision - ECCV 2024, 2024

Face-Adapter for Pre-trained Diffusion Models with Fine-Grained ID and Attribute Control.

Proceedings of the Computer Vision - ECCV 2024, 2024

Towards Language-Driven Video Inpainting via Multimodal Large Language Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OMG-Seg: Is One Model Good Enough for all Segmentation?

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Referring Image Editing: Object-Level Image Editing via Referring Expressions.

Chang Liu

Xiangtai Li

Henghui Ding

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

You Can't Ignore Either: Unifying Structure and Feature Denoising for Robust Graph Learning.

Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2024

2023

Exploring Self-Supervised Learning for Multi-Modal Remote Sensing Pre-Training via Asymmetric Attention Fusion.

Remote. Sens., December, 2023

Convolution-Enhanced Evolving Attention Networks.

IEEE Trans. Pattern Anal. Mach. Intell., July, 2023

TransVOD: End-to-End Video Object Detection With Spatial-Temporal Transformers.

IEEE Trans. Pattern Anal. Mach. Intell., June, 2023

Improving Video Instance Segmentation via Temporal Pyramid Routing.

IEEE Trans. Pattern Anal. Mach. Intell., May, 2023

Exploring Plain ViT Reconstruction for Multi-class Unsupervised Anomaly Detection.

CoRR, 2023

EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM.

CoRR, 2023

Effective Adapter for Face Recognition in the Wild.

CoRR, 2023

Rethinking Evaluation Metrics of Open-Vocabulary Segmentaion.

CoRR, 2023

DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection.

CoRR, 2023

Neural Collapse Terminus: A Unified Solution for Class Incremental Learning and Its Variants.

CoRR, 2023

Change Detection Methods for Remote Sensing in the Last Decade: A Comprehensive Review.

CoRR, 2023

Tube-Link: A Flexible Cross Tube Baseline for Universal Video Segmentation.

CoRR, 2023

Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation.

CoRR, 2023

Rethinking Mobile Block for Efficient Neural Models.

CoRR, 2023

PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation.

CoRR, 2023

4D Panoptic Scene Graph Generation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Explore In-Context Learning for 3D Point Cloud Understanding.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Neural Collapse Inspired Feature-Classifier Alignment for Few-Shot Class-Incremental Learning.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Iterative Robust Visual Grounding with Masked Reference based Centerpoint Supervision.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Rethinking Mobile Block for Efficient Attention-based Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Panoptic Video Scene Graph Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Towards Robust Referring Image Segmentation.

CoRR, 2022

SFNet: Faster, Accurate, and Domain Agnostic Semantic Segmentation via Semantic Flow.

CoRR, 2022

Multi-Task Learning with Multi-query Transformer for Dense Prediction.

CoRR, 2022

Do We Really Need a Learnable Classifier at the End of Deep Neural Network?

CoRR, 2022

Inducing Neural Collapse in Imbalanced Learning: Do We Really Need a Learnable Classifier at the End of Deep Neural Network?

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Query Learning of Both Thing and Stuff for Panoptic Segmentation.

Proceedings of the 2022 IEEE International Conference on Image Processing, 2022

PolyphonicFormer: Unified Query Learning for Depth-Aware Video Panoptic Segmentation.

Proceedings of the Computer Vision - ECCV 2022, 2022

Fashionformer: A Simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition.

Proceedings of the Computer Vision - ECCV 2022, 2022

Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation.

Proceedings of the Computer Vision - ECCV 2022, 2022

Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Global Aggregation Then Local Distribution for Scene Parsing.

IEEE Trans. Image Process., 2021

Towards Efficient Scene Understanding via Squeeze Reasoning.

IEEE Trans. Image Process., 2021

Improving Video Instance Segmentation via Temporal Pyramid Routing.

CoRR, 2021

BoundarySqueeze: Image Segmentation as Boundary Squeezing.

CoRR, 2021

End-to-End Video Object Detection with Spatial-Temporal Transformers.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Fast and Accurate Scene Parsing via Bi-Direction Alignment Networks.

Proceedings of the 2021 IEEE International Conference on Image Processing, 2021

Dynamic Dual Sampling Module For Fine-Grained Semantic Segmentation.

Proceedings of the 2021 IEEE International Conference on Image Processing, 2021

Enhanced Boundary Learning for Glass-like Object Segmentation.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Involution: Inverting the Inherence of Convolution for Visual Recognition.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

Semantic Flow for Fast and Accurate Scene Parsing.

CoRR, 2020

Semantic Flow for Fast and Accurate Scene Parsing.

Proceedings of the Computer Vision - ECCV 2020, 2020

Improving Semantic Segmentation via Decoupled Body and Edge Supervision.

Proceedings of the Computer Vision - ECCV 2020, 2020

Gated Fully Fusion for Semantic Segmentation.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

GFF: Gated Fully Fusion for Semantic Segmentation.

CoRR, 2019

Flow2Seg: Motion-Aided Semantic Segmentation.

Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2019: Image Processing, 2019

Dual Graph Convolutional Network for Semantic Segmentation.

Proceedings of the 30th British Machine Vision Conference 2019, 2019

Global Aggregation then Local Distribution in Fully Convolutional Networks.

Proceedings of the 30th British Machine Vision Conference 2019, 2019
