Xihui Liu

ORCID: 0000-0003-1831-9952

According to our database, Xihui Liu authored at least 114 papers between 2016 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
A Survey of Reasoning with Foundation Models: Concepts, Methodologies, and Outlook.
ACM Comput. Surv., November, 2025

GeoSAM2: Unleashing the Power of SAM2 for 3D Part Segmentation.
CoRR, August, 2025

TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive Generation.
CoRR, July, 2025

OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding.
CoRR, July, 2025

OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion.
CoRR, July, 2025

StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling.
CoRR, July, 2025

UniMC: Taming Diffusion Transformer for Unified Keypoint-Guided Multi-Class Image Generation.
CoRR, July, 2025

DreamComposer++: Empowering Diffusion Models with Multi-View Conditions for 3D Content Generation.
CoRR, July, 2025

FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation.
CoRR, June, 2025

DreamCube: 3D Panorama Generation via Multi-plane Synchronization.
CoRR, June, 2025

GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning.
CoRR, June, 2025

Context as Memory: Scene-Consistent Interactive Long Video Generation with Memory Retrieval.
CoRR, June, 2025

AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation.
CoRR, June, 2025

T2I-CompBench++: An Enhanced and Comprehensive Benchmark for Compositional Text-to-Image Generation.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2025

Learning Interpretable Representations Leads to Semantically Faithful EEG-to-Text Generation.
CoRR, May, 2025

GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning.
CoRR, May, 2025

Self-NPO: Negative Preference Optimization of Diffusion Models by Simply Learning from Itself without Explicit Preference Annotations.
CoRR, May, 2025

A Survey of Interactive Generative Video.
CoRR, April, 2025

Personalized Text-to-Image Generation with Auto-Regressive Models.
CoRR, April, 2025

GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation.
CoRR, April, 2025

HoloPart: Generative 3D Part Amodal Segmentation.
CoRR, April, 2025

Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1.
CoRR, March, 2025

AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset.
CoRR, March, 2025

Position: Interactive Generative Video as Next-Generation Game Engine.
CoRR, March, 2025

Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation.
CoRR, March, 2025

RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints.
CoRR, March, 2025

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing.
CoRR, March, 2025

Exploring Representation-Aligned Latent Space for Better Generation.
CoRR, February, 2025

LiT: Delving into a Simplified Linear Diffusion Transformer for Image Generation.
CoRR, January, 2025

GameFactory: Creating New Games with Generative Interactive Videos.
CoRR, January, 2025

Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Parallelized Autoregressive Visual Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

MBQ: Modality-Balanced Quantization for Large Vision-Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding.
CoRR, 2024

EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios.
CoRR, 2024

Moto: Latent Motion Token as the Bridging Language for Robot Manipulation.
CoRR, 2024

GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration.
CoRR, 2024

SAMPart3D: Segment Any Part in 3D Objects.
CoRR, 2024

WorldSimBench: Towards Video Generation Models as World Simulators.
CoRR, 2024

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation.
CoRR, 2024

Loong: Generating Minute-level Long Videos with Autoregressive Language Models.
CoRR, 2024

LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness.
CoRR, 2024

DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion.
CoRR, 2024

T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation.
CoRR, 2024

OVExp: Open Vocabulary Exploration for Object-Oriented Navigation.
CoRR, 2024

Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images.
CoRR, 2024

DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis.
CoRR, 2024

Editing Massive Concepts in Text-to-Image Diffusion Models.
CoRR, 2024

Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation.
CoRR, 2024

Shape-Guided Diffusion with Inside-Outside Attention.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

4Diffusion: Multi-view Video Diffusion Model for 4D Generation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

LVD-2M: A Long-take Video Dataset with Temporally Dense Captions.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

BEACON: Benchmark for Comprehensive RNA Tasks and Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

FiT: Flexible Vision Transformer for Diffusion Model.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities.
Proceedings of the Computer Vision - ECCV 2024, 2024

PredBench: Benchmarking Spatio-Temporal Prediction Across Diverse Disciplines.
Proceedings of the Computer Vision - ECCV 2024, 2024

TC4D: Trajectory-Conditioned Text-to-4D Generation.
Proceedings of the Computer Vision - ECCV 2024, 2024

DreamComposer: Controllable 3D Object Generation via Multi-View Conditions.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Towards Large-Scale 3D Representation Learning with Multi-Dataset Point Prompt Training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Point Transformer V3: Simpler, Faster, Stronger.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
A Survey of Reasoning with Foundation Models.
CoRR, 2023

EgoPlan-Bench: Benchmarking Egocentric Embodied Planning with Multimodal Large Language Models.
CoRR, 2023

Drag-A-Video: Non-rigid Video Editing with Point-based Interaction.
CoRR, 2023

Understanding Masked Autoencoders From a Local Contrastive Perspective.
CoRR, 2023

Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection.
CoRR, 2023

UniG3D: A Unified 3D Object Generation Dataset.
CoRR, 2023

SAM3D: Segment Anything in 3D Scenes.
CoRR, 2023

TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale.
CoRR, 2023

Seeing is not always believing: A Quantitative Study on Human Perception of AI-Generated Images.
CoRR, 2023

RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer.
CoRR, 2023

More Control for Free! Image Synthesis with Semantic Diffusion Guidance.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

OV-PARTS: Towards Open-Vocabulary Part Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Seeing is not always believing: Benchmarking Human and Model Perception of AI-Generated Images.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

CorresNeRF: Image Correspondence Priors for Neural Radiance Fields.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DDP: Diffusion Model for Dense Visual Prediction.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Learning Transferable Spatiotemporal Representations from Natural Script Knowledge.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

RIFormer: Keep Your Vision Backbone Effective But Removing Token Mixer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Back to the Source: Diffusion-Driven Adaptation to Test-Time Corruption.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

GLeaD: Improving GANs with A Generator-Leading Task.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Internal Tides and Their Intraseasonal Variability on the Continental Slope Northeast of Taiwan Island Derived from Mooring Observations and Satellite Data.
Remote. Sens., 2022

Back to the Source: Diffusion-Driven Test-Time Adaptation.
CoRR, 2022

The ArtBench Dataset: Benchmarking Generative Models with Artworks.
CoRR, 2022

MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval.
CoRR, 2022

BridgeFormer: Bridging Video-text Retrieval with Multiple Choice Questions.
CoRR, 2022

Point Transformer V2: Grouped Vector Attention and Partition-based Pooling.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

MILES: Visual BERT Pre-training with Injected Language Semantics for Video-Text Retrieval.
Proceedings of the Computer Vision - ECCV 2022, 2022

Bridging Video-text Retrieval with Multiple Choice Questions.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Benchmark for Compositional Text-to-Image Synthesis.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Nonlinear dynamics analysis of involute spur gear transmission system.
Proceedings of the AIAM 2021: 3rd International Conference on Artificial Intelligence and Advanced Manufacture, Manchester, United Kingdom, October 23, 2021

2020
Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions.
Proceedings of the Computer Vision - ECCV 2020, 2020

2019
Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Improving Referring Expression Grounding With Cross-Modal Attention-Guided Erasing.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association.
CoRR, 2018

Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data.
Proceedings of the Computer Vision - ECCV 2018, 2018

Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association.
Proceedings of the Computer Vision - ECCV 2018, 2018

Localization Guided Learning for Pedestrian Attribute Recognition.
Proceedings of the British Machine Vision Conference 2018, 2018

2017
Orientation Invariant Feature Embedding and Spatial Temporal Regularization for Vehicle Re-identification.
Proceedings of the IEEE International Conference on Computer Vision, 2017

HydraPlus-Net: Attentive Deep Features for Pedestrian Analysis.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Object Detection in Videos with Tubelet Proposal Networks.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016
A low-complexity precoding scheme for two-user massive MIMO downlink.
Proceedings of the 17th IEEE International Workshop on Signal Processing Advances in Wireless Communications, 2016

Measurement-Driven Capability Modeling for Mobile Network in Large-Scale Urban Environment.
Proceedings of the 13th IEEE International Conference on Mobile Ad Hoc and Sensor Systems, 2016

