Xintao Wang

ORCID: 0000-0001-6585-8604

Affiliations:
  • Tencent AI Lab, Tencent PCG, Applied Research Center (ARC), Shenzhen, China
  • Chinese University of Hong Kong (CUHK), SenseTime Joint Lab, Department of Information Engineering, Hong Kong (PhD 2020)


According to our database, Xintao Wang authored at least 130 papers between 2017 and 2025.


Bibliography

2025
SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution.
CoRR, June, 2025

UNIC: Unified In-Context Video Editing.
CoRR, June, 2025

FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers.
CoRR, June, 2025

CamCloneMaster: Enabling Reference-based Camera Control for Video Generation.
CoRR, June, 2025

Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control.
CoRR, June, 2025

Scaling Image and Video Generation via Test-Time Evolutionary Search.
CoRR, May, 2025

Flow-GRPO: Training Flow Matching Models via Online RL.
CoRR, May, 2025

StyleAdapter: A Unified Stylized Image Generation Model.
Int. J. Comput. Vis., April, 2025

A Survey of Interactive Generative Video.
CoRR, April, 2025

Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation.
CoRR, March, 2025

FullDiT: Multi-Task Video Generative Foundation Model with Full Attention.
CoRR, March, 2025

DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers.
CoRR, March, 2025

ReCamMaster: Camera-Controlled Generative Rendering from A Single Video.
CoRR, March, 2025

Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance.
IEEE Trans. Vis. Comput. Graph., February, 2025

CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation.
CoRR, February, 2025

Improving Video Generation with Human Feedback.
CoRR, January, 2025

ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning.
CoRR, January, 2025

3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

Anti-Diffusion: Preventing Abuse of Modifications of Diffusion-Based Models.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

Image Conductor: Precision Control for Interactive Video Synthesis.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
ToonCrafter: Generative Cartoon Interpolation.
ACM Trans. Graph., December, 2024

StyleCrafter: Taming Artistic Video Diffusion with Reference-Augmented Adapter Learning.
ACM Trans. Graph., December, 2024

Empowering Real-World Image Super-Resolution With Flexible Interactive Modulation.
IEEE Trans. Pattern Anal. Mach. Intell., November, 2024

Temporally consistent video colorization with deep feature propagation and self-regularization learning.
Comput. Vis. Media, April, 2024

Analysis and Benchmarking of Extending Blind Face Image Restoration to Videos.
IEEE Trans. Image Process., 2024

Consistent Human Image and Video Generation with Spatially Conditioned Diffusion.
CoRR, 2024

SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints.
CoRR, 2024

NovelGS: Consistent Novel-view Denoising via Large Gaussian Reconstruction Model.
CoRR, 2024

Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models.
CoRR, 2024

VideoTetris: Towards Compositional Text-to-Video Generation.
CoRR, 2024

InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models.
CoRR, 2024

SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model.
CoRR, 2024

Towards A Better Metric for Text-to-Video Generation.
CoRR, 2024

MotionCtrl: A Unified and Flexible Motion Controller for Video Generation.
Proceedings of the ACM SIGGRAPH 2024 Conference Papers, 2024

VideoTetris: Towards Compositional Text-to-Video Generation.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

ReVideo: Remake a Video with Motion and Content Control.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

CustomNet: Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024

Unifying Image Processing as Visual Prompting Question Answering.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Making LLaMA SEE and Draw with SEED Tokenizer.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

DynamiCrafter: Animating Open-Domain Images with Video Diffusion Priors.
Proceedings of the Computer Vision - ECCV 2024, 2024

MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model.
Proceedings of the Computer Vision - ECCV 2024, 2024

BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion.
Proceedings of the Computer Vision - ECCV 2024, 2024

Storytelling Video Generation with Retrieval Augmentation and Character Consistency.
Proceedings of the Computer Vision - ECCV 2024 Workshops, 2024

Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation.
Proceedings of the Computer Vision - ECCV 2024, 2024

DreamDiffusion: High-Quality EEG-to-Image Generation with Temporal Masked Signal Modeling and CLIP Alignment.
Proceedings of the Computer Vision - ECCV 2024, 2024

Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

X-Adapter: Universal Compatibility of Plugins for Upgraded Diffusion Model.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DiffEditor: Boosting Accuracy and Flexibility on Diffusion-Based Image Editing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

EvalCrafter: Benchmarking and Evaluating Large Video Generation Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

SmartEdit: Exploring Complex Instruction-Based Image Editing with Multimodal Large Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion Models.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Follow Your Pose: Pose-Guided Text-to-Video Generation Using Pose-Free Videos.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Reference-Based Image and Video Super-Resolution via $C^{2}$-Matching.
IEEE Trans. Pattern Anal. Mach. Intell., July, 2023

GLEAN: Generative Latent Bank for Image Super-Resolution and Beyond.
IEEE Trans. Pattern Anal. Mach. Intell., March, 2023

AnimateZero: Video Diffusion Models are Zero-Shot Image Animators.
CoRR, 2023

MotionCtrl: A Unified and Flexible Motion Controller for Video Generation.
CoRR, 2023

MagicStick: Controllable Video Editing via Control Handle Transformations.
CoRR, 2023

X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model.
CoRR, 2023

StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter.
CoRR, 2023

CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models.
CoRR, 2023

VideoCrafter1: Open Diffusion Models for High-Quality Video Generation.
CoRR, 2023

DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors.
CoRR, 2023

HAT: Hybrid Attention Transformer for Image Restoration.
CoRR, 2023

StyleAdapter: A Single-Pass LoRA-Free Model for Stylized Image Generation.
CoRR, 2023

GET3D--: Learning GET3D from Unconstrained Image Collections.
CoRR, 2023

Planting a SEED of Vision in Large Language Model.
CoRR, 2023

Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation.
CoRR, 2023

DreamDiffusion: Generating High-Quality Images from Brain EEG Signals.
CoRR, 2023

InstructP2P: Learning to Edit 3D Point Clouds with Text Instructions.
CoRR, 2023

TaleCrafter: Interactive Story Visualization with Multiple Characters.
CoRR, 2023

Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos.
CoRR, 2023

T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models.
CoRR, 2023

Interactive Story Visualization with Multiple Characters.
Proceedings of the SIGGRAPH Asia 2023 Conference Papers, 2023

Inserting Anybody in Diffusion Models via Celeb Basis.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DeSRA: Detect and Delete the Artifacts of GAN-based Real-World Super-Resolution Models.
Proceedings of the International Conference on Machine Learning, 2023

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

FateZero: Fusing Attentions for Zero-shot Text-based Video Editing.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

OSRT: Omnidirectional Image Super-Resolution with Distortion-aware Transformer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Activating More Pixels in Image Super-Resolution Transformer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023


Mitigating Artifacts in Real-World Video Super-resolution Models.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Accelerating the Training of Video Super-resolution Models.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Path-Restore: Learning Network Path Selection for Image Restoration.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Hybrid Warping Fusion for Video Frame Interpolation.
Int. J. Comput. Vis., 2022

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation.
CoRR, 2022

Reference-based Image and Video Super-Resolution via $C^{2}$-Matching.
CoRR, 2022

Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis.
CoRR, 2022

FaceFormer: Scale-aware Blind Face Restoration with Transformers.
CoRR, 2022

Activating More Pixels in Image Super-Resolution Transformer.
CoRR, 2022

NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video: Dataset, Methods and Results.
CoRR, 2022

AnimeSR: Learning Real-World Super-Resolution Models for Animation Videos.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Rethinking Alignment in Video Super-Resolution Transformers.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Composite Photograph Harmonization with Complete Background Cues.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

RepSR: Training Efficient VGG-style Super-Resolution Networks with Structural Re-Parameterization and Batch Normalization.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Metric Learning Based Interactive Modulation for Real-World Super-Resolution.
Proceedings of the Computer Vision - ECCV 2022, 2022

VQFR: Blind Face Restoration with Vector-Quantized Dictionary and Parallel Decoder.
Proceedings of the Computer Vision - ECCV 2022, 2022


VFHQ: A High-Quality Dataset and Benchmark for Video Face Super-Resolution.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022


2021
Finding Discriminative Filters for Specific Degradations in Blind Super-Resolution.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data.
Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2021

Towards Vivid and Diverse Image Colorization with Generative Color Prior.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Positional Encoding As Spatial Inductive Bias in GANs.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Towards Real-World Blind Face Restoration With Generative Facial Prior.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

GLEAN: Generative Latent Bank for Large-Factor Image Super-Resolution.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Robust Reference-Based Super-Resolution via $C^{2}$-Matching.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Understanding Deformable Alignment in Video Super-Resolution.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2019
Deep Network Interpolation for Continuous Imagery Effect Transition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

EDVR: Video Restoration With Enhanced Deformable Convolutional Networks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019



2018
ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks.
CoRR, 2018

ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks.
Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018

Recovering Realistic Texture in Image Super-Resolution by Deep Spatial Feature Transform.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017

