Xiaodong Cun

Orcid: 0000-0003-3607-2236

According to our database¹, Xiaodong Cun authored at least 92 papers between 2018 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

4DVD: Cascaded Dense-view Video Diffusion Model for High-quality 4D Content Generation.

Int. J. Comput. Vis., May, 2026

Seg-Agent: Test-Time Multimodal Reasoning for Training-Free Language-Guided Segmentation.

CoRR, May, 2026

Beyond Text Prompts: Visual-to-Visual Generation as A Unified Paradigm.

CoRR, May, 2026

Make-Your-Anchor+: Temporal Consistent 2D Avatar Generation via Video Diffusion Prior.

IEEE Trans. Vis. Comput. Graph., April, 2026

CutClaw: Agentic Hours-Long Video Editing via Music Synchronization.

CoRR, March, 2026

LightCtrl: Training-free Controllable Video Relighting.

CoRR, March, 2026

MLLM-4D: Towards Visual-based Spatial-Temporal Intelligence.

CoRR, March, 2026

Explicit Visual Prompting for Universal Foreground Segmentations.

IEEE Trans. Pattern Anal. Mach. Intell., February, 2026

Decoupling Vocal and Rhythmic Conditioning for Music-Driven Singing Avatar Animation.

Proceedings of the 2026 International Conference on Multimedia Retrieval, 2026

RainbowDreamer: Taming Semantic Controls for Attribute-Consistent Text-to-3D Generation.

Proceedings of the 2026 International Conference on Multimedia Retrieval, 2026

2025

EasyOmnimatte: Taming Pretrained Inpainting Diffusion Models for End-to-End Video Layered Decomposition.

Yihan Hu, Xuelin Chen, Xiaodong Cun

CoRR, December, 2025

PersonaLive! Expressive Portrait Image Animation for Live Streaming.

CoRR, December, 2025

SKEL-CF: Coarse-to-Fine Biomechanical Skeleton and Surface Mesh Recovery.

CoRR, November, 2025

GenCompositor: Generative Video Compositing with Diffusion Transformer.

CoRR, September, 2025

EmoCAST: Emotional Talking Portrait via Emotive Text Description.

CoRR, August, 2025

GSFixer: Improving 3D Gaussian Splatting with Reference-Guided Video Diffusion Priors.

CoRR, August, 2025

PUSA V1.0: Surpassing Wan-I2V with $500 Training Cost by Vectorized Timestep Adaptation.

CoRR, July, 2025

VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning.

CoRR, May, 2025

Sci-Fi: Symmetric Constraint for Frame Inbetweening.

CoRR, May, 2025

AB-Cache: Training-Free Acceleration of Diffusion Models via Adams-Bashforth Cached Feature Reuse.

CoRR, April, 2025

BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing.

CoRR, March, 2025

Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance.

IEEE Trans. Vis. Comput. Graph., February, 2025

MagicStick: Controllable Video Editing via Control Handle Transformations.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

MV-Performer: Taming Video Diffusion Model for Faithful and Synchronized Multi-view Performer Synthesis.

Proceedings of the SIGGRAPH Asia 2025 Conference Papers, 2025

FairyGen: Storied Cartoon Video from a Single Child-Drawn Character.

Jiayi Zheng, Xiaodong Cun

Proceedings of the SIGGRAPH Asia 2025 Conference Papers, 2025

BlobCtrl: Taming Controllable Blob for Element-level Image Editing.

Proceedings of the SIGGRAPH Asia 2025 Conference Papers, 2025

Mobius: Text to Seamless Looping Video Generation via Latent Shift.

Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference, 2025

AB-Cache: Training-Free Acceleration of Diffusion Models via Adams-Bashforth Cached Feature Reuse.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

DEIM: DETR with Improved Matching for Fast Convergence.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

CustomTTT: Motion and Appearance Customized Video Generation via Test-Time Training.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

Sketch Video Synthesis.

Comput. Graph. Forum, May, 2024

DH-GAN: Image manipulation localization via a dual homology-aware generative adversarial network.

Weihuang Liu, Xiaodong Cun, Chi-Man Pun

Pattern Recognit., 2024

VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models.

CoRR, 2024

AnchorCrafter: Animate CyberAnchors Saling Your Products via Human-Object Interacting Video Generation.

CoRR, 2024

ForgeryTTT: Zero-Shot Image Manipulation Localization with Test-Time Training.

CoRR, 2024

Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach.

CoRR, 2024

StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos.

CoRR, 2024

Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models.

CoRR, 2024

ZeroPur: Succinct Training-Free Adversarial Purification.

CoRR, 2024

ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation.

CoRR, 2024

Towards A Better Metric for Text-to-Video Generation.

CoRR, 2024

CV-VAE: A Compatible Video VAE for Latent Generative Video Models.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Noise Calibration: Plug-and-Play Content-Preserving Video Enhancement Using Pre-trained Video Diffusion Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model.

Proceedings of the Computer Vision - ECCV 2024, 2024

Storytelling Video Generation with Retrieval Augmentation and Character Consistency.

Proceedings of the Computer Vision - ECCV 2024 Workshops, 2024

Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation.

Proceedings of the Computer Vision - ECCV 2024, 2024

X-Adapter: Universal Compatibility of Plugins for Upgraded Diffusion Model.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Depth-Aware Test-Time Training for Zero-Shot Video Object Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

EvalCrafter: Benchmarking and Evaluating Large Video Generation Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

SmartEdit: Exploring Complex Instruction-Based Image Editing with Multimodal Large Language Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Follow Your Pose: Pose-Guided Text-to-Video Generation Using Pose-Free Videos.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

AnimateZero: Video Diffusion Models are Zero-Shot Image Animators.

CoRR, 2023

MagicStick: Controllable Video Editing via Control Handle Transformations.

CoRR, 2023

X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model.

CoRR, 2023

VideoCrafter1: Open Diffusion Models for High-Quality Video Generation.

CoRR, 2023

Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation.

CoRR, 2023

TaleCrafter: Interactive Story Visualization with Multiple Characters.

CoRR, 2023

Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos.

CoRR, 2023

T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations.

CoRR, 2023

Interactive Story Visualization with Multiple Characters.

Proceedings of the SIGGRAPH Asia 2023 Conference Papers, 2023

Inserting Anybody in Diffusion Models via Celeb Basis.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

LivelySpeaker: Towards Semantic-Aware Co-Speech Gesture Generation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

FateZero: Fusing Attentions for Zero-shot Text-based Video Editing.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

High-Resolution Document Shadow Removal via A Large-Scale Real-World Dataset and A Frequency-Aware Shadow Erasing Net.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

ToonTalker: Cross-Domain Face Reenactment.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Shadocnet: Learning Spatial-Aware Tokens in Transformer for Document Shadow Removal.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Generating Human Motion from Textual Descriptions with Discrete Representations.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

3D GAN Inversion with Facial Symmetry Prior.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

DPE: Disentanglement of Pose and Expression for General Video Portrait Editing.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Explicit Visual Prompting for Low-Level Structure Segmentations.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

CoordFill: Efficient High-Resolution Image Inpainting via Parameterized Coordinate Querying.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Learning Enriched Illuminants for Cross and Single Sensor Color Constancy.

CoRR, 2022

VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild.

Proceedings of the SIGGRAPH Asia 2022 Conference Papers, 2022

StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN.

Proceedings of the Computer Vision - ECCV 2022, 2022

Spatial-Separated Curve Rendering Network for Efficient and High-Resolution Image Harmonization.

Proceedings of the Computer Vision - ECCV 2022, 2022

Uformer: A General U-Shaped Transformer for Image Restoration.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Spatial-Separated Curve Rendering Network for Efficient and High-Resolution Image Harmonization.

Jingtang Liang, Xiaodong Cun, Chi-Man Pun

CoRR, 2021

Uformer: A General U-Shaped Transformer for Image Restoration.

CoRR, 2021

Split then Refine: Stacked Attention-guided ResUNets for Blind Single Image Visible Watermark Removal.

Xiaodong Cun, Chi-Man Pun

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Improving the Harmony of the Composite Image by Spatial-Separated Attention Module.

Xiaodong Cun, Chi-Man Pun

IEEE Trans. Image Process., 2020

Defocus Blur Detection via Depth Distillation.

Xiaodong Cun, Chi-Man Pun

Proceedings of the Computer Vision - ECCV 2020, 2020

Towards Ghost-Free Shadow Removal via Dual Hierarchical Aggregation Network and Shadow Matting GAN.

Xiaodong Cun, Chi-Man Pun, Cheng Shi

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2018

Applying stochastic second-order entropy images to multi-modal image registration.

Xiaodong Cun, Chi-Man Pun, Hao Gao

Signal Process. Image Commun., 2018

Depth assisted full resolution network for single image-based view synthesis.

Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference, 2018

Image Splicing Localization via Semi-global Network and Fully Connected Conditional Random Fields.

Xiaodong Cun, Chi-Man Pun

Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018
