Yu Liu

Orcid: 0000-0001-8071-3745

Affiliations:

Alibaba Group, Machine Intelligence Technology Lab

According to our database¹, Yu Liu authored at least 55 papers between 2021 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2025

Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance.

CoRR, October, 2025

Addressing the ID-Matching Challenge in Long Video Captioning.

CoRR, October, 2025

DiffCamera: Arbitrary Refocusing on Images.

Hengshuang Zhao

CoRR, September, 2025

AnyDoor: Zero-Shot Image Customization With Region-to-Region Reference.

Hengshuang Zhao

IEEE Trans. Pattern Anal. Mach. Intell., August, 2025

ViewPoint: Panoramic Video Generation with Pretrained Diffusion Models.

CoRR, June, 2025

MiCo: Multi-image Contrast for Reinforcement Visual Reasoning.

Hengshuang Zhao

CoRR, June, 2025

Instability in Diffusion ODEs: An Explanation for Inaccurate Image Reconstruction.

CoRR, June, 2025

AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model.

CoRR, June, 2025

ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing.

CoRR, March, 2025

VACE: All-in-One Video Creation and Editing.

CoRR, March, 2025

DiffDoctor: Diagnosing Image Diffusion Models Before Treating.

Hengshuang Zhao

CoRR, January, 2025

VanGogh: A Unified Multimodal Diffusion-based Framework for Video Colorization.

CoRR, January, 2025

ACE++: Instruction-Based Image Creation and Editing via Context-Aware Content Filling.

CoRR, January, 2025

ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Improved Video VAE for Latent Video Diffusion Model.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

MangaNinja: Line Art Colorization with Precise Reference Following.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

IDEA-Bench: How Far are Generative Models from Professional Designing?

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers.

CoRR, 2024

IDEA-Bench: How Far are Generative Models from Professional Designing?

CoRR, 2024

The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control.

CoRR, 2024

In-Context LoRA for Diffusion Transformers.

CoRR, 2024

Group Diffusion Transformers are Unsupervised Multitask Learners.

CoRR, 2024

ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer.

CoRR, 2024

BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations.

CoRR, 2024

InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior.

CoRR, 2024

Zero-shot Image Editing with Reference Imitation.

Hengshuang Zhao

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

CCM: Real-Time Controllable Visual Content Creation Using Text-to-Image Consistency Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Lipschitz Singularities in Diffusion Models.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

DreamClean: Restoring Clean Image Using Deep Diffusion Prior.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Exploring Guided Sampling of Conditional GANs.

Proceedings of the Computer Vision - ECCV 2024, 2024

LivePhoto: Real Image Animation with Text-Guided Motion Control.

Hengshuang Zhao

Proceedings of the Computer Vision - ECCV 2024, 2024

Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

AnyDoor: Zero-shot Object-level Image Customization.

Hengshuang Zhao

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DreamVideo: Composing Your Dream Videos with Customized Subject and Motion.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

VideoLCM: Video Latent Consistency Model.

CoRR, 2023

CCM: Adding Conditional Controls to Text-to-Image Consistency Models.

CoRR, 2023

DreamVideo: Composing Your Dream Videos with Customized Subject and Motion.

CoRR, 2023

Eliminating Lipschitz Singularities in Diffusion Models.

CoRR, 2023

Cones 2: Customizable Image Synthesis with Multiple Subjects.

CoRR, 2023

Customizable Image Synthesis with Multiple Subjects.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Cones: Concept Neurons in Diffusion Models for Customized Generation.

Proceedings of the International Conference on Machine Learning, 2023

Composer: Creative and Controllable Image Synthesis with Composable Conditions.

Proceedings of the International Conference on Machine Learning, 2023

Efficient-VQGAN: Towards High-Resolution Image Generation with Efficient Vision Transformers.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Dimensionality-Varying Diffusion Process.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Animating Images to Transfer CLIP for Video-Text Retrieval.

Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

GeoAug: Data Augmentation for Few-Shot NeRF with Geometry Constraints.

Proceedings of the Computer Vision - ECCV 2022, 2022

DiffGAR: Model-Agnostic Restoration from Generative Artifacts Using Image-to-Image Diffusion Models.

Proceedings of the 6th International Conference on Computer Science and Artificial Intelligence, 2022

A Trend-Driven Fashion Design System for Rapid Response Marketing in E-commerce.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

Once and for All: Self-supervised Multi-modal Co-training on One-billion Videos at Alibaba.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Communication Efficient SGD via Gradient Sampling With Bayes Prior.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Self-Supervised Video Representation Learning by Context and Motion Decoupling.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Train a One-Million-Way Instance Classifier for Unsupervised Visual Representation Learning.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
