Yingya Zhang

ORCID: 0009-0008-9524-9218

According to our database, Yingya Zhang authored at least 68 papers between 2013 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

MSAVBench: Towards Comprehensive and Reliable Evaluation of Multi-Shot Audio-Video Generation.

CoRR, May, 2026

DreamVideo-Omni: Omni-Motion Controlled Multi-Subject Video Customization with Latent Identity Reinforcement Learning.

CoRR, March, 2026

2025

Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance.

CoRR, December, 2025

Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance.

CoRR, October, 2025

Exploiting Discriminative Codebook Prior for Autoregressive Image Generation.

CoRR, August, 2025

TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive Generation.

CoRR, July, 2025

Self-Contradiction as Self-Improvement: Mitigating the Generation-Understanding Gap in MLLMs.

CoRR, July, 2025

DFVEdit: Conditional Delta Flow Vector for Zero-shot Video Editing.

CoRR, June, 2025

UniAnimate-DiT: Human Image Animation with Large-Scale Video Diffusion Transformer.

CoRR, April, 2025

Taming Consistency Distillation for Accelerated Human Image Animation.

CoRR, April, 2025

Wan: Open and Advanced Large-Scale Video Generative Models.

CoRR, March, 2025

UniAnimate: taming unified video diffusion models for consistent human image animation.

Sci. China Inf. Sci., 2025

ZeroPatcher: Training-free Sampler for Video Inpainting and Editing.

Shaoshu Yang, Yingya Zhang, Ran He

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

DreamRelation: Relation-Centric Video Customization.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

PersonalVideo: High ID-Fidelity Video Customization without Dynamic and Semantic Degradation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

FreeMask: Rethinking the Importance of Attention Masks for Zero-Shot Video Editing.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

CLIP-guided Prototype Modulating for Few-shot Action Recognition.

Int. J. Comput. Vis., June, 2024

CMDFusion: Bidirectional Fusion Network With Cross-Modality Knowledge Distillation for LiDAR Semantic Segmentation.

IEEE Robotics Autom. Lett., January, 2024

EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation.

CoRR, 2024

DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control.

CoRR, 2024

EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Freestyle 3D-Aware Portrait Synthesis Based on Compositional Generative Priors.

Proceedings of the Pattern Recognition - 27th International Conference, 2024

S³D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis.

Proceedings of the Computer Vision - ECCV 2024, 2024

InstructVideo: Instructing Video Diffusion Models with Human Feedback.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DreamVideo: Composing Your Dream Videos with Customized Subject and Motion.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

AE-NeRF: Audio Enhanced Neural Radiance Field for Few Shot Talking Head Synthesis.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models.

CoRR, 2023

VideoLCM: Video Latent Consistency Model.

CoRR, 2023

DreamVideo: Composing Your Dream Videos with Customized Subject and Motion.

CoRR, 2023

I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models.

CoRR, 2023

Few-shot Action Recognition with Captioning Foundation Models.

CoRR, 2023

DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided Image Editing.

CoRR, 2023

ModelScope Text-to-Video Technical Report.

CoRR, 2023

Temporally-Adaptive Models for Efficient Video Understanding.

CoRR, 2023

VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation.

CoRR, 2023

FaceComposer: A Unified Model for Versatile Facial Content Creation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

VideoComposer: Compositional Video Synthesis with Motion Controllability.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

The Devil is in the Wrongly-classified Samples: Towards Unified Open-set Recognition.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

RLIPv2: Fast Scaling of Relational Language-Image Pre-training.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Space-time Prompting for Video Class-incremental Learning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

LipFormer: High-fidelity and Generalizable Talking Face Generation with A Pre-learned Facial Codebook.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Action Recognition.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Enlarging Instance-specific and Class-specific Information for Open-set Action Recognition.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Revisiting Optimal Convergence Rate for Smooth and Non-convex Stochastic Decentralized Optimization.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021

ANN Softmax: Acceleration of Extreme Classification Training.

Proc. VLDB Endow., 2021

ACCL: Architecting Highly Scalable Distributed Training Systems With Highly Efficient Collective Communication Library.

IEEE Micro, 2021

Once and for All: Self-supervised Multi-modal Co-training on One-billion Videos at Alibaba.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Extremely Compact Non-local Representation Learning.

Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

Accelerating Gossip SGD with Periodic Global Averaging.

Proceedings of the 38th International Conference on Machine Learning, 2021

DecentLaM: Decentralized Momentum SGD for Large-batch Deep Training.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Communication Efficient SGD via Gradient Sampling With Bayes Prior.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Distribution Adaptive INT8 Quantization for Training CNNs.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Large-Scale Training System for 100-Million Classification at Alibaba.

Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

EFLOPS: Algorithm and System Co-Design for a High Performance Distributed Training Platform.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

2019

Large-Scale Visual Search with Binary Distributed Graph at Alibaba.

Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019

2018

Visual Search at Alibaba.

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018

2016

Information Theoretic Subspace Clustering.

IEEE Trans. Neural Networks Learn. Syst., 2016

Vehicle trajectory prediction based on Hidden Markov Model.

KSII Trans. Internet Inf. Syst., 2016

A Method for Traffic Congestion Clustering Judgment Based on Grey Relational Analysis.

ISPRS Int. J. Geo Inf., 2016

2015

Robust Subspace Clustering With Complex Noise.

IEEE Trans. Image Process., 2015

A Method of Vehicle Route Prediction Based on Social Network Analysis.

J. Sensors, 2015

2013

Robust Subspace Clustering via Half-Quadratic Minimization.

Proceedings of the IEEE International Conference on Computer Vision, 2013

Robust Low-Rank Representation via Correntropy.

Proceedings of the 2nd IAPR Asian Conference on Pattern Recognition, 2013
