Shiwei Zhang

ORCID: 0000-0002-6929-5295

Affiliations:
  • Alibaba Group, Hangzhou, China
  • Huazhong University of Science and Technology, School of Artificial Intelligence and Automation, Wuhan, China (PhD 2019)


According to our database, Shiwei Zhang authored at least 68 papers between 2015 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
SynMotion: Semantic-Visual Adaptation for Motion Customized Video Generation.
CoRR, June, 2025

UniAnimate-DiT: Human Image Animation with Large-Scale Video Diffusion Transformer.
CoRR, April, 2025

Taming Consistency Distillation for Accelerated Human Image Animation.
CoRR, April, 2025

DreamRelation: Relation-Centric Video Customization.
CoRR, March, 2025

Animate-X: Universal Character Image Animation with Enhanced Motion Representation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

FreeMask: Rethinking the Importance of Attention Masks for Zero-Shot Video Editing.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
CLIP-guided Prototype Modulating for Few-shot Action Recognition.
Int. J. Comput. Vis., June, 2024

HyRSM++: Hybrid relation guided temporal set matching for few-shot action recognition.
Pattern Recognit., March, 2024

CMDFusion: Bidirectional Fusion Network With Cross-Modality Knowledge Distillation for LiDAR Semantic Segmentation.
IEEE Robotics Autom. Lett., January, 2024

MAR: Masked Autoencoders for Efficient Action Recognition.
IEEE Trans. Multim., 2024

FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion.
CoRR, 2024

PersonalVideo: High ID-Fidelity Video Customization without Dynamic and Semantic Degradation.
CoRR, 2024

DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control.
CoRR, 2024

EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models.
CoRR, 2024

UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation.
CoRR, 2024

EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

InstructVideo: Instructing Video Diffusion Models with Human Feedback.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DreamVideo: Composing Your Dream Videos with Customized Subject and Motion.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Towards Real-World Visual Tracking With Temporal Contexts.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

Self-Supervised Learning from Untrimmed Videos via Hierarchical Consistency.
IEEE Trans. Pattern Anal. Mach. Intell., October, 2023

Cross-domain few-shot action recognition with unlabeled videos.
Comput. Vis. Image Underst., August, 2023

ParamCrop: Parametric Cubic Cropping for Video Contrastive Learning.
IEEE Trans. Multim., 2023

DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models.
CoRR, 2023

VideoLCM: Video Latent Consistency Model.
CoRR, 2023

DreamVideo: Composing Your Dream Videos with Customized Subject and Motion.
CoRR, 2023

I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models.
CoRR, 2023

Few-shot Action Recognition with Captioning Foundation Models.
CoRR, 2023

ModelScope Text-to-Video Technical Report.
CoRR, 2023

Temporally-Adaptive Models for Efficient Video Understanding.
CoRR, 2023

FaceComposer: A Unified Model for Versatile Facial Content Creation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

VideoComposer: Compositional Video Synthesis with Motion Controllability.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

The Devil is in the Wrongly-classified Samples: Towards Unified Open-set Recognition.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

RLIPv2: Fast Scaling of Relational Language-Image Pre-training.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Space-time Prompting for Video Class-incremental Learning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

LipFormer: High-fidelity and Generalizable Talking Face Generation with A Pre-learned Facial Codebook.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Action Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Enlarging Instance-specific and Class-specific Information for Open-set Action Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Exploring Language Hierarchy for Video Grounding.
IEEE Trans. Image Process., 2022

Context-aware Proposal Network for Temporal Action Detection.
CoRR, 2022

Learning a Condensed Frame for Memory-Efficient Video Class-Incremental Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

TAda! Temporally-Adaptive Convolutions for Video Understanding.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Open-world Semantic Segmentation for LIDAR Point Clouds.
Proceedings of the Computer Vision - ECCV 2022, 2022

Hybrid Relation Guided Set Matching for Few-shot Action Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

TCTrack: Temporal Contexts for Aerial Tracking.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Exploring Stronger Feature for Temporal Action Localization.
CoRR, 2021

Proposal Relation Network for Temporal Action Detection.
CoRR, 2021

Weakly-Supervised Temporal Action Localization Through Local-Global Background Modeling.
CoRR, 2021

Relation Modeling in Spatio-Temporal Action Localization.
CoRR, 2021

A Stronger Baseline for Ego-Centric Action Detection.
CoRR, 2021

Towards Training Stronger Video Vision Transformers for EPIC-KITCHENS-100 Action Recognition.
CoRR, 2021

OadTR: Online Action Detection with Transformers.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Support-Set Based Cross-Supervision for Video Grounding.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Self-Supervised Learning for Semi-Supervised Temporal Action Proposal.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Self-Supervised Motion Learning From Static Images.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
GLNet: Global Local Network for Weakly Supervised Action Localization.
IEEE Trans. Multim., 2020

CBR-Net: Cascade Boundary Refinement Network for Action Detection: Submission to ActivityNet Challenge 2020 (Task 1).
CoRR, 2020

Temporal Fusion Network for Temporal Action Localization: Submission to ActivityNet Challenge 2020 (Task E).
CoRR, 2020

Multi-level Temporal Pyramid Network for Action Detection.
Proceedings of the Pattern Recognition and Computer Vision - Third Chinese Conference, 2020

2019
TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Discriminative Part Selection for Human Action Recognition.
IEEE Trans. Multim., 2018

2017
Group Sparse-Based Mid-Level Representation for Action Recognition.
IEEE Trans. Syst. Man Cybern. Syst., 2017

2015
Mid-level parts mined by feature selection for action recognition.
Proceedings of the 3rd IAPR Asian Conference on Pattern Recognition, 2015

