Shiwei Zhang

Orcid: 0000-0002-6929-5295

Affiliations:

Alibaba Group, Hangzhou, China
Huazhong University of Science and Technology, School of Artificial Intelligence and Automation, Wuhan, China (PhD 2019)

According to our database, Shiwei Zhang authored at least 70 papers between 2015 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance.

CoRR, October, 2025

TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive Generation.

CoRR, July, 2025

SynMotion: Semantic-Visual Adaptation for Motion Customized Video Generation.

CoRR, June, 2025

UniAnimate-DiT: Human Image Animation with Large-Scale Video Diffusion Transformer.

CoRR, April, 2025

Taming Consistency Distillation for Accelerated Human Image Animation.

CoRR, April, 2025

DreamRelation: Relation-Centric Video Customization.

CoRR, March, 2025

UniAnimate: taming unified video diffusion models for consistent human image animation.

Sci. China Inf. Sci., 2025

Animate-X: Universal Character Image Animation with Enhanced Motion Representation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

FreeMask: Rethinking the Importance of Attention Masks for Zero-Shot Video Editing.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

CLIP-guided Prototype Modulating for Few-shot Action Recognition.

Int. J. Comput. Vis., June, 2024

HyRSM++: Hybrid relation guided temporal set matching for few-shot action recognition.

Pattern Recognit., March, 2024

CMDFusion: Bidirectional Fusion Network With Cross-Modality Knowledge Distillation for LiDAR Semantic Segmentation.

IEEE Robotics Autom. Lett., January, 2024

MAR: Masked Autoencoders for Efficient Action Recognition.

IEEE Trans. Multim., 2024

FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion.

CoRR, 2024

PersonalVideo: High ID-Fidelity Video Customization without Dynamic and Semantic Degradation.

CoRR, 2024

DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control.

CoRR, 2024

EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models.

CoRR, 2024

EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

InstructVideo: Instructing Video Diffusion Models with Human Feedback.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DreamVideo: Composing Your Dream Videos with Customized Subject and Motion.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Towards Real-World Visual Tracking With Temporal Contexts.

IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

Self-Supervised Learning from Untrimmed Videos via Hierarchical Consistency.

IEEE Trans. Pattern Anal. Mach. Intell., October, 2023

Cross-domain few-shot action recognition with unlabeled videos.

Comput. Vis. Image Underst., August, 2023

ParamCrop: Parametric Cubic Cropping for Video Contrastive Learning.

IEEE Trans. Multim., 2023

DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models.

CoRR, 2023

VideoLCM: Video Latent Consistency Model.

CoRR, 2023

DreamVideo: Composing Your Dream Videos with Customized Subject and Motion.

CoRR, 2023

I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models.

CoRR, 2023

Few-shot Action Recognition with Captioning Foundation Models.

CoRR, 2023

ModelScope Text-to-Video Technical Report.

CoRR, 2023

Temporally-Adaptive Models for Efficient Video Understanding.

CoRR, 2023

FaceComposer: A Unified Model for Versatile Facial Content Creation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

VideoComposer: Compositional Video Synthesis with Motion Controllability.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

The Devil is in the Wrongly-classified Samples: Towards Unified Open-set Recognition.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

RLIPv2: Fast Scaling of Relational Language-Image Pre-training.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Space-time Prompting for Video Class-incremental Learning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

LipFormer: High-fidelity and Generalizable Talking Face Generation with A Pre-learned Facial Codebook.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Action Recognition.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Enlarging Instance-specific and Class-specific Information for Open-set Action Recognition.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Exploring Language Hierarchy for Video Grounding.

IEEE Trans. Image Process., 2022

Context-aware Proposal Network for Temporal Action Detection.

CoRR, 2022

Learning a Condensed Frame for Memory-Efficient Video Class-Incremental Learning.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

TAda! Temporally-Adaptive Convolutions for Video Understanding.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Open-world Semantic Segmentation for LIDAR Point Clouds.

Proceedings of the Computer Vision - ECCV 2022, 2022

Hybrid Relation Guided Set Matching for Few-shot Action Recognition.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

TCTrack: Temporal Contexts for Aerial Tracking.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Exploring Stronger Feature for Temporal Action Localization.

CoRR, 2021

Proposal Relation Network for Temporal Action Detection.

CoRR, 2021

Weakly-Supervised Temporal Action Localization Through Local-Global Background Modeling.

CoRR, 2021

Relation Modeling in Spatio-Temporal Action Localization.

CoRR, 2021

A Stronger Baseline for Ego-Centric Action Detection.

CoRR, 2021

Towards Training Stronger Video Vision Transformers for EPIC-KITCHENS-100 Action Recognition.

CoRR, 2021

OadTR: Online Action Detection with Transformers.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Support-Set Based Cross-Supervision for Video Grounding.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Self-Supervised Learning for Semi-Supervised Temporal Action Proposal.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Self-Supervised Motion Learning From Static Images.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

GLNet: Global Local Network for Weakly Supervised Action Localization.

IEEE Trans. Multim., 2020

CBR-Net: Cascade Boundary Refinement Network for Action Detection: Submission to ActivityNet Challenge 2020 (Task 1).

CoRR, 2020

Temporal Fusion Network for Temporal Action Localization: Submission to ActivityNet Challenge 2020 (Task E).

CoRR, 2020

Multi-level Temporal Pyramid Network for Action Detection.

Proceedings of the Pattern Recognition and Computer Vision - Third Chinese Conference, 2020

2019

TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Discriminative Part Selection for Human Action Recognition.

IEEE Trans. Multim., 2018

2017

Group Sparse-Based Mid-Level Representation for Action Recognition.

IEEE Trans. Syst. Man Cybern. Syst., 2017

2015

Mid-level parts mined by feature selection for action recognition.

Proceedings of the 3rd IAPR Asian Conference on Pattern Recognition, 2015
