Zhaoyang Zhang

Affiliations:
  • Wuhan University, China
  • SenseTime Research, Wuhan, China


According to our database, Zhaoyang Zhang authored at least 32 papers between 2018 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
IC-Custom: Diverse Image Customization via In-Context Learning.
CoRR, July, 2025

HaploOmni: Unified Single Transformer for Multimodal Video Understanding and Generation.
CoRR, June, 2025

FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios.
CoRR, May, 2025

Cobra: Efficient Line Art COlorization with BRoAder References.
CoRR, April, 2025

BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing.
CoRR, March, 2025

VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control.
CoRR, March, 2025

NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Image Conductor: Precision Control for Interactive Video Synthesis.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
FAT: Frequency-Aware Transformation for Bridging Full-Precision and Low-Precision Deep Representations.
IEEE Trans. Neural Networks Learn. Syst., February, 2024

Consistent Human Image and Video Generation with Spatially Conditioned Diffusion.
CoRR, 2024

ColorFlow: Retrieval-Augmented Image Sequence Colorization.
CoRR, 2024

BrushEdit: All-In-One Image Inpainting and Editing.
CoRR, 2024

Adding Multi-modal Controls to Whole-body Human Motion Generation.
CoRR, 2024

Image Inpainting Models are Effective Tools for Instruction-guided Image Editing.
CoRR, 2024

ReVideo: Remake a Video with Motion and Content Control.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Cached Transformers: Improving Transformers with Differentiable Memory Cache.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Cached Transformers: Improving Transformers with Differentiable Memory Cache.
CoRR, 2023

Real-Time Controllable Denoising for Image and Video.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Dynamic Token Normalization improves Vision Transformers.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Not All Models Are Equal: Predicting Model Transferability in a Self-challenging Fisher Space.
Proceedings of the Computer Vision - ECCV 2022, 2022

2021
Dynamic Token Normalization Improves Vision Transformer.
CoRR, 2021

BWCP: Probabilistic Learning-to-Prune Channels for ConvNets via Batch Whitening.
CoRR, 2021

FAT: Learning Low-Bitwidth Parametric Representation via Frequency-Aware Transformation.
CoRR, 2021

Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution.
Proceedings of the 38th International Conference on Machine Learning, 2021

STAR: A Structure-aware Lightweight Transformer for Real-time Image Enhancement.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020
AdaX: Adaptive Gradient Descent with Exponential Long Term Memory.
CoRR, 2020

2019
Differentiable Learning-to-Group Channels via Groupable Convolutional Neural Networks.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

2018
Temporal Sequence Distillation: Towards Few-Frame Action Recognition in Videos.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Boosting up Scene Text Detectors with Guided CNN.
Proceedings of the British Machine Vision Conference 2018, 2018
