Zhaoyang Zhang

Orcid: 0009-0003-5583-6454

Affiliations:
  • Wuhan University, China
  • SenseTime Research, Wuhan, China


According to our database, Zhaoyang Zhang authored at least 35 papers between 2018 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
iScript: A Domain-Adapted Large Language Model and Benchmark for Physical Design Tcl Script Generation.
CoRR, March, 2026

2025
IC-Custom: Diverse Image Customization via In-Context Learning.
CoRR, July, 2025

HaploOmni: Unified Single Transformer for Multimodal Video Understanding and Generation.
CoRR, June, 2025

BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing.
CoRR, March, 2025

BlobCtrl: Taming Controllable Blob for Element-level Image Editing.
Proceedings of the SIGGRAPH Asia 2025 Conference Papers, 2025

Cobra: Efficient Line Art COlorization with BRoAder References.
Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference, 2025

FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios.
Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference, 2025

VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control.
Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference, 2025

NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Image Conductor: Precision Control for Interactive Video Synthesis.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
FAT: Frequency-Aware Transformation for Bridging Full-Precision and Low-Precision Deep Representations.
IEEE Trans. Neural Networks Learn. Syst., February, 2024

Consistent Human Image and Video Generation with Spatially Conditioned Diffusion.
CoRR, 2024

ColorFlow: Retrieval-Augmented Image Sequence Colorization.
CoRR, 2024

BrushEdit: All-In-One Image Inpainting and Editing.
CoRR, 2024

Adding Multi-modal Controls to Whole-body Human Motion Generation.
CoRR, 2024

Image Inpainting Models are Effective Tools for Instruction-guided Image Editing.
CoRR, 2024

ReVideo: Remake a Video with Motion and Content Control.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Cached Transformers: Improving Transformers with Differentiable Memory Cache.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Cached Transformers: Improving Transformers with Differentiable Memory Cache.
CoRR, 2023

Real-Time Controllable Denoising for Image and Video.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Dynamic Token Normalization improves Vision Transformers.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Not All Models Are Equal: Predicting Model Transferability in a Self-challenging Fisher Space.
Proceedings of the Computer Vision - ECCV 2022, 2022

2021
Dynamic Token Normalization Improves Vision Transformer.
CoRR, 2021

BWCP: Probabilistic Learning-to-Prune Channels for ConvNets via Batch Whitening.
CoRR, 2021

FAT: Learning Low-Bitwidth Parametric Representation via Frequency-Aware Transformation.
CoRR, 2021

Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution.
Proceedings of the 38th International Conference on Machine Learning, 2021

STAR: A Structure-aware Lightweight Transformer for Real-time Image Enhancement.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020
AdaX: Adaptive Gradient Descent with Exponential Long Term Memory.
CoRR, 2020

2019
Differentiable Learning-to-Group Channels via Groupable Convolutional Neural Networks.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

2018
Temporal Sequence Distillation: Towards Few-Frame Action Recognition in Videos.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Boosting up Scene Text Detectors with Guided CNN.
Proceedings of the British Machine Vision Conference 2018, 2018

