Zhongang Qi

Orcid: 0000-0001-8298-4063

According to our database, Zhongang Qi authored at least 74 papers between 2011 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding.

CoRR, August, 2025

StyleAdapter: A Unified Stylized Image Generation Model.

Int. J. Comput. Vis., April, 2025

VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning.

CoRR, April, 2025

DynamiCtrl: Rethinking the Basic Structure and the Role of Text for High-quality Human Image Animation.

CoRR, March, 2025

Weakly-Supervised Temporal Action Localization by Progressive Complementary Learning.

IEEE Trans. Circuits Syst. Video Technol., January, 2025

UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Taming Rectified Flow for Inversion and Editing.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

DOGR: Towards Versatile Visual Document Grounding and Referring.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Mamba-3VL: Taming State Space Model for 3D Vision Language Learning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Less is More: Empowering GUI Agent with Context-Aware Simplification.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities.

Guangcong Zheng

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

DARTScore: DuAl-Reconstruction Transformer for Video Captioning Evaluation.

IEEE Trans. Circuits Syst. Video Technol., April, 2024

DropConn: Dropout Connection Based Random GNNs for Molecular Property Prediction.

IEEE Trans. Knowl. Data Eng., February, 2024

Chinese Title Generation for Short Videos: Dataset, Metric and Algorithm.

Stephen J. Maybank

IEEE Trans. Pattern Anal. Mach. Intell., 2024

VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models.

Guangcong Zheng

CoRR, 2024

DOGE: Towards Versatile Visual Document Grounding and Referring.

CoRR, 2024

mR²AG: Multimodal Retrieval-Reflection-Augmented Generation for Knowledge-Based VQA.

CoRR, 2024

PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM.

CoRR, 2024

SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model.

CoRR, 2024

RecDCL: Dual Contrastive Learning for Recommendation.

Proceedings of the ACM on Web Conference 2024, 2024

E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

CustomNet: Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

SynopGround: A Large-Scale Dataset for Multi-Paragraph Video Grounding from TV Dramas and Synopses.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

EA-VTR: Event-Aware Video-Text Retrieval.

Proceedings of the Computer Vision - ECCV 2024, 2024

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding.

Ming-Ming Cheng

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion Models.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Task-Aware Dual-Representation Network for Few-Shot Action Recognition.

IEEE Trans. Circuits Syst. Video Technol., October, 2023

CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models.

CoRR, 2023

StyleAdapter: A Single-Pass LoRA-Free Model for Stylized Image Generation.

CoRR, 2023

Towards Unseen Triples: Effective Text-Image-joint Learning for Scene Graph Generation.

CoRR, 2023

Sticker820K: Empowering Interactive Retrieval with Stickers.

CoRR, 2023

T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models.

CoRR, 2023

Exploiting Contextual Objects and Relations for 3D Visual Grounding.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

VTLayout: A Multi-Modal Approach for Video Text Layout.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Toward Human Perception-Centric Video Thumbnail Generation.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Do We Really Need Temporal Convolutions in Action Segmentation?

Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Order-Prompted Tag Sequence Generation for Video Tagging.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

ERBNet: An Effective Representation Based Network for Unbiased Scene Graph Generation.

Proceedings of the IEEE International Conference on Acoustics, 2023

LayoutDiffusion: Controllable Diffusion Model for Layout-to-Image Generation.

Guangcong Zheng

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

ViLEM: Visual-Language Error Modeling for Image-Text Retrieval.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Accelerating the Training of Video Super-resolution Models.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Weakly-supervised Action Localization via Hierarchical Mining.

CoRR, 2022

Efficient U-Transformer with Boundary-Aware Loss for Action Segmentation.

CoRR, 2022

CREATE: A Benchmark for Chinese Short Video Retrieval and Title Generation.

CoRR, 2022

Convolutional Transformer with Similarity-based Boundary Prediction for Action Segmentation.

Proceedings of the 34th IEEE International Conference on Tools with Artificial Intelligence, 2022

BTS: A Bi-lingual Benchmark for Text Segmentation in the Wild.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

From Heatmaps to Structural Explanations of Image Classifiers.

Vivswan Shitole

Prasad Tadepalli

CoRR, 2021

Stochastic Block-ADMM for Training Deep Networks.

Mohamad H. Danesh

CoRR, 2021

A Generic Object Re-identification System for Short Videos.

CoRR, 2021

Embedding deep networks into visual explanations.

Artif. Intell., 2021

Finding Discriminative Filters for Specific Degradations in Blind Super-Resolution.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Semantic-Guided Relation Propagation Network for Few-shot Action Recognition.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

TransFusion: Multi-Modal Fusion for Video Tag Inference via Translation-based Knowledge Embedding.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Open-Book Video Captioning With Retrieve-Copy-Generate Network.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

Visualizing point cloud classifiers by curvature smoothing.

Proceedings of the 31st British Machine Vision Conference 2020, 2020

Visualizing Deep Networks by Optimizing with Integrated Gradients.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

ScaleNet - Improve CNNs through Recursively Rescaling Objects.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Interactive Naming for Explaining Deep Neural Networks: A Formative Study.

Mandana Hamidi-Haines

Prasad Tadepalli

Proceedings of the Joint Proceedings of the ACM IUI 2019 Workshops co-located with the 24th ACM Conference on Intelligent User Interfaces (ACM IUI 2019), 2019

PointConv: Deep Convolutional Networks on 3D Point Clouds.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Deep Air Learning: Interpolation, Prediction, and Feature Analysis of Fine-Grained Air Quality.

IEEE Trans. Knowl. Data Eng., 2018

Multi-Task Medical Concept Normalization Using Multi-View Convolutional Neural Network.

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017

Embedding Deep Networks into Visual Explanations.

CoRR, 2017

2013

Learning with limited and noisy tagging.

Zhongfei (Mark) Zhang

Proceedings of the ACM Multimedia Conference, 2013

Characterizing and Comparing User Location Preference in an Urban Mobile Network.

Proceedings of the Trustworthy Computing and Services, 2013

Bayesian Multi-Task Relationship Learning with Link Structure.

Zhongfei (Mark) Zhang

Proceedings of the 2013 IEEE 13th International Conference on Data Mining, 2013

2012

Multi-view learning from imperfect tagging.

Zhongfei (Mark) Zhang

Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Mining noisy tagging from multi-label space.

Zhongfei (Mark) Zhang

Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 2012

2011

Mining partially annotated images.

Zhongfei (Mark) Zhang

Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2011
