Yiyi Zhou

Orcid: 0000-0002-5110-4526

According to our database¹, Yiyi Zhou authored at least 96 papers between 2015 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

Towards Parameter-Efficient Network Pruning with Re-Parameterized Adapter.

Int. J. Comput. Vis., April, 2026

Not All Attention is Needed: Parameter and Computation Efficient Tuning for Multi-modal Large Language Models via Effective Attention Skipping.

Int. J. Comput. Vis., March, 2026

Scaling the Long Video Understanding of Multimodal Large Language Models via Visual Memory Mechanism.

CoRR, March, 2026

ForestPrune: High-ratio Visual Token Compression for Video Multimodal Large Language Models via Spatial-Temporal Forest Modeling.

CoRR, March, 2026

DeepInv: A Novel Self-supervised Learning Approach for Fast and Accurate Diffusion Inversion.

CoRR, January, 2026

Domain incremental learning for object detection.

Pattern Recognit., 2026

Graph-empowered Text-to-SQL generation on Electronic Medical Records.

Pattern Recognit., 2026

Intelligent flame monitoring system on the robotic dog platform.

Proceedings of the 2026 International Conference on Communication Networks and Machine Learning (CNML), Chongqing, China, January 30, 2026

Vision-language Incremental Learning with Dual Class-individual Memory.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Towards Effective and Efficient Long Video Understanding of Multimodal Large Language Models via One-shot Clip Retrieval.

CoRR, December, 2025

Omni-Referring Image Segmentation.

CoRR, December, 2025

Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer.

IEEE Trans. Pattern Anal. Mach. Intell., October, 2025

MoIL: Momentum Imitation Learning for Efficient Vision-Language Adaptation.

IEEE Trans. Pattern Anal. Mach. Intell., July, 2025

Image Captioning via Dynamic Path Customization.

IEEE Trans. Neural Networks Learn. Syst., April, 2025

CycleTrans: Learning Neutral Yet Discriminative Features via Cycle Construction for Visible-Infrared Person Re-Identification.

IEEE Trans. Neural Networks Learn. Syst., March, 2025

Grounded Chain-of-Thought for Multimodal Large Language Models.

CoRR, March, 2025

AdaFlow: Efficient Long Video Editing via Adaptive Attention Slimming And Keyframe Selection.

CoRR, February, 2025

Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy.

CoRR, February, 2025

Secure Service Function Chain Provisioning for Task Offloading in Device-Edge-Cloud Computing.

IEEE Trans. Inf. Forensics Secur., 2025

M3ixup: A multi-modal data augmentation approach for image captioning.

Pattern Recognit., 2025

Optical remote sensing image salient object detection via bidirectional cross-attention and attention restoration.

Pattern Recognit., 2025

Offshore Horizons: HVDC Wind Farms-Exploring Techno-Economic Dimensions.

IEEE Access, 2025

DDoS Attack Detection in SDN-Assisted Federated Learning Environment Based on Contrastive Learning.

IEEE Access, 2025

Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Routing Experts: Learning to Route Dynamic Experts in Existing Multi-modal Large Language Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

SVFR: A Unified Framework for Generalized Video Face Restoration.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

What Kind of Visual Tokens Do We Need? Training-Free Visual Token Pruning for Multi-Modal Large Language Models from the Perspective of Graph.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

Towards Language-Guided Visual Recognition via Dynamic Convolutions.

Int. J. Comput. Vis., January, 2024

A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension.

IEEE Trans. Multim., 2024

Deep hybrid transformer network for robust modulation classification in wireless communications.

Knowl. Based Syst., 2024

Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings.

CoRR, 2024

Routing Experts: Learning to Route Dynamic Experts in Multi-modal Large Language Models.

CoRR, 2024

Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models.

CoRR, 2024

Deep Instruction Tuning for Segment Anything Model.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

QueryMatch: A Query-based Contrastive Learning Framework for Weakly Supervised Visual Grounding.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting.

Shubin Huang

Qiong Wu

Yiyi Zhou

Proceedings of the International Joint Conference on Neural Networks, 2024

Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Towards Omni-supervised Referring Expression Segmentation.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

MMAPS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization.

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Towards local visual modeling for image captioning.

Pattern Recognit., June, 2023

A Real-Time Global Inference Network for One-Stage Referring Expression Comprehension.

IEEE Trans. Neural Networks Learn. Syst., 2023

Knowing What it is: Semantic-Enhanced Dual Attention Transformer.

IEEE Trans. Multim., 2023

Multi-Branch Distance-Sensitive Self-Attention Network for Image Captioning.

IEEE Trans. Multim., 2023

NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative Learning.

CoRR, 2023

M3PS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization in E-commerce.

CoRR, 2023

Approximated Prompt Tuning for Vision-Language Pre-trained Models.

CoRR, 2023

Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting.

CoRR, 2023

Towards End-to-end Semi-supervised Learning for One-stage Object Detection.

CoRR, 2023

Towards Efficient Visual Adaption via Structural Re-parameterization.

CoRR, 2023

HSM-QA: Question Answering System Based on Hierarchical Semantic Matching.

IEEE Access, 2023

Semantic-Guided Selective Representation for Image Captioning.

IEEE Access, 2023

Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

PixelFace+: Towards Controllable Face Generation and Manipulation with Text Descriptions and Segmentation Masks.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

A three-circle triangle model of Bearing-Only Passive Locating of the UAVs.

Proceedings of the 8th International Conference on Information Systems Engineering, 2023

RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Towards Real-Time Panoptic Narrative Grounding by an End-to-End Grounding Network.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Knowledge-Driven Generative Adversarial Network for Text-to-Image Synthesis.

IEEE Trans. Multim., 2022

Towards Lightweight Transformer Via Group-Wise Transformation for Vision-and-Language Tasks.

IEEE Trans. Image Process., 2022

Knowing What to Learn: A Metric-Oriented Focal Mechanism for Image Captioning.

IEEE Trans. Image Process., 2022

Plenty is Plague: Fine-Grained Learning for Visual Question Answering.

IEEE Trans. Pattern Anal. Mach. Intell., 2022

CycleTrans: Learning Neutral yet Discriminative Features for Visible-Infrared Person Re-Identification.

CoRR, 2022

What Goes beyond Multi-modal Fusion in One-stage Referring Expression Comprehension: An Empirical Study.

CoRR, 2022

PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation.

CoRR, 2022

What Hinders Perceptual Quality of PSNR-oriented Methods?

CoRR, 2022

Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Towards Open-Ended Text-to-Face Generation, Combination and Manipulation.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Learning Dynamic Prior Knowledge for Text-to-Face Pixel Synthesis.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

SeqTR: A Simple Yet Universal Network for Visual Grounding.

Proceedings of the Computer Vision - ECCV 2022, 2022

PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation.

Proceedings of the Computer Vision - ECCV 2022, 2022

DIFNet: Boosting Visual Information Flow for Image Captioning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Active Teacher for Semi-Supervised Object Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Uncovering Media Bias via Social Network Learning.

ACM Trans. Intell. Syst. Technol., 2021

Towards Language-guided Visual Recognition via Dynamic Convolutions.

CoRR, 2021

TRAR: Routing the Attention Spans in Transformer for Visual Question Answering.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

RSTNet: Captioning With Adaptive Attention on Visual and Non-Visual Words.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

Consumer Search and Automobile Dealer Colocation.

Charles Murry

Yiyi Zhou

Manag. Sci., 2020

K-armed Bandit based Multi-Modal Network Architecture Search for Visual Question Answering.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Cascade Grouped Attention Network for Referring Expression Segmentation.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Attacking Image Captioning Towards Accuracy-Preserving Target Words Removal.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Multi-Task Collaborative Network for Joint Referring Expression Comprehension and Segmentation.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Social Media Based Topic Modeling for Smart Campus: A Deep Topical Correlation Analysis Method.

IEEE Access, 2019

Towards Cross-modality Topic Modelling via Deep Topical Correlation Analysis.

Proceedings of the IEEE International Conference on Acoustics, 2019

Dynamic Capsule Attention for Visual Question Answering.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Free VQA Models from Knowledge Inertia by Pairwise Inconformity Learning.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2017

Bayesian Estimation of a Dynamic Model of Two-Sided Markets: Application to the U.S. Video Game Industry.

Yiyi Zhou

Manag. Sci., 2017

More Than An Answer: Neural Pivot Network for Visual Qestion Answering.

Proceedings of the 2017 ACM on Multimedia Conference, 2017

2016

Survey of visual sentiment prediction for social media analysis.

Frontiers Comput. Sci., 2016

2015

Design of Personalized News Comments Recommendation System.

Proceedings of the Data Science - Second International Conference, 2015
