Yiyi Zhou

ORCID: 0000-0002-5110-4526

According to our database, Yiyi Zhou authored at least 86 papers between 2015 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Domain incremental learning for object detection.
Pattern Recognit., 2026

Graph-empowered Text-to-SQL generation on Electronic Medical Records.
Pattern Recognit., 2026

2025
MoIL: Momentum Imitation Learning for Efficient Vision-Language Adaptation.
IEEE Trans. Pattern Anal. Mach. Intell., July, 2025

Image Captioning via Dynamic Path Customization.
IEEE Trans. Neural Networks Learn. Syst., April, 2025

CycleTrans: Learning Neutral Yet Discriminative Features via Cycle Construction for Visible-Infrared Person Re-Identification.
IEEE Trans. Neural Networks Learn. Syst., March, 2025

Grounded Chain-of-Thought for Multimodal Large Language Models.
CoRR, March, 2025

AdaFlow: Efficient Long Video Editing via Adaptive Attention Slimming And Keyframe Selection.
CoRR, February, 2025

Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy.
CoRR, February, 2025

Secure Service Function Chain Provisioning for Task Offloading in Device-Edge-Cloud Computing.
IEEE Trans. Inf. Forensics Secur., 2025

M3ixup: A multi-modal data augmentation approach for image captioning.
Pattern Recognit., 2025

Optical remote sensing image salient object detection via bidirectional cross-attention and attention restoration.
Pattern Recognit., 2025

Offshore Horizons: HVDC Wind Farms-Exploring Techno-Economic Dimensions.
IEEE Access, 2025

DDoS Attack Detection in SDN-Assisted Federated Learning Environment Based on Contrastive Learning.
IEEE Access, 2025

Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Routing Experts: Learning to Route Dynamic Experts in Existing Multi-modal Large Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

SVFR: A Unified Framework for Generalized Video Face Restoration.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

What Kind of Visual Tokens Do We Need? Training-Free Visual Token Pruning for Multi-Modal Large Language Models from the Perspective of Graph.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
Towards Language-Guided Visual Recognition via Dynamic Convolutions.
Int. J. Comput. Vis., January, 2024

A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension.
IEEE Trans. Multim., 2024

Deep hybrid transformer network for robust modulation classification in wireless communications.
Knowl. Based Syst., 2024

Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings.
CoRR, 2024

Routing Experts: Learning to Route Dynamic Experts in Multi-modal Large Language Models.
CoRR, 2024

Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models.
CoRR, 2024

Deep Instruction Tuning for Segment Anything Model.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

QueryMatch: A Query-based Contrastive Learning Framework for Weakly Supervised Visual Grounding.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting.
Proceedings of the International Joint Conference on Neural Networks, 2024

Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Towards Omni-supervised Referring Expression Segmentation.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

MMAPS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Towards local visual modeling for image captioning.
Pattern Recognit., June, 2023

A Real-Time Global Inference Network for One-Stage Referring Expression Comprehension.
IEEE Trans. Neural Networks Learn. Syst., 2023

Knowing What it is: Semantic-Enhanced Dual Attention Transformer.
IEEE Trans. Multim., 2023

Multi-Branch Distance-Sensitive Self-Attention Network for Image Captioning.
IEEE Trans. Multim., 2023

NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative Learning.
CoRR, 2023

M3PS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization in E-commerce.
CoRR, 2023

Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer.
CoRR, 2023

Approximated Prompt Tuning for Vision-Language Pre-trained Models.
CoRR, 2023

Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting.
CoRR, 2023

Towards End-to-end Semi-supervised Learning for One-stage Object Detection.
CoRR, 2023

Towards Efficient Visual Adaption via Structural Re-parameterization.
CoRR, 2023

HSM-QA: Question Answering System Based on Hierarchical Semantic Matching.
IEEE Access, 2023

Semantic-Guided Selective Representation for Image Captioning.
IEEE Access, 2023

Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

PixelFace+: Towards Controllable Face Generation and Manipulation with Text Descriptions and Segmentation Masks.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

A three-circle triangle model of Bearing-Only Passive Locating of the UAVs.
Proceedings of the 8th International Conference on Information Systems Engineering, 2023

RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Towards Real-Time Panoptic Narrative Grounding by an End-to-End Grounding Network.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Knowledge-Driven Generative Adversarial Network for Text-to-Image Synthesis.
IEEE Trans. Multim., 2022

Towards Lightweight Transformer Via Group-Wise Transformation for Vision-and-Language Tasks.
IEEE Trans. Image Process., 2022

Knowing What to Learn: A Metric-Oriented Focal Mechanism for Image Captioning.
IEEE Trans. Image Process., 2022

Plenty is Plague: Fine-Grained Learning for Visual Question Answering.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

CycleTrans: Learning Neutral yet Discriminative Features for Visible-Infrared Person Re-Identification.
CoRR, 2022

What Goes beyond Multi-modal Fusion in One-stage Referring Expression Comprehension: An Empirical Study.
CoRR, 2022

PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation.
CoRR, 2022

What Hinders Perceptual Quality of PSNR-oriented Methods?
CoRR, 2022

Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Towards Open-Ended Text-to-Face Generation, Combination and Manipulation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Learning Dynamic Prior Knowledge for Text-to-Face Pixel Synthesis.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

SeqTR: A Simple Yet Universal Network for Visual Grounding.
Proceedings of the Computer Vision - ECCV 2022, 2022

PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation.
Proceedings of the Computer Vision - ECCV 2022, 2022

DIFNet: Boosting Visual Information Flow for Image Captioning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Active Teacher for Semi-Supervised Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Uncovering Media Bias via Social Network Learning.
ACM Trans. Intell. Syst. Technol., 2021

Towards Language-guided Visual Recognition via Dynamic Convolutions.
CoRR, 2021

TRAR: Routing the Attention Spans in Transformer for Visual Question Answering.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

RSTNet: Captioning With Adaptive Attention on Visual and Non-Visual Words.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
Consumer Search and Automobile Dealer Colocation.
Manag. Sci., 2020

K-armed Bandit based Multi-Modal Network Architecture Search for Visual Question Answering.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Cascade Grouped Attention Network for Referring Expression Segmentation.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Attacking Image Captioning Towards Accuracy-Preserving Target Words Removal.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Multi-Task Collaborative Network for Joint Referring Expression Comprehension and Segmentation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Social Media Based Topic Modeling for Smart Campus: A Deep Topical Correlation Analysis Method.
IEEE Access, 2019

Towards Cross-modality Topic Modelling via Deep Topical Correlation Analysis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Dynamic Capsule Attention for Visual Question Answering.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Free VQA Models from Knowledge Inertia by Pairwise Inconformity Learning.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2017
Bayesian Estimation of a Dynamic Model of Two-Sided Markets: Application to the U.S. Video Game Industry.
Manag. Sci., 2017

More Than An Answer: Neural Pivot Network for Visual Question Answering.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

2016
Survey of visual sentiment prediction for social media analysis.
Frontiers Comput. Sci., 2016

2015
Design of Personalized News Comments Recommendation System.
Proceedings of the Data Science - Second International Conference, 2015
