Zhou Yu

Orcid: 0000-0001-8407-1137

Affiliations:
  • Hangzhou Dianzi University, Key Laboratory of Complex Systems Modeling and Simulation, Hangzhou, China


According to our database, Zhou Yu authored at least 80 papers between 2008 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
TailorEdit: An Adaptive Framework for Instruction-Guided Fashion Image Editing.
IEEE Trans. Circuits Syst. Video Technol., June, 2026

Fuzzy Language Gaussian Splatting.
IEEE Trans. Fuzzy Syst., April, 2026

Multiple Consistent 2D-3D Mappings for Robust Zero-Shot 3D Visual Grounding.
CoRR, April, 2026

CGC: Compositional Grounded Contrast for Fine-Grained Multi-Image Understanding.
CoRR, April, 2026

GenSmoke-GS: A Multi-Stage Method for Novel View Synthesis from Smoke-Degraded Images Using a Generative Model.
CoRR, April, 2026

HERO: Hierarchical Embedding-Refinement for Open-Vocabulary Temporal Sentence Grounding in Videos.
CoRR, March, 2026

FedAFD: Multimodal Federated Learning via Adversarial Fusion and Distillation.
CoRR, March, 2026

HEART: Emotionally Grounded Video Captioning via Hierarchical Emotion-Aligned Representation.
IEEE Trans. Affect. Comput., 2026

Emotional conflict adaptation for multimodal sentiment analysis.
Pattern Recognit., 2026

GC-GS: Gradient control Gaussian splatting with various image degradation.
Pattern Recognit., 2026

KF-GS: Kalman filter-guided Gaussian splatting for real-time high-quality dynamic scene reconstruction.
J. Vis. Commun. Image Represent., 2026

Sparse4DGS: 4D Gaussian Splatting for Sparse-Frame Dynamic Scene Reconstruction.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

SRSplat: Feed-Forward Super-Resolution Gaussian Splatting from Sparse Multi-View Images.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
CXRAgent: Director-Orchestrated Multi-Stage Reasoning for Chest X-Ray Interpretation.
CoRR, October, 2025

REALM: An MLLM-Agent Framework for Open World 3D Reasoning Segmentation and Editing on Gaussian Splatting.
CoRR, October, 2025

Prophet: Prompting Large Language Models With Complementary Answer Heuristics for Knowledge-Based Visual Question Answering.
IEEE Trans. Pattern Anal. Mach. Intell., August, 2025

Spatio-Temporal and Retrieval-Augmented Modeling for Chest X-Ray Report Generation.
IEEE Trans. Medical Imaging, July, 2025

Action-Driven Semantic Representation and Aggregation for Video Captioning.
IEEE Trans. Circuits Syst. Video Technol., April, 2025

Imp: Highly Capable Large Multimodal Models for Mobile Devices.
IEEE Trans. Multim., 2025

ScatDiff: Physical Diffusion Model for Electromagnetic Computational Imaging.
IEEE Trans. Geosci. Remote. Sens., 2025

Benchmarking and Enhancing Geospatial Visual Reasoning Over Street Maps.
IEEE Trans. Geosci. Remote. Sens., 2025

Modality-aware contrast and fusion for multi-modal summarization.
Neurocomputing, 2025

DiSCo: Disentangled Attribute Manipulation Retrieval via Semantic Reconstruction and Consistency Regularization.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

MMCNav: MLLM-empowered Multi-agent Collaboration for Outdoor Visual Language Navigation.
Proceedings of the 2025 International Conference on Multimedia Retrieval, 2025

Growing a Twig to Accelerate Large Vision-Language Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

2024
Effective Video Summarization by Extracting Parameter-Free Motion Attention.
ACM Trans. Multim. Comput. Commun. Appl., July, 2024

Confidence correction for trained graph convolutional networks.
Pattern Recognit., 2024

Imp: Highly Capable Large Multimodal Models for Mobile Devices.
CoRR, 2024

3D Question Answering with Scene Graph Reasoning.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

MPOD123: One Image to 3D Content Generation Using Mask-Enhanced Progressive Outline-to-Detail Optimization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
MARN: Multi-level Attentional Reconstruction Networks for Weakly Supervised Video Temporal Grounding.
Neurocomputing, October, 2023

Bilaterally Slimmable Transformer for Elastic and Efficient Visual Question Answering.
IEEE Trans. Multim., 2023

Parameter-Efficient Transfer Learning for Audio-Visual-Language Tasks.
CoRR, 2023

Contrastive Perturbation Network for Weakly Supervised Temporal Sentence Grounding.
Proceedings of the Pattern Recognition and Computer Vision - 6th Chinese Conference, 2023

Parameter-Efficient Transfer Learning for Audio-Visual-Language Tasks.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Prompting Large Language Models with Answer Heuristics for Knowledge-Based Visual Question Answering.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

ANetQA: A Large-scale Benchmark for Fine-grained Compositional Reasoning over Untrimmed Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Deep relational self-Attention networks for scene graph generation.
Pattern Recognit. Lett., 2022

Question-relationship guided graph attention network for visual question answer.
Multim. Syst., 2022

Towards Efficient and Elastic Visual Question Answering with Doubly Slimmable Transformer.
CoRR, 2022

Delegate-based Utility Preserving Synthesis for Pedestrian Image Anonymization.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

2021
SPRNet: Single-Pixel Reconstruction for One-Stage Instance Segmentation.
IEEE Trans. Cybern., 2021

Long-Term Video Question Answering via Multimodal Hierarchical Memory Attentive Networks.
IEEE Trans. Circuits Syst. Video Technol., 2021

Accelerated masked transformer for dense video captioning.
Neurocomputing, 2021

Action-Based Conversations Dataset: A Corpus for Building More In-Depth Task-Oriented Dialogue Systems.
CoRR, 2021

Action-Based Conversations Dataset: A Corpus for Building More In-Depth Task-Oriented Dialogue Systems.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

2020
Compositional Attention Networks With Two-Stream Fusion for Video Question Answering.
IEEE Trans. Image Process., 2020

Multimodal Transformer With Multi-View Visual Representation for Image Captioning.
IEEE Trans. Circuits Syst. Video Technol., 2020

Intra- and Inter-modal Multilinear Pooling with Multitask Learning for Video Grounding.
Neural Process. Lett., 2020

Weakly-Supervised Multi-Level Attentional Reconstruction Network for Grounding Textual Queries in Videos.
CoRR, 2020

Deep Multimodal Neural Architecture Search.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

2019
End-to-end visual grounding via region proposal networks and bilinear pooling.
IET Comput. Vis., 2019

Multimodal Unified Attention Networks for Vision-and-Language Interactions.
CoRR, 2019

Single Pixel Reconstruction for One-stage Instance Segmentation.
CoRR, 2019

Deep Modular Co-Attention Networks for Visual Question Answering.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
User-Click-Data-Based Fine-Grained Image Recognition via Weakly Supervised Metric Learning.
ACM Trans. Multim. Comput. Commun. Appl., 2018

Beyond Bilinear: Generalized Multimodal Factorized High-Order Pooling for Visual Question Answering.
IEEE Trans. Neural Networks Learn. Syst., 2018

Comprehensive Distance-Preserving Autoencoders for Cross-Modal Retrieval.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Ontology-Driven Hierarchical Deep Learning for Fashion Recognition.
Proceedings of the IEEE 1st Conference on Multimedia Information Processing and Retrieval, 2018

Open-Ended Long-form Video Question Answering via Adaptive Hierarchical Reinforced Networks.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

2017
Beyond Bilinear: Generalized Multi-modal Factorized High-order Pooling for Visual Question Answering.
CoRR, 2017

Privacy Setting Recommendation for Image Sharing.
Proceedings of the 16th IEEE International Conference on Machine Learning and Applications, 2017

Deep Mixture of Experts with Diverse Task Spaces.
Proceedings of the 16th IEEE International Conference on Machine Learning and Applications, 2017

Multi-modal Factorized Bilinear Pooling with Co-attention Learning for Visual Question Answering.
Proceedings of the IEEE International Conference on Computer Vision, 2017

2015
RAISE: A Whole Process Modeling Method for Unstructured Data Management.
Proceedings of the 2015 IEEE International Conference on Multimedia Big Data, BigMM 2015, 2015

2014
Sparse Multi-Modal Hashing.
IEEE Trans. Multim., 2014

Hashing with List-Wise learning to rank.
Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2014

Discriminative coupled dictionary hashing for fast cross-media retrieval.
Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2014

Cross-Media Hashing with Neural Networks.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Cross-media hashing with kernel regression.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2014

2012
LuSH: A Generic High-Dimensional Index Framework.
Proceedings of the Web-Age Information Management, 2012

Image Ranking via Attribute Boosted Hypergraph.
Proceedings of the Advances in Multimedia Information Processing - PCM 2012, 2012

2010
Fire Surveillance Method Based on Quaternionic Wavelet Features.
Proceedings of the Advances in Multimedia Modeling, 2010

Error-correcting output hashing in fast similarity search.
Proceedings of the Second International Conference on Internet Multimedia Computing and Service, 2010

2009
Shanghai Jiao Tong University participation in high-level feature extraction and surveillance event detection at TRECVID 2009.
Proceedings of the TRECVID 2009 workshop participants notebook papers, 2009

Structure-Preserving Colorization Based on Quaternionic Phase Reconstruction.
Proceedings of the Advances in Multimedia Information Processing, 2009

2008
Shanghai Jiao Tong University participation in high-level feature extraction, automatic search and surveillance event detection at TRECVID 2008.
Proceedings of the TRECVID 2008 workshop participants notebook papers, 2008

