Bin Zhu

Orcid: 0000-0002-9213-2611

Affiliations:
  • Singapore Management University, School of Computing and Information Systems, Singapore
  • University of Bristol, UK (former)
  • City University of Hong Kong, Department of Computer Science, Kowloon Tong, Hong Kong (PhD 2021)


According to our database, Bin Zhu authored at least 50 papers between 2019 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Spatiotemporal Sycophancy: Negation-Based Gaslighting in Video Large Language Models.
CoRR, April, 2026

SpatialImaginer: Towards Adaptive Visual Imagination for Spatial Reasoning.
CoRR, April, 2026

Enhancing Action and Ingredient Modeling for Semantically Grounded Recipe Generation.
CoRR, February, 2026

CookingDiffusion: Cooking Procedural Image Generation with Stable Diffusion.
ACM Trans. Multim. Comput. Commun. Appl., January, 2026

ThinkMatter: Panoramic-Aware Instructional Semantics for Monocular Vision-and-Language Navigation.
IEEE Trans. Image Process., 2026

Benchmarking Gaslighting Negation Attacks Against Multimodal Large Language Models.
Proceedings of the 2026 International Conference on Multimedia Retrieval, 2026

SAM3-LiteText: An Anatomical Study of the SAM3 Text Encoder for Efficient Vision-Language Segmentation.
Proceedings of the 2026 International Conference on Multimedia Retrieval, 2026

RoDE: Linear Rectified Mixture of Diverse Experts for Food Large Multi-Modal Models.
Proceedings of the 2026 International Conference on Multimedia Retrieval, 2026

OSCBench: Benchmarking Object State Change in Text-to-Video Generation.
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

Actor-Critic for Continuous Action Chunks: A Reinforcement Learning Framework for Long-Horizon Robotic Manipulation with Sparse Reward.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
CVLP-NaVD: Contrastive Visual-language Pre-training Models for Non-annotated Visual Description.
ACM Trans. Multim. Comput. Commun. Appl., November, 2025

Dual-LoRA and Quality-Enhanced Pseudo Replay for Multimodal Continual Food Learning.
CoRR, November, 2025

Efficient Test-Time Retrieval Augmented Generation.
CoRR, November, 2025

Reasoning Models Are More Easily Gaslighted Than You Think.
CoRR, June, 2025

Don't Deceive Me: Mitigating Gaslighting through Attention Reallocation in LMMs.
CoRR, April, 2025

Calling a Spade a Heart: Gaslighting Multimodal Large Language Models via Negation.
CoRR, January, 2025

FoodLMM: A Versatile Food Assistant Using Large Multi-Modal Model.
IEEE Trans. Multim., 2025

From Canteen Food to Daily Meals: Generalizing Food Recognition to More Practical Scenarios.
IEEE Trans. Multim., 2025

Retrieval Augmented Recipe Generation.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Advancing Food Nutrition Estimation via Visual-Ingredient Feature Fusion.
Proceedings of the 2025 International Conference on Multimedia Retrieval, 2025

Efficient Prompt Tuning for Hierarchical Ingredient Recognition.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

From Holistic to Localized: Local Enhanced Adapters for Efficient Visual Instruction Fine-Tuning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

HD-EPIC: A Highly-Detailed Egocentric Video Dataset.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

OSCAR: Object Status and Contextual Awareness for Recipes to Support Non-Visual Cooking.
Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 2025

Exploring Object Status Recognition for Recipe Progress Tracking in Non-Visual Cooking.
Proceedings of the 27th International ACM SIGACCESS Conference on Computers and Accessibility, 2025

RAGG: Retrieval-Augmented Grasp Generation Model.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
Efficient Unsupervised Video Hashing With Contextual Modeling and Structural Controlling.
IEEE Trans. Multim., 2024

Visual Cue Enhancement and Dual Low-Rank Adaptation for Efficient Visual Instruction Fine-Tuning.
CoRR, 2024

Model Inversion Attacks Through Target-Specific Conditional Diffusion Models.
CoRR, 2024

Active Object Segmentation: A New Modality for Egocentric Action Recognition.
Proceedings of the 6th ACM International Conference on Multimedia in Asia, 2024

Navigating Weight Prediction with Diet Diary.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Video Editing for Video Retrieval.
Proceedings of the Computer Vision - ECCV 2024 Workshops, 2024

Enhancing Recipe Retrieval with Foundation Models: A Data Augmentation Perspective.
Proceedings of the Computer Vision - ECCV 2024, 2024

2023
CAR: Consolidation, Augmentation and Regulation for Recipe Retrieval.
CoRR, 2023

Cross-domain Food Image-to-Recipe Retrieval by Weighted Adversarial Learning.
CoRR, 2023

CgT-GAN: CLIP-guided Text GAN for Image Captioning.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

2022
Learning From Web Recipe-Image Pairs for Food Recognition: Problem, Baselines and Performance.
IEEE Trans. Multim., 2022

Text-driven Video Prediction.
CoRR, 2022

EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Mix-DANN and Dynamic-Modal-Distillation for Video Domain Adaptation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Unsupervised Video Hashing with Multi-granularity Contextualization and Multi-structure Preservation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Cross-lingual Adaptation for Recipe Retrieval with Mixup.
Proceedings of the ICMR '22: International Conference on Multimedia Retrieval, Newark, NJ, USA, June 27, 2022

2021
Learning to Match Anchor-Target Video Pairs With Dual Attentional Holographic Networks.
IEEE Trans. Image Process., 2021

A Study of Multi-Task and Region-Wise Deep Learning for Food Ingredient Recognition.
IEEE Trans. Image Process., 2021

Pyramid Fusion Dark Channel Prior for Single Image Dehazing.
CoRR, 2021

2020
Cross-domain Cross-modal Food Transfer.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Person-level Action Recognition in Complex Events via TSD-TSM Networks.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

CookGAN: Causality Based Text-to-Image Synthesis.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
R2GAN: Cross-Modal Recipe Retrieval With Generative Adversarial Network.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

