Rui Shao

Orcid: 0000-0003-0090-9604

Affiliations:
  • Harbin Institute of Technology (Shenzhen), School of Computer Science and Technology, China
  • Nanyang Technological University, Singapore (2021 - 2023)
  • Hong Kong Baptist University, Department of Computer Science, Hong Kong (PhD 2021)


According to our database, Rui Shao authored at least 44 papers between 2017 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey.
CoRR, August, 2025

UniEmo: Unifying Emotional Understanding and Generation with Learnable Expert Queries.
CoRR, July, 2025

PUMA: Layer-Pruned Language Model for Efficient Unified Multimodal Retrieval with Modality-Adaptive Learning.
CoRR, July, 2025

Less is More: Empowering GUI Agent with Context-Aware Simplification.
CoRR, July, 2025

DeepFake-Adapter: Dual-Level Adapter for DeepFake Detection.
Int. J. Comput. Vis., June, 2025

Robust Sequential DeepFake Detection.
Int. J. Comput. Vis., June, 2025

Mirage-1: Augmenting and Updating GUI Agent with Hierarchical Multimodal Skills.
CoRR, June, 2025

Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts.
CoRR, June, 2025

STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization.
CoRR, June, 2025

DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer.
CoRR, April, 2025

TIME: Temporal-sensitive Multi-dimensional Instruction Tuning and Benchmarking for Video-LLMs.
CoRR, March, 2025

FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers.
CoRR, January, 2025

Spa-Bench: A Comprehensive Benchmark for Smartphone Agent Evaluation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Detecting and Grounding Multi-Modal Media Manipulation and Beyond.
IEEE Trans. Pattern Anal. Mach. Intell., August, 2024

Federated Generalized Face Presentation Attack Detection.
IEEE Trans. Neural Networks Learn. Syst., January, 2024

Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding.
CoRR, 2024

RoboMP²: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models.
CoRR, 2024

Enhancing the Emotional Generation Capability of Large Language Models via Emotional Chain-of-Thought.
CoRR, 2024

MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

RoboMP2: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Tackling Model Mismatch with Mixup Regulated Test-Time Training.
Proceedings of the 10th IEEE International Conference on Data Science and Advanced Analytics, 2023

Detecting and Grounding Multi-Modal Media Manipulation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Open-Set Adversarial Defense with Clean-Adversarial Mutual Learning.
Int. J. Comput. Vis., 2022

Mixup for Test-Time Training.
CoRR, 2022

Detecting and Recovering Sequential DeepFake Manipulation.
Proceedings of the Computer Vision - ECCV 2022, 2022

2021
Focusing on Clinically Interpretable Features: Selective Attention Regularization for Liver Biopsy Image Classification.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2021 - 24th International Conference, Strasbourg, France, September 27, 2021

Federated Test-Time Adaptive Face Presentation Attack Detection with Dual-Phase Privacy Preservation.
Proceedings of the 16th IEEE International Conference on Automatic Face and Gesture Recognition, 2021

2020
Federated Face Anti-spoofing.
CoRR, 2020

Open-Set Adversarial Defense.
Proceedings of the Computer Vision - ECCV 2020, 2020

Regularized Fine-Grained Meta Face Anti-Spoofing.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Joint Discriminative Learning of Deep Dynamic Textures for 3D Mask Face Anti-Spoofing.
IEEE Trans. Inf. Forensics Secur., 2019

Learning Modality-Consistency Feature Templates: A Robust RGB-Infrared Tracking System.
IEEE Trans. Ind. Electron., 2019

Adversarial auto-encoder for unsupervised deep domain adaptation.
IET Image Process., 2019

Online Non-Negative Multi-Modality Feature Template Learning for RGB-Assisted Infrared Tracking.
IEEE Access, 2019

Multi-Adversarial Discriminative Deep Domain Generalization for Face Presentation Attack Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Feature Constrained by Pixel: Hierarchical Adversarial Deep Domain Adaptation.
Proceedings of the 2018 ACM Multimedia Conference, 2018

2017
Deep convolutional dynamic texture learning with adaptive channel-discriminability for 3D mask face anti-spoofing.
Proceedings of the 2017 IEEE International Joint Conference on Biometrics, 2017


