Rui Shao

Orcid: 0000-0003-0090-9604

Affiliations:
  • Harbin Institute of Technology (Shenzhen), School of Computer Science and Technology, China
  • Nanyang Technological University, Singapore (2021 - 2023)
  • Hong Kong Baptist University, Department of Computer Science, Hong Kong (PhD 2021)


According to our database, Rui Shao authored at least 62 papers between 2017 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
From Abstraction to Instantiation: Learning Behavioral Representation for Vision-Language-Action Model.
CoRR, May, 2026

ConsisVLA-4D: Advancing Spatiotemporal Consistency in Efficient 3D-Perception and 4D-Reasoning for Robotic Manipulation.
CoRR, May, 2026

HATS: Hardness-Aware Trajectory Synthesis for GUI Agents.
CoRR, March, 2026

ΔVLA: Prior-Guided Vision-Language-Action Models via World Knowledge Variation.
CoRR, March, 2026

Multimodal Large Language Models-Enabled UAV Swarm: Towards Efficient and Intelligent Autonomous Aerial Systems.
IEEE Wirel. Commun., February, 2026

Global Prior Meets Local Consistency: Dual-Memory Augmented Vision-Language-Action Model for Efficient Robotic Manipulation.
CoRR, February, 2026

Learning to Accelerate Vision-Language-Action Models through Adaptive Visual Token Caching.
CoRR, February, 2026

Inject Once Survive Later: Backdooring Vision-Language-Action Models to Persist Through Downstream Fine-tuning.
CoRR, February, 2026

PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records.
CoRR, January, 2026

UniEmo: Unifying Emotional Understanding and Generation With Learnable Expert Queries.
IEEE Trans. Image Process., 2026

H-GAR: A Hierarchical Interaction Framework via Goal-Driven Observation-Action Refinement for Robotic Manipulation.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

SemanticVLA: Semantic-Aligned Sparsification and Enhancement for Efficient Robotic Manipulation.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
HiconAgent: History Context-aware Policy Optimization for GUI Agents.
CoRR, December, 2025

CAT+: Investigating and Enhancing Audio-Visual Understanding in Large Language Models.
IEEE Trans. Pattern Anal. Mach. Intell., October, 2025

CogVLA: Cognition-Aligned Vision-Language-Action Model via Instruction-Driven Routing & Sparsification.
CoRR, August, 2025

Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey.
CoRR, August, 2025

DeepFake-Adapter: Dual-Level Adapter for DeepFake Detection.
Int. J. Comput. Vis., June, 2025

Robust Sequential DeepFake Detection.
Int. J. Comput. Vis., June, 2025

Mirage-1: Augmenting and Updating GUI Agent with Hierarchical Multimodal Skills.
CoRR, June, 2025

Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts.
CoRR, June, 2025

DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer.
CoRR, April, 2025

TIME: Temporal-sensitive Multi-dimensional Instruction Tuning and Benchmarking for Video-LLMs.
CoRR, March, 2025

FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers.
CoRR, January, 2025

CogVLA: Cognition-Aligned Vision-Language-Action Models via Instruction-Driven Routing & Sparsification.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

EmoSym: A Symbiotic Framework for Unified Emotional Understanding and Generation via Latent Reasoning.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

PUMA: Layer-Pruned Language Model for Efficient Unified Multimodal Retrieval with Modality-Adaptive Learning.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

Spa-Bench: a comprehensive Benchmark for Smartphone Agent Evaluation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

FALCON: Resolving Visual Redundancy and Fragmentation in High-Resolution Multimodal Large Language Models via Visual Registers.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Less is More: Empowering GUI Agent with Context-Aware Simplification.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Detecting and Grounding Multi-Modal Media Manipulation and Beyond.
IEEE Trans. Pattern Anal. Mach. Intell., August, 2024

Federated Generalized Face Presentation Attack Detection.
IEEE Trans. Neural Networks Learn. Syst., January, 2024

Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding.
CoRR, 2024

RoboMP2: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models.
CoRR, 2024

Enhancing the Emotional Generation Capability of Large Language Models via Emotional Chain-of-Thought.
CoRR, 2024

MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

RoboMP2: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios.
Proceedings of the Computer Vision - ECCV 2024, 2024

LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Tackling Model Mismatch with Mixup Regulated Test-Time Training.
Proceedings of the 10th IEEE International Conference on Data Science and Advanced Analytics, 2023

Detecting and Grounding Multi-Modal Media Manipulation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Open-Set Adversarial Defense with Clean-Adversarial Mutual Learning.
Int. J. Comput. Vis., 2022

Mixup for Test-Time Training.
CoRR, 2022

Detecting and Recovering Sequential DeepFake Manipulation.
Proceedings of the Computer Vision - ECCV 2022, 2022

2021
Focusing on Clinically Interpretable Features: Selective Attention Regularization for Liver Biopsy Image Classification.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2021 - 24th International Conference, Strasbourg, France, September 27, 2021

Federated Test-Time Adaptive Face Presentation Attack Detection with Dual-Phase Privacy Preservation.
Proceedings of the 16th IEEE International Conference on Automatic Face and Gesture Recognition, 2021

2020
Federated Face Anti-spoofing.
CoRR, 2020

Open-Set Adversarial Defense.
Proceedings of the Computer Vision - ECCV 2020, 2020

Regularized Fine-Grained Meta Face Anti-Spoofing.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Joint Discriminative Learning of Deep Dynamic Textures for 3D Mask Face Anti-Spoofing.
IEEE Trans. Inf. Forensics Secur., 2019

Learning Modality-Consistency Feature Templates: A Robust RGB-Infrared Tracking System.
IEEE Trans. Ind. Electron., 2019

Adversarial auto-encoder for unsupervised deep domain adaptation.
IET Image Process., 2019

Online Non-Negative Multi-Modality Feature Template Learning for RGB-Assisted Infrared Tracking.
IEEE Access, 2019

Multi-Adversarial Discriminative Deep Domain Generalization for Face Presentation Attack Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Feature Constrained by Pixel: Hierarchical Adversarial Deep Domain Adaptation.
Proceedings of the 2018 ACM Multimedia Conference, 2018

2017
Deep convolutional dynamic texture learning with adaptive channel-discriminability for 3D mask face anti-spoofing.
Proceedings of the 2017 IEEE International Joint Conference on Biometrics, 2017


