Han Xiao

Orcid: 0000-0002-8884-5344

Affiliations:
  • Chinese University of Hong Kong
  • Tsinghua University, Beijing, China (former)


According to our database, Han Xiao authored at least 33 papers between 2021 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
UI-KOBE: Knowledge-Oriented Behavior Exploration for Lightweight Graph-Guided GUI Agents.
CoRR, May, 2026

SOLAR-RL: Semi-Online Long-horizon Assignment Reinforcement Learning.
CoRR, April, 2026

Skill-SD: Skill-Conditioned Self-Distillation for Multi-turn LLM Agents.
CoRR, April, 2026

PIRA-Bench: A Transition from Reactive GUI Agents to GUI-based Proactive Intent Recommendation Agents.
CoRR, March, 2026

MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments.
CoRR, February, 2026

UI-Mem: Self-Evolving Experience Memory for Online Reinforcement Learning in Mobile GUI Agents.
CoRR, February, 2026

UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
QuadricFormer: Scene as Superquadrics for 3D Semantic Occupancy Prediction.
CoRR, June, 2025

WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch.
CoRR, May, 2025

LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects.
CoRR, April, 2025

LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects.
Trans. Mach. Learn. Res., 2025

UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
Learning Generalizable Mixed-Precision Quantization via Attribution Imitation.
Int. J. Comput. Vis., November, 2024

AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents.
CoRR, 2024

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT.
CoRR, 2024

Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models.
CoRR, 2024

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

SpatialFormer: Towards Generalizable Vision Transformers with Explicit Spatial Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

No Time to Train: Empowering Non-Parametric Networks for Few-Shot 3D Scene Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Learning Accurate Performance Predictors for Ultrafast Automated Model Compression.
Int. J. Comput. Vis., July, 2023

Learning Deep Binary Descriptors via Bitwise Interaction Mining.
IEEE Trans. Pattern Anal. Mach. Intell., 2023

SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models.
CoRR, 2023

ImageBind-LLM: Multi-modality Instruction Tuning.
CoRR, 2023

Token-Label Alignment for Vision Transformers.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022
Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Generalizable Mixed-Precision Quantization via Attribution Rank Preservation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

