Han Xiao

Orcid: 0000-0002-8884-5344

Affiliations:

Chinese University of Hong Kong
Tsinghua University, Beijing, China (former)

According to our database, Han Xiao authored at least 33 papers between 2021 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

UI-KOBE: Knowledge-Oriented Behavior Exploration for Lightweight Graph-Guided GUI Agents.

CoRR, May, 2026

SOLAR-RL: Semi-Online Long-horizon Assignment Reinforcement Learning.

CoRR, April, 2026

Skill-SD: Skill-Conditioned Self-Distillation for Multi-turn LLM Agents.

CoRR, April, 2026

PIRA-Bench: A Transition from Reactive GUI Agents to GUI-based Proactive Intent Recommendation Agents.

CoRR, March, 2026

MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments.

CoRR, February, 2026

UI-Mem: Self-Evolving Experience Memory for Online Reinforcement Learning in Mobile GUI Agents.

CoRR, February, 2026

UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

QuadricFormer: Scene as Superquadrics for 3D Semantic Occupancy Prediction.

CoRR, June, 2025

WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch.

CoRR, May, 2025

LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects.

CoRR, April, 2025

LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects.

Trans. Mach. Learn. Res., 2025

UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

Learning Generalizable Mixed-Precision Quantization via Attribution Imitation.

Int. J. Comput. Vis., November, 2024

AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents.

CoRR, 2024

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT.

CoRR, 2024

Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models.

CoRR, 2024

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

SpatialFormer: Towards Generalizable Vision Transformers with Explicit Spatial Understanding.

Proceedings of the Computer Vision - ECCV 2024, 2024

SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

No Time to Train: Empowering Non-Parametric Networks for Few-Shot 3D Scene Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Learning Accurate Performance Predictors for Ultrafast Automated Model Compression.

Int. J. Comput. Vis., July, 2023

Learning Deep Binary Descriptors via Bitwise Interaction Mining.

IEEE Trans. Pattern Anal. Mach. Intell., 2023

SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models.

CoRR, 2023

ImageBind-LLM: Multi-modality Instruction Tuning.

CoRR, 2023

Token-Label Alignment for Vision Transformers.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022

Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Generalizable Mixed-Precision Quantization via Attribution Rank Preservation.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
