Han Xiao
Orcid: 0000-0002-8884-5344
Affiliations:
- Chinese University of Hong Kong
- Tsinghua University, Beijing, China (former)
According to our database, Han Xiao authored at least 33 papers between 2021 and 2026.
Links
Online presence:
- on orcid.org
Bibliography
2026
UI-KOBE: Knowledge-Oriented Behavior Exploration for Lightweight Graph-Guided GUI Agents.
CoRR, May, 2026
CoRR, April, 2026
CoRR, April, 2026
PIRA-Bench: A Transition from Reactive GUI Agents to GUI-based Proactive Intent Recommendation Agents.
CoRR, March, 2026
CoRR, February, 2026
UI-Mem: Self-Evolving Experience Memory for Online Reinforcement Learning in Mobile GUI Agents.
CoRR, February, 2026
UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026
2025
CoRR, June, 2025
WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch.
CoRR, May, 2025
CoRR, April, 2025
Trans. Mach. Learn. Res., 2025
UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025
CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
Proceedings of the Findings of the Association for Computational Linguistics, 2025
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning.
Proceedings of the Findings of the Association for Computational Linguistics, 2025
2024
Int. J. Comput. Vis., November, 2024
CoRR, 2024
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024
Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024
SpatialFormer: Towards Generalizable Vision Transformers with Explicit Spatial Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024
SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024
No Time to Train: Empowering Non-Parametric Networks for Few-Shot 3D Scene Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
2023
Int. J. Comput. Vis., July, 2023
IEEE Trans. Pattern Anal. Mach. Intell., 2023
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models.
CoRR, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
2022
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
2021
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021