Han Xiao

ORCID: 0000-0002-8884-5344

Affiliations:
  • Chinese University of Hong Kong
  • Tsinghua University, Beijing, China (former)


According to our database, Han Xiao authored at least 25 papers between 2021 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
QuadricFormer: Scene as Superquadrics for 3D Semantic Occupancy Prediction.
CoRR, June, 2025

UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents.
CoRR, May, 2025

WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch.
CoRR, May, 2025

LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects.
CoRR, April, 2025

BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
Learning Generalizable Mixed-Precision Quantization via Attribution Imitation.
Int. J. Comput. Vis., November, 2024

CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection.
CoRR, 2024

AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents.
CoRR, 2024

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT.
CoRR, 2024

Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models.
CoRR, 2024

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

SpatialFormer: Towards Generalizable Vision Transformers with Explicit Spatial Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

No Time to Train: Empowering Non-Parametric Networks for Few-Shot 3D Scene Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Learning Accurate Performance Predictors for Ultrafast Automated Model Compression.
Int. J. Comput. Vis., July, 2023

Learning Deep Binary Descriptors via Bitwise Interaction Mining.
IEEE Trans. Pattern Anal. Mach. Intell., 2023

SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models.
CoRR, 2023

ImageBind-LLM: Multi-modality Instruction Tuning.
CoRR, 2023

Token-Label Alignment for Vision Transformers.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022
Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Generalizable Mixed-Precision Quantization via Attribution Rank Preservation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

