Hao Li

Orcid: 0009-0002-4473-6012

Affiliations:

Chinese University of Hong Kong, SAR, China
Tsinghua University, China (former)

According to our database, Hao Li authored at least 62 papers between 2019 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

SpatialBench: Is Your Spatial Foundation Model an All-Round Player?

CoRR, May, 2026

Rethinking VLM Representation for VLA Initialization.

CoRR, May, 2026

SWIFT: Prompt-Adaptive Memory for Efficient Interactive Long Video Generation.

CoRR, May, 2026

GSCodec Studio: A Modular Framework for Gaussian Splat Compression.

IEEE Trans. Circuits Syst. Video Technol., April, 2026

MTRouter: Cost-Aware Multi-Turn LLM Routing with History-Model Joint Embeddings.

CoRR, April, 2026

FutureVLA: Joint Visuomotor Prediction for Vision-Language-Action Model.

CoRR, March, 2026

Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence.

CoRR, March, 2026

Organizing, Orchestrating, and Benchmarking Agent Skills at Ecosystem Scale.

CoRR, March, 2026

Robo3R: Enhancing Robotic Manipulation with Accurate Feed-Forward 3D Reconstruction.

CoRR, February, 2026

RoboInter: A Holistic Intermediate Representation Suite Towards Robotic Manipulation.

CoRR, February, 2026

LLMRouterBench: A Massive Benchmark and Unified Framework for LLM Routing.

CoRR, January, 2026

Beyond Gemini-3-Pro: Revisiting LLM Routing and Aggregation at Scale.

CoRR, January, 2026

The Avengers: A Routing Recipe for Collective Intelligence in Language Models.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

ICL-Router: In-Context Learned Model Representations for LLM Routing.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Towards Efficient and Robust Manipulation via Multi-Frame Vision-Language-Action Modeling.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding.

IEEE Trans. Pattern Anal. Mach. Intell., November, 2025

IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction.

CoRR, October, 2025

InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy.

CoRR, October, 2025

ICL-Router: In-Context Learned Model Representations for LLM Routing.

CoRR, October, 2025

NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints.

CoRR, October, 2025

Sequential Diffusion Language Models.

CoRR, September, 2025

PerPilot: Personalizing VLM-based Mobile Agents via Memory and Exploration.

CoRR, August, 2025

InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation.

CoRR, July, 2025

Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models.

CoRR, July, 2025

CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation.

CoRR, June, 2025

UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation.

CoRR, June, 2025

ZeroGUI: Automating Online GUI Learning at Zero Human Cost.

CoRR, May, 2025

The Avengers: A Simple Recipe for Uniting Smaller Language Models to Challenge Proprietary Giants.

CoRR, May, 2025

Learning Adaptive and Temporally Causal Video Tokenization in a 1D Latent Space.

CoRR, May, 2025

Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets.

CoRR, May, 2025

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT.

CoRR, May, 2025

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing.

CoRR, April, 2025

OmniCam: Unified Multimodal Video Generation via Camera Control.

CoRR, April, 2025

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing.

CoRR, March, 2025

Astrea: A MOE-based Visual Understanding Model with Progressive Alignment.

CoRR, March, 2025

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

LangBridge: Interpreting Image as a Combination of Language Embeddings.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

CityGS-$\mathcal{X}$: A Scalable Architecture for Efficient and Geometrically Accurate Large-Scale Scene Reconstruction.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

PUMA: Empowering Unified MLLM with Multi-Granular Visual Generation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Beyond GPT-5: Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing.

Proceedings of the 2025 7th International Conference on Distributed Artificial Intelligence, 2025

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

GIFStream: 4D Gaussian-based Immersive Video with Feature Stream.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

LLMs know their vulnerabilities: Uncover Safety Gaps through Natural Distribution Shifts.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

IOB: integrating optimization transfer and behavior transfer for multi-policy reuse.

Auton. Agents Multi Agent Syst., June, 2024

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation.

CoRR, 2024

Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues.

CoRR, 2024

PET-NeRV: Bridging Generalized Video Codec and Content-Specific Neural Representation.

Hao Li, Lu Yu, Yiyi Liao

Proceedings of the IEEE International Conference on Visual Communications and Image Processing, 2024

Parameter-Inverted Image Pyramid Networks.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

NeRFCodec: Neural Feature Compression Meets Neural Radiance Fields for Memory-Efficient Scene Representation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

JourneyDB: A Benchmark for Generative Image Understanding.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

SteerNeRF: Accelerating NeRF Rendering via Smooth Viewpoint Trajectory.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

AutoLoss-Zero: Searching Loss Functions from Scratch for Generic Tasks.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks.

CoRR, 2021

Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation.

Proceedings of the 9th International Conference on Learning Representations, 2021

2019

Improved Techniques for Training Adaptive Deep Networks.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
