Hao Li

Orcid: 0009-0002-4473-6012

Affiliations:
  • Chinese University of Hong Kong, SAR, China
  • Tsinghua University, China (former)


According to our database, Hao Li authored at least 62 papers between 2019 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
SpatialBench: Is Your Spatial Foundation Model an All-Round Player?
CoRR, May, 2026

Rethinking VLM Representation for VLA Initialization.
CoRR, May, 2026

SWIFT: Prompt-Adaptive Memory for Efficient Interactive Long Video Generation.
CoRR, May, 2026

GSCodec Studio: A Modular Framework for Gaussian Splat Compression.
IEEE Trans. Circuits Syst. Video Technol., April, 2026

MTRouter: Cost-Aware Multi-Turn LLM Routing with History-Model Joint Embeddings.
CoRR, April, 2026

FutureVLA: Joint Visuomotor Prediction for Vision-Language-Action Model.
CoRR, March, 2026

Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence.
CoRR, March, 2026

Organizing, Orchestrating, and Benchmarking Agent Skills at Ecosystem Scale.
CoRR, March, 2026

Robo3R: Enhancing Robotic Manipulation with Accurate Feed-Forward 3D Reconstruction.
CoRR, February, 2026

RoboInter: A Holistic Intermediate Representation Suite Towards Robotic Manipulation.
CoRR, February, 2026

LLMRouterBench: A Massive Benchmark and Unified Framework for LLM Routing.
CoRR, January, 2026

Beyond Gemini-3-Pro: Revisiting LLM Routing and Aggregation at Scale.
CoRR, January, 2026

The Avengers: A Routing Recipe for Collective Intelligence in Language Models.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

ICL-Router: In-Context Learned Model Representations for LLM Routing.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Towards Efficient and Robust Manipulation via Multi-Frame Vision-Language-Action Modeling.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding.
IEEE Trans. Pattern Anal. Mach. Intell., November, 2025

IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction.
CoRR, October, 2025

InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy.
CoRR, October, 2025

ICL-Router: In-Context Learned Model Representations for LLM Routing.
CoRR, October, 2025

NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints.
CoRR, October, 2025

Sequential Diffusion Language Models.
CoRR, September, 2025

PerPilot: Personalizing VLM-based Mobile Agents via Memory and Exploration.
CoRR, August, 2025

InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation.
CoRR, July, 2025

Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models.
CoRR, July, 2025

CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation.
CoRR, June, 2025

UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation.
CoRR, June, 2025

ZeroGUI: Automating Online GUI Learning at Zero Human Cost.
CoRR, May, 2025

The Avengers: A Simple Recipe for Uniting Smaller Language Models to Challenge Proprietary Giants.
CoRR, May, 2025

Learning Adaptive and Temporally Causal Video Tokenization in a 1D Latent Space.
CoRR, May, 2025

Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets.
CoRR, May, 2025

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT.
CoRR, May, 2025

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing.
CoRR, April, 2025

OmniCam: Unified Multimodal Video Generation via Camera Control.
CoRR, April, 2025

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing.
CoRR, March, 2025

Astrea: A MOE-based Visual Understanding Model with Progressive Alignment.
CoRR, March, 2025

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

LangBridge: Interpreting Image as a Combination of Language Embeddings.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

CityGS-$\mathcal{X}$: A Scalable Architecture for Efficient and Geometrically Accurate Large-Scale Scene Reconstruction.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

PUMA: Empowering Unified MLLM with Multi-Granular Visual Generation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Beyond GPT-5: Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing.
Proceedings of the 2025 7th International Conference on Distributed Artificial Intelligence, 2025

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

GIFStream: 4D Gaussian-based Immersive Video with Feature Stream.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

LLMs know their vulnerabilities: Uncover Safety Gaps through Natural Distribution Shifts.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
IOB: integrating optimization transfer and behavior transfer for multi-policy reuse.
Auton. Agents Multi Agent Syst., June, 2024

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation.
CoRR, 2024

Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues.
CoRR, 2024

PET-NeRV: Bridging Generalized Video Codec and Content-Specific Neural Representation.
Proceedings of the IEEE International Conference on Visual Communications and Image Processing, 2024

Parameter-Inverted Image Pyramid Networks.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

NeRFCodec: Neural Feature Compression Meets Neural Radiance Fields for Memory-Efficient Scene Representation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
JourneyDB: A Benchmark for Generative Image Understanding.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

SteerNeRF: Accelerating NeRF Rendering via Smooth Viewpoint Trajectory.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

AutoLoss-Zero: Searching Loss Functions from Scratch for Generic Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks.
CoRR, 2021

Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation.
Proceedings of the 9th International Conference on Learning Representations, 2021

2019
Improved Techniques for Training Adaptive Deep Networks.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

