Hao Li

Affiliations:

Chinese University of Hong Kong, SAR, China
Tsinghua University, China (former)

According to our database, Hao Li authored at least 42 papers between 2019 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding.

IEEE Trans. Pattern Anal. Mach. Intell., November, 2025

InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy.

CoRR, October, 2025

ICL-Router: In-Context Learned Model Representations for LLM Routing.

CoRR, October, 2025

NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints.

CoRR, October, 2025

Sequential Diffusion Language Models.

CoRR, September, 2025

PerPilot: Personalizing VLM-based Mobile Agents via Memory and Exploration.

CoRR, August, 2025

Beyond GPT-5: Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing.

CoRR, August, 2025

SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law.

CoRR, July, 2025

InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation.

CoRR, July, 2025

Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models.

CoRR, July, 2025

CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation.

CoRR, June, 2025

UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation.

CoRR, June, 2025

GSCodec Studio: A Modular Framework for Gaussian Splat Compression.

CoRR, June, 2025

ZeroGUI: Automating Online GUI Learning at Zero Human Cost.

CoRR, May, 2025

The Avengers: A Simple Recipe for Uniting Smaller Language Models to Challenge Proprietary Giants.

CoRR, May, 2025

Learning Adaptive and Temporally Causal Video Tokenization in a 1D Latent Space.

CoRR, May, 2025

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT.

CoRR, May, 2025

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing.

CoRR, April, 2025

LangBridge: Interpreting Image as a Combination of Language Embeddings.

CoRR, March, 2025

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing.

CoRR, March, 2025

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

GIFStream: 4D Gaussian-based Immersive Video with Feature Stream.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

LLMs know their vulnerabilities: Uncover Safety Gaps through Natural Distribution Shifts.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

IOB: integrating optimization transfer and behavior transfer for multi-policy reuse.

Auton. Agents Multi Agent Syst., June, 2024

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation.

CoRR, 2024

Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues.

CoRR, 2024

PET-NeRV: Bridging Generalized Video Codec and Content-Specific Neural Representation.

Hao Li, Lu Yu, Yiyi Liao

Proceedings of the IEEE International Conference on Visual Communications and Image Processing, 2024

Parameter-Inverted Image Pyramid Networks.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

NeRFCodec: Neural Feature Compression Meets Neural Radiance Fields for Memory-Efficient Scene Representation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

JourneyDB: A Benchmark for Generative Image Understanding.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

SteerNeRF: Accelerating NeRF Rendering via Smooth Viewpoint Trajectory.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

AutoLoss-Zero: Searching Loss Functions from Scratch for Generic Tasks.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks.

CoRR, 2021

Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation.

Proceedings of the 9th International Conference on Learning Representations, 2021

2019

Improved Techniques for Training Adaptive Deep Networks.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
