Hengshuang Zhao

Orcid: 0000-0001-8277-2706

According to our database, Hengshuang Zhao authored at least 192 papers between 2016 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

Causal Prompts for Open-Vocabulary Video Instance Segmentation.

IEEE Trans. Pattern Anal. Mach. Intell., July, 2026

AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward.

CoRR, May, 2026

DreamAvoid: Critical-Phase Test-Time Dreaming to Avoid Failures in VLA Policies.

CoRR, May, 2026

Continuous Latent Diffusion Language Model.

CoRR, May, 2026

Memory Consistency Guided Divide-and-Conquer Learning for Generalized Category Discovery.

Int. J. Comput. Vis., April, 2026

HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation.

CoRR, April, 2026

Less Gaussians, Texture More: 4K Feed-Forward Textured Splatting.

CoRR, March, 2026

SURF: Signature-Retained Fast Video Generation.

CoRR, March, 2026

FASTER: Rethinking Real-Time Flow VLAs.

CoRR, March, 2026

Utonia: Toward One Encoder for All Point Clouds.

CoRR, March, 2026

ACE-Brain-0: Spatial Intelligence as a Shared Scaffold for Universal Embodiments.

CoRR, March, 2026

WorldCompass: Reinforcement Learning for Long-Horizon World Models.

CoRR, February, 2026

Any3D-VLA: Enhancing VLA Robustness via Diverse Point Clouds.

CoRR, February, 2026

Liquid: Language Models are Scalable and Unified Multi-Modal Generators.

Int. J. Comput. Vis., January, 2026

CoDance: An Unbind-Rebind Paradigm for Robust Multi-Subject Animation.

CoRR, January, 2026

Orient Anything V2: Unifying Orientation and Rotation Understanding.

CoRR, January, 2026

GDRO: Group-level Reward Post-training Suitable for Diffusion Models.

CoRR, January, 2026

Game Ground Bench: Probing the Limits of LVLMs in Complex Semantic Grounding Across Game Universes.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Alchemist: Unlocking Efficiency in Text-to-Image Model Training via Meta-Gradient Data Selection.

CoRR, December, 2025

In Pursuit of Pixel Supervision for Visual Pre-training.

CoRR, December, 2025

MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives.

CoRR, December, 2025

DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning.

CoRR, December, 2025

GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation.

CoRR, December, 2025

Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance.

CoRR, December, 2025

Visual Spatial Tuning.

CoRR, November, 2025

UniLION: Towards Unified Autonomous Driving Model with Linear Group RNNs.

CoRR, November, 2025

Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations.

CoRR, October, 2025

PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning.

CoRR, October, 2025

Stratified GRPO: Handling Structural Heterogeneity in Reinforcement Learning of LLM Search Agents.

CoRR, October, 2025

From Noisy Traces to Stable Gradients: Bias-Variance Optimized Preference Optimization for Aligning Large Reasoning Models.

CoRR, October, 2025

Toward Unified 3D Object Detection via Algorithm and Data Unification.

IEEE Trans. Pattern Anal. Mach. Intell., September, 2025

Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search.

CoRR, September, 2025

PonderV2: Improved 3D Representation With a Universal Pre-Training Paradigm.

IEEE Trans. Pattern Anal. Mach. Intell., August, 2025

AnyDoor: Zero-Shot Image Customization With Region-to-Region Reference.

IEEE Trans. Pattern Anal. Mach. Intell., August, 2025

Train Once, Deploy Anywhere: Realize Data-Efficient Dynamic Object Manipulation.

CoRR, August, 2025

Generalized Decoupled Learning for Enhancing Open-Vocabulary Dense Perception.

CoRR, August, 2025

Animate-X++: Universal Character Image Animation with Dynamic Backgrounds.

CoRR, August, 2025

Language-Aware Vision Transformer for Referring Segmentation.

IEEE Trans. Pattern Anal. Mach. Intell., July, 2025

UniDetector: Towards Universal Object Detection With Heterogeneous Supervision.

IEEE Trans. Pattern Anal. Mach. Intell., July, 2025

VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning.

CoRR, July, 2025

Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching.

CoRR, July, 2025

DreamComposer++: Empowering Diffusion Models with Multi-View Conditions for 3D Content Generation.

CoRR, July, 2025

MiCo: Multi-image Contrast for Reinforcement Visual Reasoning.

CoRR, June, 2025

VLN-R1: Vision-Language Navigation via Reinforcement Fine-Tuning.

CoRR, June, 2025

FocalClick-XL: Towards Unified and High-quality Interactive Segmentation.

Xi Chen

Hengshuang Zhao

CoRR, June, 2025

GenSpace: Benchmarking Spatially-Aware Image Generation.

CoRR, May, 2025

Depth Anything with Any Prior.

CoRR, May, 2025

Guest Editorial Introduction to the Special Issue on Segment Anything for Videos and Beyond.

IEEE Trans. Circuits Syst. Video Technol., April, 2025

UniMatch V2: Pushing the Limit of Semi-Supervised Semantic Segmentation.

Lihe Yang

Zhen Zhao

Hengshuang Zhao

IEEE Trans. Pattern Anal. Mach. Intell., April, 2025

ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement.

CoRR, April, 2025

HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding.

CoRR, March, 2025

Modular Customization of Diffusion Models via Blockwise-Parameterized Low-Rank Adaptation.

CoRR, March, 2025

Effective LLM Knowledge Learning via Model Generalization.

CoRR, March, 2025

GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models.

CoRR, January, 2025

DiffCamera: Arbitrary Refocusing on Images.

Proceedings of the SIGGRAPH Asia 2025 Conference Papers, 2025

VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control.

Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference, 2025

DreamMask: Boosting Open-vocabulary Panoptic Segmentation with Synthetic Data.

Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference, 2025

FashionComposer: Compositional Fashion Image Generation.

Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference, 2025

LayerFlow: A Unified Model for Layer-aware Video Generation.

Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference, 2025

Seg-VAR: Image Segmentation with Visual Autoregressive Modeling.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

PlayerOne: Egocentric World Simulator.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

ROSE: Remove Objects with Side Effects in Videos.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

BOOD: Boundary-based Out-Of-Distribution Data Generation.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

VIP: Vision Instructed Pre-training for Robotic Manipulation.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

ViLLa: Video Reasoning Segmentation with Large Language Model.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

StableDepth: Scene-Consistent and Scale-Invariant Monocular Depth.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

DiffDoctor: Diagnosing Image Diffusion Models Before Treating.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Enhancing LLM Knowledge Learning through Generalization.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Empowering Large Language Models with 3D Situation Awareness.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

DriveGPT4-V2: Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

PanDA: Towards Panoramic Depth Anything with Unlabeled Panoramas and Mobius Spatial Augmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Sonata: Self-Supervised Learning of Reliable Point Representations.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

SDPT: Semantic-Aware Dimension-Pooling Transformer for Image Segmentation.

IEEE Trans. Intell. Transp. Syst., November, 2024

GroupLane: End-to-End 3D Lane Detection With Channel-Wise Grouping.

IEEE Robotics Autom. Lett., November, 2024

DriveGPT4: Interpretable End-to-End Autonomous Driving Via Large Language Model.

IEEE Robotics Autom. Lett., October, 2024

Liquid: Language Models are Scalable Multi-modal Generators.

CoRR, 2024

Efficient 3D Perception on Multi-Sweep Point Cloud with Gumbel Spatial Pruning.

CoRR, 2024

VIRT: Vision Instructed Transformer for Robotic Manipulation.

CoRR, 2024

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions.

CoRR, 2024

Point Transformer V3 Extreme: 1st Place Solution for 2024 Waymo Open Dataset Challenge in Semantic Segmentation.

CoRR, 2024

ViLLa: Video Reasoning Segmentation with Large Language Model.

CoRR, 2024

OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces.

CoRR, 2024

Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images.

CoRR, 2024

OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding.

CoRR, 2024

SyncVIS: Synchronized Video Instance Segmentation.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Depth Anything V2.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

One for All: Multi-Domain Joint Training for Point Cloud Based 3D Object Detection.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

LiT: Unifying LiDAR "Languages" with LiDAR Translator.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Zero-shot Image Editing with Reference Imitation.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

LION: Linear Group RNN for 3D Object Detection in Point Clouds.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

LiDAR-NeRF: Novel LiDAR View Synthesis via Neural Radiance Fields.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Influencer Backdoor Attack on Semantic Segmentation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

LogoSticker: Inserting Logos Into Diffusion Models for Customized Generation.

Proceedings of the Computer Vision - ECCV 2024, 2024

Pixel-GS: Density Control with Pixel-Aware Gradient for 3D Gaussian Splatting.

Proceedings of the Computer Vision - ECCV 2024, 2024

InsMapper: Exploring Inner-Instance Information for Vectorized HD Mapping.

Zhenhua Xu

Kwan-Yee K. Wong

Hengshuang Zhao

Proceedings of the Computer Vision - ECCV 2024, 2024

OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation.

Proceedings of the Computer Vision - ECCV 2024, 2024

Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

OpenIns3D: Snap and Lookup for 3D Open-Vocabulary Instance Segmentation.

Proceedings of the Computer Vision - ECCV 2024, 2024

LivePhoto: Real Image Animation with Text-Guided Motion Control.

Proceedings of the Computer Vision - ECCV 2024, 2024

Visual Programming for Zero-Shot Open-Vocabulary 3D Visual Grounding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

UniPAD: A Universal Pre-Training Paradigm for Autonomous Driving.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DreamComposer: Controllable 3D Object Generation via Multi-View Conditions.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Towards Large-Scale 3D Representation Learning with Multi-Dataset Point Prompt Training.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

GroupContrast: Semantic-Aware Self-Supervised Representation Learning for 3D Understanding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

GPT4Point: A Unified Framework for Point-Language Understanding and Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

UniMODE: Unified Monocular 3D Object Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

AnyDoor: Zero-shot Object-level Image Customization.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Point Transformer V3: Simpler, Faster, Stronger.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OCBEV: Object-Centric BEV Transformer for Multi-View 3D Object Detection.

Proceedings of the International Conference on 3D Vision, 2024

2023

Patch-Based Separable Transformer for Visual Recognition.

IEEE Trans. Pattern Anal. Mach. Intell., July, 2023

Open World Entity Segmentation.

IEEE Trans. Pattern Anal. Mach. Intell., July, 2023

PhysFormer++: Facial Video-Based Physiological Measurement with SlowFast Temporal Difference Transformer.

Int. J. Comput. Vis., June, 2023

Fully Convolutional Networks for Panoptic Segmentation With Point-Based Supervision.

IEEE Trans. Pattern Anal. Mach. Intell., April, 2023

Adaptive Perspective Distillation for Semantic Segmentation.

IEEE Trans. Pattern Anal. Mach. Intell., 2023

Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases.

CoRR, 2023

Self-supervised Learning for Enhancing Geometrical Modeling in 3D-Aware Generative Adversarial Network.

Jiarong Guo

Xiaogang Xu

Hengshuang Zhao

CoRR, 2023

VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation.

CoRR, 2023

A Lightweight Clustering Framework for Unsupervised Semantic Segmentation.

Yau Shing Jonathan Cheung

Xi Chen

Lihe Yang

Hengshuang Zhao

CoRR, 2023

PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm.

CoRR, 2023

InsightMapper: A Closer Look at Inner-instance Information for Vectorized High-Definition Mapping.

Zhenhua Xu

Kenneth K. Y. Wong

Hengshuang Zhao

CoRR, 2023

SAM3D: Segment Anything in 3D Scenes.

CoRR, 2023

VoxelFormer: Bird's-Eye-View Feature Generation based on Dual-view Attention for Multi-view 3D Object Detection.

CoRR, 2023

ScribbleSeg: Scribble-based Interactive Image Segmentation.

Xi Chen

Yau Shing Jonathan Cheung

Ser-Nam Lim

Hengshuang Zhao

CoRR, 2023

GeoSpark: Sparking up Point Cloud Segmentation with Geometry Clue.

Georgios M. Hadjidemetriou

Ioannis K. Brilakis

CoRR, 2023

TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

FreeMask: Synthetic Images with Dense Annotations Make Stronger Segmentation Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

CorresNeRF: Image Correspondence Priors for Neural Radiance Fields.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Uni3DETR: Unified 3D Detection Transformer.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Universal Adaptive Data Augmentation.

Xiaogang Xu

Hengshuang Zhao

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

BT²: Backward-compatible Training with Basis Transformation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Open-vocabulary Panoptic Segmentation with Embedding Modulation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners.

Erik G. Learned-Miller

Chuang Gan

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Detecting Everything in the Open World: Towards Universal Object Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Prior Guided Feature Enrichment Network for Few-Shot Segmentation.

IEEE Trans. Pattern Anal. Mach. Intell., 2022

Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners.

Erik G. Learned-Miller

Chuang Gan

CoRR, 2022

General Adversarial Defense Against Black-box Attacks via Pixel Level and Feature Level Distribution Alignments.

CoRR, 2022

Universal Adaptive Data Augmentation.

Xiaogang Xu

Hengshuang Zhao

Philip H. S. Torr

CoRR, 2022

Point Transformer V2: Grouped Vector Attention and Partition-based Pooling.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Prototype-Voxel Contrastive Learning for LiDAR Point Cloud Panoptic Segmentation.

Proceedings of the 2022 International Conference on Robotics and Automation, 2022

MTFormer: Multi-task Learning via Transformer and Cross-Task Reasoning.

Proceedings of the Computer Vision - ECCV 2022, 2022

DecoupleNet: Decoupled Network for Domain Adaptive Semantic Segmentation.

Proceedings of the Computer Vision - ECCV 2022, 2022

SegPGD: An Effective and Efficient Adversarial Attack for Evaluating and Boosting Segmentation Robustness.

Proceedings of the Computer Vision - ECCV 2022, 2022

PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Generalized Few-shot Semantic Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Stratified Transformer for 3D Point Cloud Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

FocalClick: Towards Practical Interactive Image Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

LAVT: Language-Aware Vision Transformer for Referring Image Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Adversarial Examples on Segmentation Models Can be Easy to Transfer.

CoRR, 2021

Do Different Tracking Tasks Require Different Appearance Models?

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Dual-Cross Central Difference Network for Face Anti-Spoofing.

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

Point Transformer.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Dynamic Divide-and-Conquer Adversarial Training for Robust Semantic Segmentation.

Xiaogang Xu

Hengshuang Zhao

Jiaya Jia

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Rethinking Semantic Segmentation From a Sequence-to-Sequence Perspective With Transformers.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

PAConv: Position Adaptive Convolution With Dynamic Kernel Assembling on Point Clouds.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Fully Convolutional Networks for Panoptic Segmentation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Semi-Supervised Semantic Segmentation With Directional Context-Aware Consistency.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Bidirectional Projection Network for Cross Dimension Scene Understanding.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Distilling Knowledge via Knowledge Review.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Hierarchical Interaction Network for Video Object Segmentation from Referring Expressions.

Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020

Generalized Few-Shot Semantic Segmentation.

CoRR, 2020

GridMask Data Augmentation.

CoRR, 2020

Exploring Self-Attention for Image Recognition.

Hengshuang Zhao

Jiaya Jia

Vladlen Koltun

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Region Refinement Network for Salient Object Detection.

CoRR, 2019

Hierarchical Point-Edge Interaction Network for Point Cloud Semantic Segmentation.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

UPSNet: A Unified Panoptic Segmentation Network.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

PSANet: Point-wise Spatial Attention Network for Scene Parsing.

Proceedings of the Computer Vision - ECCV 2018, 2018

Compositing-Aware Image Search.

Proceedings of the Computer Vision - ECCV 2018, 2018

ICNet for Real-Time Semantic Segmentation on High-Resolution Images.

Proceedings of the Computer Vision - ECCV 2018, 2018

SegStereo: Exploiting Semantic Information for Disparity Estimation.

Proceedings of the Computer Vision - ECCV 2018, 2018

2017

Automatic Real-time Background Cut for Portrait Videos.

CoRR, 2017

Pyramid Scene Parsing Network.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016

Augmented Feedback in Semantic Segmentation Under Image Level Supervision.

Proceedings of the Computer Vision - ECCV 2016, 2016
