Hang Zhao

Affiliations:
  • Tsinghua University, MARS Lab, Beijing, China
  • Waymo LLC, Mountain View, CA, USA (former)
  • Massachusetts Institute of Technology (MIT), Computer Science & Artificial Intelligence Laboratory (CSAIL), Cambridge, MA, USA (PhD 2019)


According to our database, Hang Zhao authored at least 146 papers between 2013 and 2025.

Bibliography

2025
Beyond Pixels: Efficient Dataset Distillation via Sparse Gaussian Representation.
CoRR, September, 2025

Galaxea Open-World Dataset and G0 Dual-System VLA Model.
CoRR, September, 2025

VR-Robo: A Real-to-Sim-to-Real Framework for Visual Robot Navigation and Locomotion.
IEEE Robotics Autom. Lett., August, 2025

DriveAgent-R1: Advancing VLM-based Autonomous Driving with Hybrid Thinking and Active Perception.
CoRR, July, 2025

GS-Occ3D: Scaling Vision-only Occupancy Reconstruction for Autonomous Driving with Gaussian Splatting.
CoRR, July, 2025

Delving into Mapping Uncertainty for Mapless Trajectory Prediction.
CoRR, July, 2025

CRUISE: Cooperative Reconstruction and Editing in V2X Scenarios using Gaussian Splatting.
CoRR, July, 2025

LONG3R: Long Sequence Streaming 3D Reconstruction.
CoRR, July, 2025

Reusing Attention for One-stage Lane Topology Understanding.
CoRR, July, 2025

Morpheus: A Neural-driven Animatronic Face with Hybrid Actuation and Diverse Emotion Control.
CoRR, July, 2025

Re:Form - Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny.
CoRR, July, 2025

ORV: 4D Occupancy-centric Robot Video Generation.
CoRR, June, 2025

Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models.
CoRR, May, 2025

Challenger: Affordable Adversarial Driving Video Generation.
CoRR, May, 2025

Conditioning Matters: Training Diffusion Policies is Faster Than You Think.
CoRR, May, 2025

RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation.
CoRR, March, 2025

Finetuning Generative Trajectory Model with Reinforcement Learning from Human Feedback.
CoRR, March, 2025

MoE-Loco: Mixture of Experts for Multitask Locomotion.
CoRR, March, 2025

Explaining Context Length Scaling and Bounds for Language Models.
CoRR, February, 2025

Embrace Collisions: Humanoid Shadowing for Deployable Contact-Agnostics Motions.
CoRR, February, 2025

Position: Prospective of Autonomous Driving - Multimodal LLMs, World Models, Embodied Intelligence, AI Alignment, and Mamba.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

SARO: Space-Aware Robot System for Terrain Crossing via Vision-Language Model.
Proceedings of the IEEE International Conference on Robotics and Automation, 2025

Robust Robot Walker: Learning Agile Locomotion over Tiny Traps.
Proceedings of the IEEE International Conference on Robotics and Automation, 2025

Chameleon: Fast-Slow Neuro-Symbolic Lane Topology Extraction.
Proceedings of the IEEE International Conference on Robotics and Automation, 2025

Diffusion-Based Generative Models for 3D Occupancy Prediction in Autonomous Driving.
Proceedings of the IEEE International Conference on Robotics and Automation, 2025

Generalizing Motion Planners with Mixture of Experts for Autonomous Driving.
Proceedings of the IEEE International Conference on Robotics and Automation, 2025

TrackOcc: Camera-Based 4D Panoptic Occupancy Tracking.
Proceedings of the IEEE International Conference on Robotics and Automation, 2025

2024
P-MapNet: Far-Seeing Map Generator Enhanced by Both SDMap and HDMap Priors.
IEEE Robotics Autom. Lett., October, 2024

Playful DoggyBot: Learning Agile and Precise Quadrupedal Locomotion.
CoRR, 2024

Seed-Music: A Unified Framework for High Quality and Controlled Music Generation.
CoRR, 2024

Cross Anything: General Quadruped Robot Navigation through Complex Terrains.
CoRR, 2024

GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory.
CoRR, 2024

DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models.
CoRR, 2024

StreamMapNet: Streaming Mapping Network for Vectorized Online HD Map Construction.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024

LiDAR-based 4D Occupancy Completion and Forecasting.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024

Boosting Offline Reinforcement Learning for Autonomous Driving with Hierarchical Latent Skills.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

PreSight: Enhancing Autonomous Vehicle Perception with City-Scale NeRF Priors.
Proceedings of the Computer Vision - ECCV 2024, 2024

CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction.
Proceedings of the Computer Vision - ECCV 2024, 2024

Humanoid Parkour Learning.
Proceedings of the Conference on Robot Learning, 6-9 November 2024, Munich, Germany, 2024

DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models.
Proceedings of the Conference on Robot Learning, 6-9 November 2024, Munich, Germany, 2024

Uncertainty-Aware Decision Transformer for Stochastic Driving Environments.
Proceedings of the Conference on Robot Learning, 6-9 November 2024, Munich, Germany, 2024

2023
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module.
CoRR, 2023

Large Trajectory Models are Scalable Motion Predictors and Planners.
CoRR, 2023

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference.
CoRR, 2023

GPT-Driver: Learning to Drive with GPT.
CoRR, 2023

AutoEncoding Tree for City Generation and Applications.
CoRR, 2023

BEVScope: Enhancing Self-Supervised Depth Estimation Leveraging Bird's-Eye-View in Dynamic Scenarios.
CoRR, 2023

SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving.
CoRR, 2023

ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory.
CoRR, 2023

Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving.
CoRR, 2023

Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

P4P: Conflict-Aware Motion Prediction for Planning in Autonomous Driving.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2023

VectorMapNet: End-to-end Vectorized HD Map Learning.
Proceedings of the International Conference on Machine Learning, 2023

On Uni-Modal Feature Learning in Supervised Multi-Modal Learning.
Proceedings of the International Conference on Machine Learning, 2023

The Modality Focusing Hypothesis: Towards Understanding Crossmodal Knowledge Distillation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Programmatically Grounded, Compositionally Generalizable Robotic Manipulation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Self-supervision through Random Segments with Autoregressive Coding (RandSAC).
Proceedings of the Eleventh International Conference on Learning Representations, 2023

INT2: Interactive Trajectory Prediction at Intersections.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

PVT++: A Simple End-to-End Latency-Aware Visual Tracking Framework.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Neural Map Prior for Autonomous Driving.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

GeoMAE: Masked Geometric Target Prediction for Self-supervised Point Cloud Pre-Training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

What Happened 3 Seconds Ago? Inferring the Past with Thermal Imaging.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

ViP3D: End-to-End Visual Trajectory Prediction via 3D Agent Queries.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

FUTR3D: A Unified Sensor Fusion Framework for 3D Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Robot Parkour Learning.
Proceedings of the Conference on Robot Learning, 2023

Cross-Dataset Sensor Alignment: Making Visual 3D Object Detector Generalizable.
Proceedings of the Conference on Robot Learning, 2023

A Universal Semantic-Geometric Representation for Robotic Manipulation.
Proceedings of the Conference on Robot Learning, 2023

Long-Term Interactive Driving Simulation: MPC to the Rescue.
Proceedings of the Artificial Intelligence - Third CAAI International Conference, 2023

M²Sim: A Long-Term Interactive Driving Simulator.
Proceedings of the Artificial Intelligence - Third CAAI International Conference, 2023

2022
AsyInst: Asymmetric Affinity with DepthGrad and Color for Box-Supervised Instance Segmentation.
CoRR, 2022

VectorFlow: Combining Images and Vectors for Traffic Occupancy and Flow Prediction.
CoRR, 2022

Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech.
CoRR, 2022

Beyond Visual Field of View: Perceiving 3D Environment with Echoes and Vision.
CoRR, 2022

VectorMapNet: End-to-end Vectorized HD Map Learning.
CoRR, 2022

The Modality Focusing Hypothesis: On the Blink of Multimodal Knowledge Distillation.
CoRR, 2022

Training-Free Robust Multimodal Learning via Sample-Wise Jacobian Regularization.
CoRR, 2022

Self-supervision through Random Segments with Autoregressive Coding (RandSAC).
CoRR, 2022

InterSim: Interactive Traffic Simulation via Explicit Relation Modeling.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2022

AutoAlign: Pixel-Instance Feature Aggregation for Multi-Modal 3D Object Detection.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Sound2Synth: Interpreting Sound via FM Synthesizer Parameters Estimation.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Intrinsically Motivated Self-supervised Learning in Reinforcement Learning.
Proceedings of the 2022 International Conference on Robotics and Automation, 2022

SEMI: Self-supervised Exploration via Multisensory Incongruity.
Proceedings of the 2022 International Conference on Robotics and Automation, 2022

HDMapNet: An Online HD Map Construction and Evaluation Framework.
Proceedings of the 2022 International Conference on Robotics and Automation, 2022

Depth Estimation Matters Most: Improving Per-Object Depth Estimation for Monocular 3D Detection and Tracking.
Proceedings of the 2022 International Conference on Robotics and Automation, 2022

IFR-Explore: Learning Inter-object Functional Relationships in 3D Indoor Scenes.
Proceedings of the Tenth International Conference on Learning Representations, 2022

R4D: Utilizing Reference Objects for Long-Range Distance Estimation.
Proceedings of the Tenth International Conference on Learning Representations, 2022

CYBORGS: Contrastively Bootstrapping Object Representations by Grounding in Segmentation.
Proceedings of the Computer Vision - ECCV 2022, 2022

Learning Visual Styles from Audio-Visual Associations.
Proceedings of the Computer Vision - ECCV 2022, 2022

MUTR3D: A Multi-camera Tracking Framework via 3D-to-2D Queries.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

M2I: From Factored Marginal Trajectory Prediction to Interactive Prediction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Co-advise: Cross Inductive Bias Distillation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Embracing Single Stride 3D Object Detector with Sparse Transformer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Egocentric Prediction of Action Target in 3D.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-training for Spatial-Aware Visual Representations.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Neural Dubber: Dubbing for Silent Videos According to Scripts.
CoRR, 2021

DenseTNT: Waymo Open Dataset Motion Prediction Challenge 1st Place Solution.
CoRR, 2021

Improving Multi-Modal Learning with Uni-Modal Teachers.
CoRR, 2021

What Makes Multimodal Learning Better than Single (Provably).
CoRR, 2021

Predictive Visual Tracking: A New Benchmark and Baseline Approach.
CoRR, 2021

What Makes Multi-Modal Learning Better than Single (Provably).
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Neural Dubber: Dubbing for Videos According to Scripts.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

CVC: Contrastive Learning for Non-Parallel Voice Conversion.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Multimodal Knowledge Expansion.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

On Feature Decorrelation in Self-Supervised Learning.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

DenseTNT: End-to-end Trajectory Prediction from Dense Goal Sets.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Adversarially Robust Imitation Learning.
Proceedings of the Conference on Robot Learning, 8-11 November 2021, London, UK, 2021

DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries.
Proceedings of the Conference on Robot Learning, 8-11 November 2021, London, UK, 2021

Multi-Agent Trajectory Prediction by Combining Egocentric and Allocentric Views.
Proceedings of the Conference on Robot Learning, 8-11 November 2021, London, UK, 2021

2020
LID 2020: The Learning from Imperfect Data Challenge Results.
CoRR, 2020

AlignNet: A Unifying Approach to Audio-Visual Alignment.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

UnModNet: Learning to Unwrap a Modulo Image for High Dynamic Range Imaging.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Scalability in Perception for Autonomous Driving: Waymo Open Dataset.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

VectorNet: Encoding HD Maps and Agent Dynamics From Vectorized Representation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Music Gesture for Visual Sound Separation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

TNT: Target-driven Trajectory Prediction.
Proceedings of the 4th Conference on Robot Learning, 2020

CLOUD: Contrastive Learning of Unsupervised Dynamics.
Proceedings of the 4th Conference on Robot Learning, 2020

Unsupervised Monocular Depth Learning in Dynamic Scenes.
Proceedings of the 4th Conference on Robot Learning, 2020

2019
Semantic Understanding of Scenes Through the ADE20K Dataset.
Int. J. Comput. Vis., 2019

Through-Wall Human Mesh Recovery Using Radio Signals.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

The Sound of Motions.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Self-Supervised Moving Vehicle Tracking With Stereo Sound.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Self-supervised Audio-visual Co-segmentation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Self-Supervised Segmentation and Source Separation on Videos.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

2018
RF-based 3D skeletons.
Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication, 2018

The Sound of Pixels.
Proceedings of the Computer Vision - ECCV 2018, 2018

Through-Wall Human Pose Estimation Using Radio Signals.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Loss Functions for Image Restoration With Neural Networks.
IEEE Trans. Computational Imaging, 2017

SLAC: A Sparsely Labeled Dataset for Action Classification and Localization.
CoRR, 2017

Open Vocabulary Scene Parsing.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Scene Parsing through ADE20K Dataset.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016
Occluded Imaging with Time-of-Flight Sensors.
ACM Trans. Graph., 2016

Semantic Understanding of Scenes through the ADE20K Dataset.
CoRR, 2016

2015
Loss Functions for Neural Networks for Image Processing.
CoRR, 2015

Unbounded High Dynamic Range Photography Using a Modulo Camera.
Proceedings of the 2015 IEEE International Conference on Computational Photography, 2015

2014
Sub-pixel Layout for Super-Resolution with Images in the Octic Group.
Proceedings of the Computer Vision - ECCV 2014, 2014

2013
Millimeter Wave Mobile Communications for 5G Cellular: It Will Work!
IEEE Access, 2013

28 GHz Angle of Arrival and Angle of Departure Analysis for Outdoor Cellular Communications Using Steerable Beam Antennas in New York City.
Proceedings of the 77th IEEE Vehicular Technology Conference, 2013

28 GHz millimeter wave cellular communication measurements for reflection and penetration loss in and around buildings in New York city.
Proceedings of IEEE International Conference on Communications, 2013

28 GHz propagation measurements for outdoor cellular communications using steerable beam antennas in New York city.
Proceedings of IEEE International Conference on Communications, 2013

