Hang Zhao

CoRR, February, 2025

Position: Prospective of Autonomous Driving - Multimodal LLMs, World Models, Embodied Intelligence, AI Alignment, and Mamba.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

SARO: Space-Aware Robot System for Terrain Crossing via Vision-Language Model.

Proceedings of the IEEE International Conference on Robotics and Automation, 2025

Robust Robot Walker: Learning Agile Locomotion over Tiny Traps.

Proceedings of the IEEE International Conference on Robotics and Automation, 2025

Chameleon: Fast-Slow Neuro-Symbolic Lane Topology Extraction.

Proceedings of the IEEE International Conference on Robotics and Automation, 2025

Diffusion-Based Generative Models for 3D Occupancy Prediction in Autonomous Driving.

Proceedings of the IEEE International Conference on Robotics and Automation, 2025

Generalizing Motion Planners with Mixture of Experts for Autonomous Driving.

Proceedings of the IEEE International Conference on Robotics and Automation, 2025

TrackOcc: Camera-Based 4D Panoptic Occupancy Tracking.

Proceedings of the IEEE International Conference on Robotics and Automation, 2025

2024

P-MapNet: Far-Seeing Map Generator Enhanced by Both SDMap and HDMap Priors.

IEEE Robotics Autom. Lett., October, 2024

Playful DoggyBot: Learning Agile and Precise Quadrupedal Locomotion.

CoRR, 2024

Seed-Music: A Unified Framework for High Quality and Controlled Music Generation.

CoRR, 2024

Cross Anything: General Quadruped Robot Navigation through Complex Terrains.

CoRR, 2024

GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory.

CoRR, 2024

DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models.

CoRR, 2024

StreamMapNet: Streaming Mapping Network for Vectorized Online HD Map Construction.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024

LiDAR-based 4D Occupancy Completion and Forecasting.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024

Boosting Offline Reinforcement Learning for Autonomous Driving with Hierarchical Latent Skills.

Proceedings of the IEEE International Conference on Robotics and Automation, 2024

PreSight: Enhancing Autonomous Vehicle Perception with City-Scale NeRF Priors.

Proceedings of the Computer Vision - ECCV 2024, 2024

CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction.

Proceedings of the Computer Vision - ECCV 2024, 2024

Humanoid Parkour Learning.

Shenzhe Yao

Proceedings of the Conference on Robot Learning, 6-9 November 2024, Munich, Germany, 2024

DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models.

Proceedings of the Conference on Robot Learning, 6-9 November 2024, Munich, Germany, 2024

Uncertainty-Aware Decision Transformer for Stochastic Driving Environments.

Proceedings of the Conference on Robot Learning, 6-9 November 2024, Munich, Germany, 2024

2023

LCM-LoRA: A Universal Stable-Diffusion Acceleration Module.

CoRR, 2023

Large Trajectory Models are Scalable Motion Predictors and Planners.

CoRR, 2023

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference.

CoRR, 2023

GPT-Driver: Learning to Drive with GPT.

CoRR, 2023

AutoEncoding Tree for City Generation and Applications.

CoRR, 2023

BEVScope: Enhancing Self-Supervised Depth Estimation Leveraging Bird's-Eye-View in Dynamic Scenarios.

CoRR, 2023

SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving.

CoRR, 2023

ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory.

CoRR, 2023

Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving.

CoRR, 2023

Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

P4P: Conflict-Aware Motion Prediction for Planning in Autonomous Driving.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2023

VectorMapNet: End-to-end Vectorized HD Map Learning.

Proceedings of the International Conference on Machine Learning, 2023

On Uni-Modal Feature Learning in Supervised Multi-Modal Learning.

Proceedings of the International Conference on Machine Learning, 2023

The Modality Focusing Hypothesis: Towards Understanding Crossmodal Knowledge Distillation.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Programmatically Grounded, Compositionally Generalizable Robotic Manipulation.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Self-supervision through Random Segments with Autoregressive Coding (RandSAC).

Proceedings of the Eleventh International Conference on Learning Representations, 2023

INT2: Interactive Trajectory Prediction at Intersections.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

PVT++: A Simple End-to-End Latency-Aware Visual Tracking Framework.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Neural Map Prior for Autonomous Driving.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

GeoMAE: Masked Geometric Target Prediction for Self-supervised Point Cloud Pre-Training.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

What Happened 3 Seconds Ago? Inferring the Past with Thermal Imaging.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

ViP3D: End-to-End Visual Trajectory Prediction via 3D Agent Queries.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

FUTR3D: A Unified Sensor Fusion Framework for 3D Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Robot Parkour Learning.

Zipeng Fu

Christopher G. Atkeson

Sören Schwertfeger

Chelsea Finn

Proceedings of the Conference on Robot Learning, 2023

Cross-Dataset Sensor Alignment: Making Visual 3D Object Detector Generalizable.

Proceedings of the Conference on Robot Learning, 2023

A Universal Semantic-Geometric Representation for Robotic Manipulation.

Proceedings of the Conference on Robot Learning, 2023

Long-Term Interactive Driving Simulation: MPC to the Rescue.

Proceedings of the Artificial Intelligence - Third CAAI International Conference, 2023

M²Sim: A Long-Term Interactive Driving Simulator.

Proceedings of the Artificial Intelligence - Third CAAI International Conference, 2023

2022

AsyInst: Asymmetric Affinity with DepthGrad and Color for Box-Supervised Instance Segmentation.

CoRR, 2022

VectorFlow: Combining Images and Vectors for Traffic Occupancy and Flow Prediction.

CoRR, 2022

Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech.

CoRR, 2022

Beyond Visual Field of View: Perceiving 3D Environment with Echoes and Vision.

Lingyu Zhu

Esa Rahtu

CoRR, 2022

VectorMapNet: End-to-end Vectorized HD Map Learning.

CoRR, 2022

The Modality Focusing Hypothesis: On the Blink of Multimodal Knowledge Distillation.

CoRR, 2022

Training-Free Robust Multimodal Learning via Sample-Wise Jacobian Regularization.

CoRR, 2022

Self-supervision through Random Segments with Autoregressive Coding (RandSAC).

CoRR, 2022

InterSim: Interactive Traffic Simulation via Explicit Relation Modeling.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2022

AutoAlign: Pixel-Instance Feature Aggregation for Multi-Modal 3D Object Detection.

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Sound2Synth: Interpreting Sound via FM Synthesizer Parameters Estimation.

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Intrinsically Motivated Self-supervised Learning in Reinforcement Learning.

Proceedings of the 2022 International Conference on Robotics and Automation, 2022

SEMI: Self-supervised Exploration via Multisensory Incongruity.

Proceedings of the 2022 International Conference on Robotics and Automation, 2022

HDMapNet: An Online HD Map Construction and Evaluation Framework.

Proceedings of the 2022 International Conference on Robotics and Automation, 2022

Depth Estimation Matters Most: Improving Per-Object Depth Estimation for Monocular 3D Detection and Tracking.

Proceedings of the 2022 International Conference on Robotics and Automation, 2022

IFR-Explore: Learning Inter-object Functional Relationships in 3D Indoor Scenes.

Proceedings of the Tenth International Conference on Learning Representations, 2022

R4D: Utilizing Reference Objects for Long-Range Distance Estimation.

Proceedings of the Tenth International Conference on Learning Representations, 2022

CYBORGS: Contrastively Bootstrapping Object Representations by Grounding in Segmentation.

Renhao Wang

Yang Gao

Proceedings of the Computer Vision - ECCV 2022, 2022

Learning Visual Styles from Audio-Visual Associations.

Proceedings of the Computer Vision - ECCV 2022, 2022

MUTR3D: A Multi-camera Tracking Framework via 3D-to-2D Queries.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

M2I: From Factored Marginal Trajectory Prediction to Interactive Prediction.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Co-advise: Cross Inductive Bias Distillation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Embracing Single Stride 3D Object Detector with Sparse Transformer.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Egocentric Prediction of Action Target in 3D.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-training for Spatial-Aware Visual Representations.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

Neural Dubber: Dubbing for Silent Videos According to Scripts.

CoRR, 2021

DenseTNT: Waymo Open Dataset Motion Prediction Challenge 1st Place Solution.

Junru Gu

Qiao Sun

CoRR, 2021

Improving Multi-Modal Learning with Uni-Modal Teachers.

CoRR, 2021

What Makes Multimodal Learning Better than Single (Provably).

CoRR, 2021

Predictive Visual Tracking: A New Benchmark and Baseline Approach.

CoRR, 2021

What Makes Multi-Modal Learning Better than Single (Provably).

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Neural Dubber: Dubbing for Videos According to Scripts.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

CVC: Contrastive Learning for Non-Parallel Voice Conversion.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Multimodal Knowledge Expansion.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

On Feature Decorrelation in Self-Supervised Learning.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

DenseTNT: End-to-end Trajectory Prediction from Dense Goal Sets.

Junru Gu

Chen Sun

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Adversarially Robust Imitation Learning.

Proceedings of the Conference on Robot Learning, 8-11 November 2021, London, UK, 2021

DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries.

Proceedings of the Conference on Robot Learning, 8-11 November 2021, London, UK, 2021

Multi-Agent Trajectory Prediction by Combining Egocentric and Allocentric Views.

Proceedings of the Conference on Robot Learning, 8-11 November 2021, London, UK, 2021

2020

LID 2020: The Learning from Imperfect Data Challenge Results.

CoRR, 2020

AlignNet: A Unifying Approach to Audio-Visual Alignment.

Zhaoyuan Fang

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

UnModNet: Learning to Unwrap a Modulo Image for High Dynamic Range Imaging.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Scalability in Perception for Autonomous Driving: Waymo Open Dataset.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

VectorNet: Encoding HD Maps and Agent Dynamics From Vectorized Representation.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Music Gesture for Visual Sound Separation.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

TNT: Target-driven Trajectory Prediction.

Balakrishnan Varadarajan

Proceedings of the 4th Conference on Robot Learning, 2020

CLOUD: Contrastive Learning of Unsupervised Dynamics.

Yujie Lu

Proceedings of the 4th Conference on Robot Learning, 2020

Unsupervised Monocular Depth Learning in Dynamic Scenes.

Proceedings of the 4th Conference on Robot Learning, 2020

2019

Semantic Understanding of Scenes Through the ADE20K Dataset.

Int. J. Comput. Vis., 2019

Through-Wall Human Mesh Recovery Using Radio Signals.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

The Sound of Motions.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Self-Supervised Moving Vehicle Tracking With Stereo Sound.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Self-supervised Audio-visual Co-segmentation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Self-Supervised Segmentation and Source Separation on Videos.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

2018

RF-based 3D skeletons.

Mingmin Zhao

Yonglong Tian

Mohammad Abu Alsheikh

Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication, 2018

The Sound of Pixels.

Proceedings of the Computer Vision - ECCV 2018, 2018

Through-Wall Human Pose Estimation Using Radio Signals.

Mingmin Zhao

Tianhong Li

Mohammad Abu Alsheikh

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017

Loss Functions for Image Restoration With Neural Networks.

IEEE Trans. Computational Imaging, 2017

SLAC: A Sparsely Labeled Dataset for Action Classification and Localization.

CoRR, 2017

Duckietown: An open, inexpensive and flexible platform for autonomy education and research.

Proceedings of the 2017 IEEE International Conference on Robotics and Automation, 2017

Open Vocabulary Scene Parsing.

Proceedings of the IEEE International Conference on Computer Vision, 2017

Scene Parsing through ADE20K Dataset.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016

Occluded Imaging with Time-of-Flight Sensors.

ACM Trans. Graph., 2016

Semantic Understanding of Scenes through the ADE20K Dataset.

CoRR, 2016

2015

Loss Functions for Neural Networks for Image Processing.

CoRR, 2015

Unbounded High Dynamic Range Photography Using a Modulo Camera.

Boxin Shi

Christy Fernandez-Cull

Sai-Kit Yeung

Ramesh Raskar

Proceedings of the 2015 IEEE International Conference on Computational Photography, 2015

2014

Sub-pixel Layout for Super-Resolution with Images in the Octic Group.

Christy Fernandez-Cull

R. Hamilton Shepard

Christopher Barsi

Ramesh Raskar

Proceedings of the Computer Vision - ECCV 2014, 2014

2013

Millimeter Wave Mobile Communications for 5G Cellular: It Will Work!

IEEE Access, 2013

28 GHz Angle of Arrival and Angle of Departure Analysis for Outdoor Cellular Communications Using Steerable Beam Antennas in New York City.

Proceedings of the 77th IEEE Vehicular Technology Conference, 2013

28 GHz millimeter wave cellular communication measurements for reflection and penetration loss in and around buildings in New York city.

Proceedings of IEEE International Conference on Communications, 2013

28 GHz propagation measurements for outdoor cellular communications using steerable beam antennas in New York city.
