We stand with Ukraine

Wenzhao Zheng

ORCID: 0000-0001-7188-3734

According to our database, Wenzhao Zheng authored at least 92 papers between 2018 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

Latent Diffusion Model without Variational Autoencoder.

CoRR, October, 2025

SaLon3R: Structure-aware Long-term Generalizable 3D Reconstruction from Unposed Images.

CoRR, October, 2025

Terra: Explorable Native 3D World Model with Point Latents.

CoRR, October, 2025

D³QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection.

CoRR, October, 2025

4D Driving Scene Generation With Stereo Forcing.

…, Guangfeng Jiang, …

CoRR, September, 2025

StereoCarla: A High-Fidelity Driving Dataset for Generalizable Stereo.

CoRR, September, 2025

ROVR-Open-Dataset: A Large-Scale Depth Dataset for Autonomous Driving.

CoRR, August, 2025

Pseudo Depth Meets Gaussian: A Feed-forward RGB SLAM Baseline.

CoRR, August, 2025

Streaming 4D Visual Geometry Transformer.

CoRR, July, 2025

Quantize-then-Rectify: Efficient VQ-VAE Training.

CoRR, July, 2025

Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory.

CoRR, July, 2025

Learning Counterfactually Decoupled Attention for Open-World Model Attribution.

CoRR, June, 2025

ODEₜ (ODEₗ): Shortcutting the Time and Length in Diffusion and Flow Models for Faster Sampling.

Denis A. Gudovskiy, …

CoRR, June, 2025

SceneCompleter: Dense 3D Scene Completion for Generative Novel View Synthesis.

CoRR, June, 2025

QuadricFormer: Scene as Superquadrics for 3D Semantic Occupancy Prediction.

CoRR, June, 2025

GenWorld: Towards Detecting AI-generated Real-world Simulation Videos.

CoRR, June, 2025

SpectralAR: Spectral Autoregressive Visual Generation.

CoRR, June, 2025

R3D2: Realistic 3D Asset Insertion via Diffusion for Autonomous Driving Simulation.

William Ljungbergh, Bernardo Taveira, …, Christoffer Petersson, Michael Felsberg, …, Masayoshi Tomizuka, …

CoRR, June, 2025

S2GO: Streaming Sparse Gaussian Occupancy Prediction.

CoRR, June, 2025

OGGSplat: Open Gaussian Growing for Generalizable Reconstruction with Expanded Field-of-View.

CoRR, June, 2025

GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control.

…, Shanghang Zhang

CoRR, May, 2025

OmniIndoor3D: Comprehensive Indoor 3D Reconstruction.

…, Shanghang Zhang

CoRR, May, 2025

EmbodiedOcc++: Boosting Embodied 3D Occupancy Prediction with Plane Regularization and Uncertainty Sampler.

…, Shanghang Zhang

CoRR, April, 2025

DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers.

CoRR, March, 2025

SliceOcc: Indoor 3D Semantic Occupancy Prediction with Vertical Slice Representation.

…, Shanghang Zhang

CoRR, January, 2025

GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting.

CoRR, January, 2025

LiDAR-HMR: 3D Human Mesh Recovery From LiDAR.

IEEE Trans. Multim., 2025

Probabilistic deep metric learning for hyperspectral image classification.

Pattern Recognit., 2025

SliceOcc: Indoor 3D Semantic Occupancy Prediction with Vertical Slice Representation.

…, Shanghang Zhang

Proceedings of the IEEE International Conference on Robotics and Automation, 2025

Lightstereo: Channel Boost is All You Need for Efficient 2D Cost Aggregation.

Proceedings of the IEEE International Conference on Robotics and Automation, 2025

SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference.

…, Denis A. Gudovskiy, …, Shanghang Zhang

Proceedings of the Forty-second International Conference on Machine Learning, 2025

UniDrive: Towards Universal Driving Perception Across Camera Configurations.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes.

…, Masayoshi Tomizuka, …

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Segment Any Motion in Videos.

…, Shanghang Zhang, Angjoo Kanazawa, …

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction.

…, Amonnut Thammatadatrakoon, …

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

SPTR: Structure-Preserving Transformer for Unsupervised Indoor Depth Completion.

IEEE Trans. Circuits Syst. Video Technol., April, 2024

Introspective Deep Metric Learning.

IEEE Trans. Pattern Anal. Mach. Intell., April, 2024

Hardness-Aware Scene Synthesis for Semi-Supervised 3D Object Detection.

IEEE Trans. Multim., 2024

StructLane: Leveraging Structural Relations for Lane Detection.

IEEE Trans. Image Process., 2024

Preventing Local Pitfalls in Vector Quantization via Optimal Transport.

CoRR, 2024

GaussianAD: Gaussian-Centric End-to-End Autonomous Driving.

…, Shanghang Zhang

CoRR, 2024

Doe-1: Closed-Loop Autonomous Driving with Large World Model.

CoRR, 2024

Owl-1: Omni World Model for Consistent Long Video Generation.

CoRR, 2024

DrivingRecon: Large 4D Gaussian Reconstruction Model For Autonomous Driving.

…, Masayoshi Tomizuka, …

CoRR, 2024

GPD-1: Generative Pre-training for Driving.

…, Shanghang Zhang

CoRR, 2024

Driv3R: Learning Dense 4D Reconstruction for Autonomous Driving.

…, Masayoshi Tomizuka, …

CoRR, 2024

Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model.

…, Shanghang Zhang

CoRR, 2024

EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding.

CoRR, 2024

Stereo Anything: Unifying Stereo Matching with Large-Scale Mixed Data.

CoRR, 2024

Training-free Regional Prompting for Diffusion Transformers.

…, Shanghang Zhang

CoRR, 2024

PixelGaussian: Generalizable 3D Gaussian Reconstruction from Arbitrary Views.

…, Masayoshi Tomizuka, …

CoRR, 2024

V2M: Visual 2-Dimensional Mamba for Image Representation Learning.

CoRR, 2024

GlobalMamba: Global Image Serialization for Vision Mamba.

CoRR, 2024

FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models.

…, Shanghang Zhang

CoRR, 2024

LightStereo: Channel Boost Is All Your Need for Efficient 2D Cost Aggregation.

CoRR, 2024

Instruct Large Language Models to Drive like Humans.

CoRR, 2024

OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving.

CoRR, 2024

S³Gaussian: Self-Supervised Street Gaussians for Autonomous Driving.

…, Masayoshi Tomizuka, …, Shanghang Zhang

CoRR, 2024

GenAD: Generative End-to-End Autonomous Driving.

CoRR, 2024

Path Choice Matters for Clear Attribution in Path Methods.

CoRR, 2024

Path Choice Matters for Clear Attributions in Path Methods.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

GenAD: Generative End-to-End Autonomous Driving.

Proceedings of the Computer Vision - ECCV 2024, 2024

OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving.

Proceedings of the Computer Vision - ECCV 2024, 2024

SpatialFormer: Towards Generalizable Vision Transformers with Explicit Spatial Understanding.

Proceedings of the Computer Vision - ECCV 2024, 2024

GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction.

Proceedings of the Computer Vision - ECCV 2024, 2024

LowRankOcc: Tensor Decomposition and Low-Rank Recovery for Vision-Based 3D Semantic Occupancy Prediction.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Deep Metric Learning With Adaptively Composite Dynamic Constraints.

IEEE Trans. Pattern Anal. Mach. Intell., July, 2023

Exploring Unified Perspective For Fast Shapley Value Estimation.

CoRR, 2023

PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction.

CoRR, 2023

Human-M3: A Multi-view Multi-modal Dataset for 3D Human Pose Estimation in Outdoor Scenes.

CoRR, 2023

Bort: Towards Explainable Neural Networks with Bounded Orthogonal Constraint.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Token-Label Alignment for Vision Transformers.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

OPERA: Omni-Supervised Representation Learning with Hierarchical Supervisions.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Deep Factorized Metric Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

A Simple Baseline for Multi-Camera 3D Object Detection.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving.

CoRR, 2022

Dynamic Metric Learning with Cross-Level Concept Distillation.

Proceedings of the Computer Vision - ECCV 2022, 2022

Dimension Embeddings for Monocular 3D Object Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Attributable Visual Similarity Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation.

Proceedings of the Conference on Robot Learning, 2022

2021

Hardness-Aware Deep Metric Learning.

IEEE Trans. Pattern Anal. Mach. Intell., 2021

Deep Relational Metric Learning.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Deep Compositional Metric Learning.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

Deep Adversarial Metric Learning.

IEEE Trans. Image Process., 2020

Structural Deep Metric Learning for Room Layout Estimation.

Proceedings of the Computer Vision - ECCV 2020, 2020

Deep Metric Learning via Adaptive Learnable Assessment.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Hardness-Aware Deep Metric Learning.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Deep Adversarial Metric Learning.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018