Siyu Zhu

Orcid: 0000-0003-0293-0044

Affiliations:

Alibaba Group, A. I. Labs, Hangzhou, China
Hong Kong University of Science and Technology, Department of Computer Science and Engineering, Hong Kong (PhD 2017)

According to our database, Siyu Zhu authored at least 94 papers between 2014 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

SlotMemory: Object-Centric KV Memory for Streaming Long-Video Generation.

CoRR, May, 2026

Large Depth Completion Model from Sparse Observations.

CoRR, May, 2026

Towards Consistent Video Geometry Estimation.

CoRR, May, 2026

The Thinking Pixel: Recursive Sparse Reasoning in Multimodal Diffusion Latents.

CoRR, April, 2026

BARD: Bridging AutoRegressive and Diffusion Vision-Language Models Via Highly Efficient Progressive Block Merging and Stage-Wise Distillation.

CoRR, April, 2026

Tora3: Trajectory-Guided Audio-Video Generation with Physical Coherence.

CoRR, April, 2026

CrowdGaussian: Reconstructing High-Fidelity 3D Gaussians for Human Crowd from a Single Image.

CoRR, March, 2026

Prompt Reinjection: Alleviating Prompt Forgetting in Multimodal Diffusion Transformers.

CoRR, February, 2026

Linguistic query-guided mask generation for referring image segmentation.

Pattern Recognit., 2026

2025

WAM-Diff: A Masked Diffusion VLA Framework with MoE and Online Reinforcement Learning for Autonomous Driving.

CoRR, December, 2025

WAM-Flow: Parallel Coarse-to-Fine Motion Planning via Discrete Flow Matching for Autonomous Driving.

CoRR, December, 2025

OmniMotion: Multimodal Motion Generation with Continuous Masked Autoregression.

CoRR, October, 2025

LaTo: Landmark-tokenized Diffusion Transformer for Fine-grained Human Face Editing.

CoRR, September, 2025

Forge4D: Feed-Forward 4D Human Reconstruction and Interpolation from Uncalibrated Sparse-view Videos.

CoRR, September, 2025

PF-LHM: 3D Animatable Avatar Reconstruction from Pose-free Articulated Human Images.

CoRR, June, 2025

DicFace: Dirichlet-Constrained Variational Codebook Learning for Temporally Coherent Video Face Restoration.

CoRR, June, 2025

Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization and Temporal Motion Modulation.

CoRR, May, 2025

MWVOS: Mask-Free Weakly Supervised Video Object Segmentation via promptable foundation model.

Pattern Recognit., 2025

Text-video retrieval re-ranking via multi-grained cross attention and frozen image encoders.

Pattern Recognit., 2025

High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization and Temporal Motion Modulation.

Proceedings of the SIGGRAPH Asia 2025 Conference Papers, 2025

Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Flow Distillation Sampling: Regularizing 3D Gaussians with Pre-trained Matching Priors.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Dirichlet-Constrained Variational Codebook Learning for Temporally Coherent Video Face Restoration.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Tora: Trajectory-oriented Diffusion Transformer for Video Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

4D Diffusion for Dynamic Protein Structure Prediction with Reference and Motion Guidance.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

Open-Vocabulary Category-Level Object Pose and Size Estimation.

IEEE Robotics Autom. Lett., September, 2024

MCMat: Multiview-Consistent and Physically Accurate PBR Material Generation.

CoRR, 2024

Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Diffusion Transformer Networks.

CoRR, 2024

Towards Native Generative Model for 3D Head Avatar.

CoRR, 2024

4D Diffusion for Dynamic Protein Structure Prediction with Reference Guided Motion Alignment.

CoRR, 2024

Dynamic PDB: A New Dataset and a SE(3) Model Extension by Integrating Dynamic Behaviors and Physical Properties in Protein Structures.

CoRR, 2024

Head360: Learning a Parametric 3D Full-Head for Free-View Synthesis in 360°.

CoRR, 2024

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation.

CoRR, 2024

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance.

CoRR, 2024

OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation.

CoRR, 2024

VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model.

CoRR, 2024

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance.

Proceedings of the Computer Vision - ECCV 2024, 2024

STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians.

Proceedings of the Computer Vision - ECCV 2024, 2024

Head360: Learning a Parametric 3D Full-Head for Free-View Synthesis in 360°.

Proceedings of the Computer Vision - ECCV 2024, 2024

Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition.

Proceedings of the Computer Vision - ECCV 2024, 2024

EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head.

Proceedings of the Computer Vision - ECCV 2024, 2024

Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

DRO: Deep Recurrent Optimizer for Video to Depth.

IEEE Robotics Autom. Lett., May, 2023

Fine-Grained Open Domain Image Animation with Motion Guidance.

CoRR, 2023

Fine-grained Text-Video Retrieval with Frozen Image Encoders.

CoRR, 2023

UVOSAM: A Mask-free Paradigm for Unsupervised Video Object Segmentation via Segment Anything Model.

CoRR, 2023

Towards Robust Video Instance Segmentation with Temporal-Aware Transformer.

CoRR, 2023

Learning Aligned Cross-modal Representations for Referring Image Segmentation.

CoRR, 2023

Monocular Scene Reconstruction with 3D SDF Transformers.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022

RenderNet: Visual Relocalization Using Virtual Viewpoints in Large-Scale Indoor Environments.

CoRR, 2022

RCP: Recurrent Closest Point for Scene Flow Estimation on 3D Point Clouds.

CoRR, 2022

NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation.

CoRR, 2022

Quadtree Attention for Vision Transformers.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Neural Window Fully-connected CRFs for Monocular Depth Estimation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

RCP: Recurrent Closest Point for Point Cloud.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Cluster Contrast for Unsupervised Person Re-identification.

Proceedings of the Computer Vision - ACCV 2022, 2022

GB-CosFace: Rethinking Softmax-Based Face Recognition from the Perspective of Open Set Classification.

Proceedings of the Computer Vision - ACCV 2022, 2022

2021

UniFuse: Unidirectional Fusion for 360° Panorama Depth Estimation.

IEEE Robotics Autom. Lett., 2021

GB-CosFace: Rethinking Softmax-based Face Recognition from the Perspective of Open Set Classification.

CoRR, 2021

AR Mapping: Accurate and Efficient Mapping for Augmented Reality.

CoRR, 2021

Compact 3D Map-Based Monocular Localization Using Semantic Edge Alignment.

CoRR, 2021

DRO: Deep Recurrent Optimizer for Structure-from-Motion.

CoRR, 2021

UniFuse: Unidirectional Fusion for 360° Panorama Depth Estimation.

CoRR, 2021

Stereo Matching by Self-supervision of Multiscopic Vision.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2021

Single-Shot is Enough: Panoramic Infrastructure Based Calibration of Multiple Cameras and 3D LiDARs.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2021

CondLaneNet: a Top-to-down Lane Detection Framework Based on Conditional Convolution.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

FloorPlanCAD: A Large-Scale CAD Drawing Dataset for Panoptic Symbol Spotting.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Learning Camera Localization via Dense Scene Matching.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

MeshMVS: Multi-View Stereo Guided Mesh Reconstruction.

Proceedings of the International Conference on 3D Vision, 2021

2020

Distributed Very Large Scale Bundle Adjustment by Global Camera Consensus.

IEEE Trans. Pattern Anal. Mach. Intell., 2020

Self-Supervised Human Depth Estimation From Monocular Videos.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

End-to-End Learning Local Multi-View Descriptors for 3D Point Clouds.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

A Neural Network for Detailed Human Depth Estimation From a Single Image.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Batch DropBlock Network for Person Re-Identification and Beyond.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

2018

Batch Feature Erasing for Person Re-identification and Beyond.

CoRR, 2018

Learning and Matching Multi-View Descriptors for Registration of Point Clouds.

Proceedings of the Computer Vision - ECCV 2018, 2018

GeoDesc: Learning Local Descriptors by Integrating Geometry Constraints.

Proceedings of the Computer Vision - ECCV 2018, 2018

Very Large-Scale Global SfM by Distributed Motion Averaging.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Matchable Image Retrieval by Learning from Surface Reconstruction.

Proceedings of the Computer Vision - ACCV 2018, 2018

2017

Accurate, Scalable and Parallel Structure from Motion.

CoRR, 2017

Progressive Large Scale-Invariant Image Matching in Scale Space.

Proceedings of the IEEE International Conference on Computer Vision, 2017

Distributed Very Large Scale Bundle Adjustment by Global Camera Consensus.

Proceedings of the IEEE International Conference on Computer Vision, 2017

Relative Camera Refinement for Accurate Dense Reconstruction.

Proceedings of the 2017 International Conference on 3D Vision, 2017

2016

Image-Based Building Regularization Using Structural Linear Features.

IEEE Trans. Vis. Comput. Graph., 2016

Graph-Based Consistent Matching for Structure-from-Motion.

Proceedings of the Computer Vision - ECCV 2016, 2016

Color Correction for Image-Based Modeling in the Large.

Proceedings of the Computer Vision - ACCV 2016, 2016

2015

Joint Camera Clustering and Surface Segmentation for Large-Scale Multi-view Stereo.

Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

2014

Local Readjustment for High-Resolution 3D Reconstruction.

Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

Multi-view Geometry Compression.

Proceedings of the Computer Vision - ACCV 2014, 2014

Multi-scale Tetrahedral Fusion of a Similarity Reconstruction and Noisy Positional Measurements.

Proceedings of the Computer Vision - ACCV 2014, 2014
