Zhongang Cai

ORCID: 0000-0002-1810-3855

According to our database, Zhongang Cai authored at least 75 papers between 2018 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

PointHPS: Cascaded 3D Human Pose and Shape Estimation from Point Clouds.

Int. J. Comput. Vis., May, 2026

From Pixels to Words - Towards Native One-Vision Models at Scale.

CoRR, May, 2026

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture.

CoRR, May, 2026

Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer.

CoRR, March, 2026

Demystifying Video Reasoning.

CoRR, March, 2026

SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation.

IEEE Trans. Pattern Anal. Mach. Intell., February, 2026

A Very Big Video Reasoning Suite.

CoRR, February, 2026

VLM-Guided Group Preference Alignment for Diffusion-based Human Mesh Recovery.

CoRR, February, 2026

2025

ConsistCompose: Unified Multimodal Layout Control for Image Composition.

CoRR, November, 2025

Scaling Spatial Intelligence with Multimodal Foundation Models.

CoRR, November, 2025

The Quest for Generalizable Motion Generation: Data, Model, and Evaluation.

CoRR, October, 2025

Has GPT-5 Achieved Spatial Intelligence? An Empirical Study.

CoRR, August, 2025

Controllable Human-centric Keyframe Interpolation with Generative Prior.

CoRR, June, 2025

ADHMR: Aligning Diffusion-based Human Mesh Recovery via Direct Preference Optimization.

CoRR, May, 2025

Deblur-Avatar: Animatable Avatars from Motion-Blurred Monocular Videos.

CoRR, January, 2025

SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation.

CoRR, January, 2025

ADHMR: Aligning Diffusion-based Human Mesh Recovery via Direct Preference Optimization.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

DPoser-X: Diffusion Model as Robust 3D Whole-Body Human Pose Prior.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

EgoLife: Towards Egocentric Life Assistant.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Disco4D: Disentangled 4D Human Generation and Animation from a Single Image.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

Playing for 3D Human Recovery.

IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

MotionDiffuse: Text-Driven Human Motion Generation With Diffusion Model.

IEEE Trans. Pattern Anal. Mach. Intell., June, 2024

Robust Partial-to-Partial Point Cloud Registration in a Full Range.

Liang Pan, Zhongang Cai, Ziwei Liu

IEEE Robotics Autom. Lett., 2024

Disco4D: Disentangled 4D Human Generation and Animation from a Single Image.

CoRR, 2024

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers.

CoRR, 2024

Large Motion Model for Unified Multi-Modal Motion Generation.

CoRR, 2024

HMR-Adapter: A Lightweight Adapter with Dual-Path Cross Augmentation for Expressive Human Mesh Recovery.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Large Motion Model for Unified Multi-modal Motion Generation.

Proceedings of the Computer Vision - ECCV 2024, 2024

WHAC: World-Grounded Humans and Cameras.

Proceedings of the Computer Vision - ECCV 2024, 2024

AttriHuman-3D: Editable 3D Human Avatar Generation with Attribute Decomposition and Indexing.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Digital Life Project: Autonomous 3D Characters with Social Intelligence.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

IT3D: Improved Text-to-3D Generation with Explicit View Synthesis.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Variational Relational Point Completion Network for Robust 3D Classification.

IEEE Trans. Pattern Anal. Mach. Intell., September, 2023

SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation.

CoRR, 2023

PointHPS: Cascaded 3D Human Pose and Shape Estimation from Point Clouds.

CoRR, 2023

Learning Dense UV Completion for Human Mesh Recovery.

CoRR, 2023

DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering.

CoRR, 2023

SynBody: Synthetic Dataset with Layered Human Models for 3D Human Perception and Modeling.

CoRR, 2023

Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction.

CoRR, 2023

Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text.

Proceedings of the SIGGRAPH Asia 2023 Technical Communications, 2023

FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Towards Robust and Expressive Whole-body Human Pose and Shape Estimation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

BiBench: Benchmarking and Analyzing Network Binarization.

Proceedings of the International Conference on Machine Learning, 2023

ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

SynBody: Synthetic Dataset with Layered Human Models for 3D Human Perception and Modeling.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022

AvatarCLIP: zero-shot text-driven generation and animation of 3D avatars.

ACM Trans. Graph., 2022

Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond Algorithms.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Monocular 3D Object Reconstruction with GAN Inversion.

Proceedings of the Computer Vision - ECCV 2022, 2022

HuMMan: Multi-modal 4D Human Dataset for Versatile Sensing and Modeling.

Proceedings of the Computer Vision - ECCV 2022, 2022

PTTR: Relational 3D Point Cloud Object Tracking with Transformer.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Delving Deep into the Generalization of Vision Transformers under Distribution Shifts.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Versatile Multi-Modal Pre-Training for Human-Centric Perception.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Multi-View Partial (MVP) Point Cloud Challenge 2021 on Completion and Registration: Methods and Results.

Francisco Gómez Fernández, Qinlong Wang, Yang Yang

CoRR, 2021

Playing for 3D Human Recovery.

CoRR, 2021

Delving Deep into the Generalization of Vision Transformers under Distribution Shifts.

CoRR, 2021

Garment4D: Garment Reconstruction from Point Cloud Sequences.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

BiPointNet: Binary Neural Network for Point Clouds.

Proceedings of the 9th International Conference on Learning Representations, 2021

CSG-Stump: A Learning Friendly CSG-Like Representation for Interpretable Shape Parsing.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Unsupervised 3D Shape Completion Through GAN Inversion.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Variational Relational Point Completion Network.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

REFINE: Prediction Fusion Network for Panoptic Segmentation.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Balanced Activation for Long-tailed Visual Recognition.

CoRR, 2020

Leveraging Localization for Multi-camera Association.

CoRR, 2020

Leveraging Temporal Information for 3D Detection and Domain Adaptation.

CoRR, 2020

MessyTable: Instance Association in Multiple Camera Views.

Proceedings of the Computer Vision - ECCV 2020, 2020

2019

Siamese Convolutional Neural Network for Sub-millimeter-accurate Camera Pose Estimation and Visual Servoing.

Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2019

2018

3D Convolution on RGB-D Point Clouds for Accurate Model-free Object Pose Estimation.

Zhongang Cai, Cunjun Yu, Quang-Cuong Pham

CoRR, 2018