Guan Huang

Affiliations:

XForwardAI, Beijing, China
PhiGent Robotics
Chinese Academy of Sciences (CASIA), Institute of Automation, China (2016)

According to our database, Guan Huang authored at least 63 papers between 2015 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

GigaBrain-0: A World Model-Powered Vision-Language-Action Model.

CoRR, October, 2025

DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion.

CoRR, October, 2025

EMMA: Generalizing Real-World Robot Manipulation via Generative Visual Transfer.

CoRR, September, 2025

MimicDreamer: Aligning Human and Robot Demonstrations for Scalable VLA Training.

CoRR, September, 2025

ReconDreamer-RL: Enhancing Reinforcement Learning via Diffusion-based Scene Reconstruction.

CoRR, August, 2025

EmbodieDreamer: Advancing Real2Sim2Real Transfer for Policy Training via Embodied World Modeling.

CoRR, July, 2025

Gait Recognition in the Wild: A Large-Scale Benchmark and NAS-Based Baseline.

IEEE Trans. Pattern Anal. Mach. Intell., June, 2025

WonderFree: Enhancing Novel View Quality and Cross-View Consistency for 3D Scene Exploration.

CoRR, June, 2025

GigaVideo-1: Advancing Video Generation via Automatic Feedback with 4 GPU-Hours Fine-Tuning.

CoRR, June, 2025

Motion-R1: Chain-of-Thought Reasoning and Reinforcement Learning for Human Motion Generation.

CoRR, June, 2025

RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer.

CoRR, May, 2025

HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration.

CoRR, April, 2025

WonderTurbo: Generating Interactive 3D World in 0.72 Seconds.

CoRR, April, 2025

ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation.

CoRR, March, 2025

DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond.

CoRR, 2024

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens.

CoRR, 2024

DriveDreamer: Towards Real-World-Drive World Models for Autonomous Driving.

Proceedings of the Computer Vision - ECCV 2024, 2024

2023

WebFace260M: A Benchmark for Million-Scale Deep Face Recognition.

IEEE Trans. Pattern Anal. Mach. Intell., 2023

DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving.

CoRR, 2023

Detachable Novel Views Synthesis of Dynamic Scenes Using Distribution-Driven Neural Radiance Fields.

CoRR, 2023

HFT: Lifting Perspective Representations via Hybrid Feature Transformation for BEV Perception.

Proceedings of the IEEE International Conference on Robotics and Automation, 2023

Efficient and Hybrid Decoder for Local Map Construction in Bird's-Eye-View.

Proceedings of the IEEE International Conference on Robotics and Automation, 2023

Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

CompletionFormer: Depth Completion with Convolutions and Vision Transformers.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

A Simple Baseline for Multi-Camera 3D Object Detection.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Crafting Monocular Cues and Velocity Guidance for Self-Supervised Multi-Frame Depth Learning.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

BEVPoolv2: A Cutting-edge Implementation of BEVDet Toward Deployment.

Junjie Huang

Guan Huang

CoRR, 2022

BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving.

CoRR, 2022

Gait Recognition in the Wild: A Benchmark.

CoRR, 2022

HFT: Lifting Perspective Representations via Hybrid Feature Transformation.

CoRR, 2022

BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection.

Junjie Huang

Guan Huang

CoRR, 2022

MVSTER: Epipolar Transformer for Efficient Multi-view Stereo.

Proceedings of the Computer Vision - ECCV 2022, 2022

Dimension Embeddings for Monocular 3D Object Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

CAFE: Learning to Condense Dataset by Aligning Features.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation.

Proceedings of the Conference on Robot Learning, 2022

MonoViT: Self-Supervised Monocular Depth Estimation with a Vision Transformer.

Proceedings of the International Conference on 3D Vision, 2022

2021

BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View.

CoRR, 2021

Face-NMS: A Core-set Selection Approach for Efficient Face Recognition.

CoRR, 2021

Masked Face Recognition Challenge: The WebFace260M Track Report.

CoRR, 2021

Structure-Aware Face Clustering on a Large-Scale Graph with $10^{7}$ Nodes.

CoRR, 2021

WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Structure-Aware Face Clustering on a Large-Scale Graph With $10^{7}$ Nodes.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

SIMPLE: SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Improve Person Re-Identification With Part Awareness Learning.

IEEE Trans. Image Process., 2020

How to Train Your Robust Human Pose Estimator: Pay Attention to the Constraint Cue.

CoRR, 2020

The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Action Machine: Toward Person-Centric Action Recognition in Videos.

IEEE Signal Process. Lett., 2019

Multi-Stage HRNet: Multiple Stage High-Resolution Network for Human Pose Estimation.

Junjie Huang

Zheng Zhu

Guan Huang

CoRR, 2019

High Performance Visual Object Tracking with Unified Convolutional Networks.

CoRR, 2019

FastPose: Towards Real-time Pose Estimation and Tracking via Scale-normalized Multi-task Networks.

CoRR, 2019

Exploiting Offset-guided Network for Pose Estimation and Tracking.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

State-Aware Re-Identification Feature for Multi-Target Multi-Camera Tracking.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Attention-Guided Unified Network for Panoptic Segmentation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

EANet: Enhancing Alignment for Cross-Domain Person Re-identification.

CoRR, 2018

Action Machine: Rethinking Action Recognition in Trimmed Videos.

CoRR, 2018

2017

UCT: Learning Unified Convolutional Networks for Real-Time Visual Tracking.

Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, 2017

The Visual Object Tracking VOT2017 Challenge Results.

Abdelrahman Eldesokey

Alireza Memarmoghadam

Gorthi R. K. Sai Subrahmanyam

Goutam Bhat

Guan Huang

Guilherme Sousa Bastos

Kannappan Palaniappan

Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, 2017

2015

Automatic gear sorting system based on monocular vision.

Digit. Commun. Networks, 2015
