Siyuan Huang

Orcid: 0000-0003-1524-7148

Affiliations:

Beijing Institute for General Artificial Intelligence (BIGAI), China
University of California, Los Angeles, CA, USA (PhD 2021)

According to our database, Siyuan Huang authored at least 106 papers between 2017 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

Simultaneous Tactile-Visual Perception for Learning Multimodal Robot Manipulation.

IEEE Robotics Autom. Lett., April, 2026

Lifting Unlabeled Internet-level Data for 3D Scene Understanding.

Jiangyong Huang

CoRR, April, 2026

OmniClone: Engineering a Robust, All-Rounder Whole-Body Humanoid Teleoperation System.

CoRR, March, 2026

3D-RFT: Reinforcement Fine-Tuning for Video-based 3D Scene Understanding.

Xiongkun Linghu, Jiangyong Huang

CoRR, March, 2026

OmniXtreme: Breaking the Generality Barrier in High-Dynamic Humanoid Control.

CoRR, February, 2026

OmniTrack: General Motion Tracking via Physics-Consistent Reference.

CoRR, February, 2026

LessMimic: Long-Horizon Humanoid Interaction with Unified Distance Field Representations.

CoRR, February, 2026

GaussianFluent: Gaussian Simulation for Dynamic Scenes with Mixed Materials.

CoRR, January, 2026

3D Scene Change Modeling With Consistent Multi-View Aggregation.

Proceedings of the International Conference on 3D Vision, 2026

2025

UniAct: Unified Motion Generation and Action Streaming for Humanoid Robots.

CoRR, December, 2025

SafeFall: Learning Protective Control for Humanoid Robots.

CoRR, November, 2025

Learning Human-Humanoid Coordination for Collaborative Object Carrying.

CoRR, October, 2025

G4Splat: Geometry-Guided Gaussian Splatting with Generative Prior.

CoRR, October, 2025

VideoArtGS: Building Digital Twins of Articulated Objects from Monocular Video.

CoRR, September, 2025

GWM: Towards Scalable Gaussian World Models for Robotic Manipulation.

CoRR, August, 2025

Spatial-Temporal Multi-Scale Quantization for Flexible Motion Generation.

CoRR, August, 2025

DreamArt: Generating Interactable Articulated Objects from a Single Image.

CoRR, July, 2025

ControlVLA: Few-shot Object-centric Adaptation for Pre-trained Vision-Language-Action Models.

CoRR, June, 2025

LEO-VL: Towards 3D Vision-Language Generalists via Data Scaling with Efficient Representation.

Jiangyong Huang, Xiongkun Linghu

CoRR, June, 2025

CLONE: Closed-Loop Whole-Body Humanoid Teleoperation for Long-Horizon Tasks.

CoRR, June, 2025

Learning Unified Force and Position Control for Legged Loco-Manipulation.

CoRR, May, 2025

RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning.

Charlie Tianyue Cheng, Carlo Sferrazza

CoRR, April, 2025

ARFlow: Human Action-Reaction Flow Matching with Physical Guidance.

CoRR, March, 2025

StyleLoco: Generative Adversarial Distillation for Natural Humanoid Robot Locomotion.

CoRR, March, 2025

Generating Objects with Part-Articulation from a Single Image.

Proceedings of the SIGGRAPH Asia 2025 Conference Papers, 2025

SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation.

Chenfanfu Jiang

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Advancing 3D Scene Understanding with MV-ScanQA Multi-View Reasoning Evaluation and TripAlign Pre-training Dataset.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Ag2x2: Robust Agent-Agnostic Visual Representations for Zero-Shot Bimanual Manipulation.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2025

Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V.

Proceedings of the IEEE International Conference on Robotics and Automation, 2025

PhysPart: Physically Plausible Part Completion for Interactable Objects.

Leonidas J. Guibas

Proceedings of the IEEE International Conference on Robotics and Automation, 2025

SYNERGAI: Perception Alignment for Human-Robot Collaboration.

Proceedings of the IEEE International Conference on Robotics and Automation, 2025

Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

GWM: Towards Scalable Gaussian World Models for Robotic Manipulation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

TACO: Taming Diffusion for In-the-Wild Video Amodal Completion.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

PrimHOI: Compositional Human-Object Interaction via Reusable Primitives.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

InteractAnything: Zero-shot Human Object Interaction Synthesis via LLM Feedback and Object Affordance Parsing.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

METASCENES: Towards Automated Replica Creation for Real-world 3D Scans.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Decompositional Neural Scene Reconstruction with Generative Diffusion Prior.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Dynamic Motion Blending for Versatile Motion Editing.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis.

Jiangyong Huang, Xiongkun Linghu

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

Grasp Multiple Objects With One Hand.

IEEE Robotics Autom. Lett., May, 2024

Task-oriented Sequential Grounding in 3D Scenes.

CoRR, 2024

PhyRecon: Physically Plausible Neural Scene Reconstruction.

CoRR, 2024

Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V.

CoRR, 2024

Autonomous Character-Scene Interaction Synthesis from Text Instruction.

Proceedings of the SIGGRAPH Asia 2024 Conference Papers, 2024

PhyRecon: Physically Plausible Neural Scene Reconstruction.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Multi-modal Situated Reasoning in 3D Scenes.

Xiongkun Linghu, Jiangyong Huang, Xiaojian (Shawn) Ma

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024

3D Vision and Language Pretraining with Large-Scale Synthetic Data.

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

An Embodied Generalist Agent in 3D World.

Jiangyong Huang, Xiongkun Linghu

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Neural-Symbolic Recursive Machine for Systematic Generalization.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Unifying 3D Vision-Language Understanding via Promptable Queries.

Proceedings of the Computer Vision - ECCV 2024, 2024

F-HOI: Toward Fine-Grained Semantic-Aligned 3D Human-Object Interactions.

Proceedings of the Computer Vision - ECCV 2024, 2024

SlotLifter: Slot-Guided Feature Lifting for Learning Object-Centric Radiance Fields.

Proceedings of the Computer Vision - ECCV 2024, 2024

SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding.

Proceedings of the Computer Vision - ECCV 2024, 2024

PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Move as you Say, Interact as you can: Language-Guided Human Motion Generation with Scene Affordance.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Scaling Up Dynamic Human-Scene Interaction Modeling.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Single-view 3D Scene Reconstruction with High-fidelity Shape and Texture.

Proceedings of the International Conference on 3D Vision, 2024

2023

ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

GenDexGrasp: Generalizable Dexterous Grasping.

Proceedings of the IEEE International Conference on Robotics and Automation, 2023

SQA3D: Situated Question Answering in 3D Scenes.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Improving Object-centric Learning with Query Optimization.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Full-Body Articulated Human-Object Interaction.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes.

Jiangyong Huang, Demetri Terzopoulos

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Diffusion-based Generation, Optimization, and Planning in 3D Scenes.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

CHAIRS: Towards Full-Body Articulated Human-Object Interaction.

CoRR, 2022

Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation.

Jiangyong Huang, William Yicheng Zhu

CoRR, 2022

Unsupervised Object-Centric Learning with Bi-Level Optimized Query Slot Attention.

CoRR, 2022

PartAfford: Part-level Affordance Discovery from 3D Objects.

CoRR, 2022

HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

EgoTaskQA: Understanding Human Tasks in Egocentric Videos.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Learning V1 Simple Cells with Vector Representation of Local Content and Matrix Representation of Local Motion.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

Human-like Holistic 3D Scene Understanding.

PhD thesis, 2021

A Generalized Earley Parser for Human Activity Parsing and Prediction.

IEEE Trans. Pattern Anal. Mach. Intell., 2021

A HINT from Arithmetic: On Systematic Generalization of Perception, Syntax, and Semantics.

CoRR, 2021

Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

VLGrammar: Grounded Grammar Induction of Vision and Language.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

YouRefIt: Embodied Reference Understanding with Language and Gesture.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Learning Neural Representation of Camera Pose with Matrix Representation of Pose Shift via View Synthesis.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning.

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

SMART: A Situation Model for Algebra Story Problems via Attributed Grammar.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Learning by Fixing: Solving Math Word Problems with Weak Supervision.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike Common Sense.

Joshua B. Tenenbaum

CoRR, 2020

Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning.

Proceedings of the 37th International Conference on Machine Learning, 2020

A Competence-Aware Curriculum for Visual Concepts Learning via Question Answering.

Proceedings of the Computer Vision - ECCV 2020, 2020

LEMMA: A Multi-view Dataset for LEarning Multi-agent Multi-task Activities.

Proceedings of the Computer Vision - ECCV 2020, 2020

2019

PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Understanding Human Gaze Communication by Spatio-Temporal Graph Reasoning.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Holistic++ Scene Understanding: Single-View 3D Holistic Scene Parsing and Human Pose Estimation With Human-Object Interaction and Physical Commonsense.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

2018

Configurable 3D Scene Synthesis and 2D Image Rendering with Per-pixel Ground Truth Using Stochastic Grammars.

Chenfanfu Jiang, Demetri Terzopoulos

Int. J. Comput. Vis., 2018

Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image.

Proceedings of the Computer Vision - ECCV 2018, 2018

Human-Centric Indoor Scene Synthesis Using Stochastic Grammar.

Chenfanfu Jiang

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017

Configurable, Photorealistic Image Rendering and Ground Truth Synthesis by Sampling Stochastic Grammars Representing Indoor Scenes.

Chenfanfu Jiang, Demetri Terzopoulos

CoRR, 2017

Predicting Human Activities Using Stochastic Grammar.

Proceedings of the IEEE International Conference on Computer Vision, 2017
