Siyuan Huang

ORCID: 0000-0003-1524-7148

Affiliations:
  • Beijing Institute for General Artificial Intelligence (BIGAI), China
  • University of California, Los Angeles, CA, USA (PhD 2021)


According to our database, Siyuan Huang authored at least 83 papers between 2017 and 2025.

Bibliography

2025
Spatial-Temporal Multi-Scale Quantization for Flexible Motion Generation.
CoRR, August, 2025

Ag2x2: Robust Agent-Agnostic Visual Representations for Zero-Shot Bimanual Manipulation.
CoRR, July, 2025

DreamArt: Generating Interactable Articulated Objects from a Single Image.
CoRR, July, 2025

Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation.
CoRR, July, 2025

ControlVLA: Few-shot Object-centric Adaptation for Pre-trained Vision-Language-Action Models.
CoRR, June, 2025

LEO-VL: Towards 3D Vision-Language Generalists via Data Scaling with Efficient Representation.
CoRR, June, 2025

CLONE: Closed-Loop Whole-Body Humanoid Teleoperation for Long-Horizon Tasks.
CoRR, June, 2025

Learning Unified Force and Position Control for Legged Loco-Manipulation.
CoRR, May, 2025

RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning.
CoRR, April, 2025

Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation.
CoRR, April, 2025

ARFlow: Human Action-Reaction Flow Matching with Physical Guidance.
CoRR, March, 2025

StyleLoco: Generative Adversarial Distillation for Natural Humanoid Robot Locomotion.
CoRR, March, 2025

TACO: Taming Diffusion for in-the-wild Video Amodal Completion.
CoRR, March, 2025

Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

METASCENES: Towards Automated Replica Creation for Real-world 3D Scans.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Decompositional Neural Scene Reconstruction with Generative Diffusion Prior.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Dynamic Motion Blending for Versatile Motion Editing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Grasp Multiple Objects With One Hand.
IEEE Robotics Autom. Lett., May, 2024

SYNERGAI: Perception Alignment for Human-Robot Collaboration.
CoRR, 2024

PhysPart: Physically Plausible Part Completion for Interactable Objects.
CoRR, 2024

Task-oriented Sequential Grounding in 3D Scenes.
CoRR, 2024

PhyRecon: Physically Plausible Neural Scene Reconstruction.
CoRR, 2024

Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V.
CoRR, 2024

Autonomous Character-Scene Interaction Synthesis from Text Instruction.
Proceedings of the SIGGRAPH Asia 2024 Conference Papers, 2024

PhyRecon: Physically Plausible Neural Scene Reconstruction.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Multi-modal Situated Reasoning in 3D Scenes.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024

An Embodied Generalist Agent in 3D World.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Neural-Symbolic Recursive Machine for Systematic Generalization.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Unifying 3D Vision-Language Understanding via Promptable Queries.
Proceedings of the Computer Vision - ECCV 2024, 2024

F-HOI: Toward Fine-Grained Semantic-Aligned 3D Human-Object Interactions.
Proceedings of the Computer Vision - ECCV 2024, 2024

SlotLifter: Slot-Guided Feature Lifting for Learning Object-Centric Radiance Fields.
Proceedings of the Computer Vision - ECCV 2024, 2024

SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Move as you Say, Interact as you can: Language-Guided Human Motion Generation with Scene Affordance.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Scaling Up Dynamic Human-Scene Interaction Modeling.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Single-view 3D Scene Reconstruction with High-fidelity Shape and Texture.
Proceedings of the International Conference on 3D Vision, 2024

2023
ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

GenDexGrasp: Generalizable Dexterous Grasping.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023

SQA3D: Situated Question Answering in 3D Scenes.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Improving Object-centric Learning with Query Optimization.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Full-Body Articulated Human-Object Interaction.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Diffusion-based Generation, Optimization, and Planning in 3D Scenes.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
CHAIRS: Towards Full-Body Articulated Human-Object Interaction.
CoRR, 2022

Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation.
CoRR, 2022

Unsupervised Object-Centric Learning with Bi-Level Optimized Query Slot Attention.
CoRR, 2022

PartAfford: Part-level Affordance Discovery from 3D Objects.
CoRR, 2022

HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

EgoTaskQA: Understanding Human Tasks in Egocentric Videos.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Learning V1 Simple Cells with Vector Representation of Local Content and Matrix Representation of Local Motion.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Human-like Holistic 3D Scene Understanding.
PhD thesis, 2021

A Generalized Earley Parser for Human Activity Parsing and Prediction.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

A HINT from Arithmetic: On Systematic Generalization of Perception, Syntax, and Semantics.
CoRR, 2021

Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

VLGrammar: Grounded Grammar Induction of Vision and Language.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

YouRefIt: Embodied Reference Understanding with Language and Gesture.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Learning Neural Representation of Camera Pose with Matrix Representation of Pose Shift via View Synthesis.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

SMART: A Situation Model for Algebra Story Problems via Attributed Grammar.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Learning by Fixing: Solving Math Word Problems with Weak Supervision.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike Common Sense.
CoRR, 2020

Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning.
Proceedings of the 37th International Conference on Machine Learning, 2020

A Competence-Aware Curriculum for Visual Concepts Learning via Question Answering.
Proceedings of the Computer Vision - ECCV 2020, 2020

LEMMA: A Multi-view Dataset for LEarning Multi-agent Multi-task Activities.
Proceedings of the Computer Vision - ECCV 2020, 2020

2019
PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Understanding Human Gaze Communication by Spatio-Temporal Graph Reasoning.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Holistic++ Scene Understanding: Single-View 3D Holistic Scene Parsing and Human Pose Estimation With Human-Object Interaction and Physical Commonsense.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

2018
Configurable 3D Scene Synthesis and 2D Image Rendering with Per-pixel Ground Truth Using Stochastic Grammars.
Int. J. Comput. Vis., 2018

Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image.
Proceedings of the Computer Vision - ECCV 2018, 2018

Human-Centric Indoor Scene Synthesis Using Stochastic Grammar.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Configurable, Photorealistic Image Rendering and Ground Truth Synthesis by Sampling Stochastic Grammars Representing Indoor Scenes.
CoRR, 2017

Predicting Human Activities Using Stochastic Grammar.
Proceedings of the IEEE International Conference on Computer Vision, 2017

