Baoxiong Jia

Orcid: 0000-0002-4968-3290

According to our database, Baoxiong Jia authored at least 51 papers between 2017 and 2025.

Bibliography

2025
SceneCOT: Eliciting Grounded Chain-of-Thought Reasoning in 3D Scenes.
CoRR, October, 2025

Learning Human-Humanoid Coordination for Collaborative Object Carrying.
CoRR, October, 2025

SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent.
CoRR, September, 2025

VideoArtGS: Building Digital Twins of Articulated Objects from Monocular Video.
CoRR, September, 2025

A VR-Based Robotic Teleoperation System With Haptic Feedback and Adaptive Collision Avoidance.
IEEE Trans. Consumer Electron., August, 2025

GWM: Towards Scalable Gaussian World Models for Robotic Manipulation.
CoRR, August, 2025

Spatial-Temporal Multi-Scale Quantization for Flexible Motion Generation.
CoRR, August, 2025

Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation.
CoRR, July, 2025

LEO-VL: Towards 3D Vision-Language Generalists via Data Scaling with Efficient Representation.
CoRR, June, 2025

Learning Unified Force and Position Control for Legged Loco-Manipulation.
CoRR, May, 2025

RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning.
CoRR, April, 2025

ARFlow: Human Action-Reaction Flow Matching with Physical Guidance.
CoRR, March, 2025

Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V.
Proceedings of the IEEE International Conference on Robotics and Automation, 2025

PhysPart: Physically Plausible Part Completion for Interactable Objects.
Proceedings of the IEEE International Conference on Robotics and Automation, 2025

Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

METASCENES: Towards Automated Replica Creation for Real-world 3D Scans.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Multi-modal Situated Reasoning in 3D Scenes.
CoRR, 2024

Task-oriented Sequential Grounding in 3D Scenes.
CoRR, 2024

Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V.
CoRR, 2024

Multi-modal Situated Reasoning in 3D Scenes.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

An Embodied Generalist Agent in 3D World.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Unifying 3D Vision-Language Understanding via Promptable Queries.
Proceedings of the Computer Vision - ECCV 2024, 2024

SlotLifter: Slot-Guided Feature Lifting for Learning Object-Centric Radiance Fields.
Proceedings of the Computer Vision - ECCV 2024, 2024

SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Move as you Say, Interact as you can: Language-Guided Human Motion Generation with Scene Affordance.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Learning a Causal Transition Model for Object Cutting.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2023

Improving Object-centric Learning with Query Optimization.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

X-VoE: Measuring eXplanatory Violation of Expectation in Physical Events.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Diffusion-based Generation, Optimization, and Planning in 3D Scenes.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation.
CoRR, 2022

Unsupervised Object-Centric Learning with Bi-Level Optimized Query Slot Attention.
CoRR, 2022

Latent Diffusion Energy-Based Model for Interpretable Text Modeling.
CoRR, 2022

EgoTaskQA: Understanding Human Tasks in Egocentric Videos.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Latent Diffusion Energy-Based Model for Interpretable Text Modelling.
Proceedings of the International Conference on Machine Learning, 2022

Learning Algebraic Representation for Systematic Generalization in Abstract Reasoning.
Proceedings of the Computer Vision - ECCV 2022, 2022

2021
A Generalized Earley Parser for Human Activity Parsing and Prediction.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

ACRE: Abstract Causal REasoning Beyond Covariation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
LEMMA: A Multi-view Dataset for LEarning Multi-agent Multi-task Activities.
Proceedings of the Computer Vision - ECCV 2020, 2020

2019
Human Activity Understanding and Prediction with Stochastic Grammar.
PhD thesis, 2019

Learning Perceptual Inference by Contrasting.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

RAVEN: A Dataset for Relational and Analogical Visual REasoNing.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Generalized Earley Parser: Bridging Symbolic Grammars and Sequence Data for Future Prediction.
Proceedings of the 35th International Conference on Machine Learning, 2018

Learning Human-Object Interactions by Graph Parsing Neural Networks.
Proceedings of the Computer Vision - ECCV 2018, 2018

2017
Mining User Reviews for Mobile App Comparisons.
Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 2017

