Jiangmiao Pang

Orcid: 0000-0002-6711-9319

According to our database, Jiangmiao Pang authored at least 116 papers between 2018 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views.

ACM Trans. Graph., December, 2025

EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT.

CoRR, October, 2025

Humanoid Goalkeeper: Learning from Position Conditioned Task-Motion Constraints.

CoRR, October, 2025

ChangingGrounding: 3D Visual Grounding in Changing Scenes.

CoRR, October, 2025

Towards Adaptable Humanoid Control via Adaptive Motion Tracking.

CoRR, October, 2025

Expertise need not monopolize: Action-Specialized Mixture of Experts for Vision-Language-Action Learning.

CoRR, October, 2025

InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy.

CoRR, October, 2025

PhysHSI: Towards a Real-World Generalizable and Natural Humanoid-Scene Interaction System.

CoRR, October, 2025

Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning.

CoRR, October, 2025

X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model.

CoRR, October, 2025

ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation.

CoRR, October, 2025

MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning.

CoRR, September, 2025

A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning.

CoRR, September, 2025

Behavior Foundation Model for Humanoid Robots.

CoRR, September, 2025

OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling.

CoRR, September, 2025

InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts.

CoRR, September, 2025

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning.

CoRR, September, 2025

F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions.

CoRR, September, 2025

Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies.

CoRR, August, 2025

MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds.

CoRR, August, 2025

VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization.

CoRR, August, 2025

EgoExoBench: A Benchmark for First- and Third-person View Video Understanding in MLLMs.

CoRR, July, 2025

Yume: An Interactive World Generation Model.

CoRR, July, 2025

InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation.

CoRR, July, 2025

ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting.

CoRR, July, 2025

π³: Scalable Permutation-Equivariant Visual Geometry Learning.

CoRR, July, 2025

Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities.

CoRR, July, 2025

OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding.

CoRR, July, 2025

UniTracker: Learning Universal Whole-Body Motion Tracker for Humanoid Robots.

CoRR, July, 2025

StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling.

CoRR, July, 2025

CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation.

CoRR, June, 2025

Sekai: A Video Dataset towards World Exploration.

CoRR, June, 2025

DeepVerse: 4D Autoregressive Video Generation as a World Model.

CoRR, June, 2025

RoboDuet: Learning a Cooperative Policy for Whole-Body Legged Loco-Manipulation.

IEEE Robotics Autom. Lett., May, 2025

MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence.

CoRR, May, 2025

LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents.

CoRR, May, 2025

MV-CoLight: Efficient Object Compositing with Consistent Lighting and Shadow Generation.

CoRR, May, 2025

GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scenes.

CoRR, May, 2025

HaloGS: Loose Coupling of Compact Geometry and Gaussian Splats for 3D Scenes.

CoRR, May, 2025

TeleOpBench: A Simulator-Centric Benchmark for Dual-Arm Dexterous Teleoperation.

CoRR, May, 2025

NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance.

CoRR, May, 2025

Gripper Keypose and Object Pointflow as Interfaces for Bimanual Robotic Manipulation.

CoRR, April, 2025

Novel Demonstration Generation with Gaussian Splatting Enables Robust One-Shot Manipulation.

CoRR, April, 2025

Aether: Geometric-Aware Unified World Modeling.

CoRR, March, 2025

Infinite Mobility: Scalable High-Fidelity Synthesis of Articulated Objects via Procedural Generation.

CoRR, March, 2025

AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems.

AgiBot-World-Contributors

CoRR, March, 2025

VB-Com: Learning Vision-Blind Composite Humanoid Locomotion Against Deficient Perception.

CoRR, February, 2025

HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit.

CoRR, February, 2025

BeamDojo: Learning Agile Humanoid Locomotion on Sparse Footholds.

CoRR, February, 2025

Re3Sim: Generating High-Fidelity Simulation Data via 3D-Photorealistic Real-to-Sim for Robotic Manipulation.

CoRR, February, 2025

Learning Humanoid Standing-up Control across Diverse Postures.

CoRR, February, 2025

A Unified and General Humanoid Whole-Body Controller for Fine-Grained Locomotion.

CoRR, February, 2025

Evolving Symbolic 3D Visual Grounder with Weakly Supervised Reflection.

CoRR, February, 2025

Position-Guided Point Cloud Panoptic Segmentation Transformer.

Int. J. Comput. Vis., January, 2025

Towards Latency-Aware 3D Streaming Perception for Autonomous Driving.

Proceedings of the IEEE International Conference on Robotics and Automation, 2025

Learning Humanoid Locomotion with Perceptive Internal Model.

Proceedings of the IEEE International Conference on Robotics and Automation, 2025

Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

RoboGround: Robotic Manipulation with Grounded Vision-Language Priors.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

3DGSR: Implicit Surface Reconstruction with 3D Gaussian Splatting.

ACM Trans. Graph., December, 2024

Transformer-Based Visual Segmentation: A Survey.

IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness.

CoRR, 2024

GRUtopia: Dream General Robots in a City at Scale.

CoRR, 2024

OVExp: Open Vocabulary Exploration for Object-Oriented Navigation.

CoRR, 2024

Generalization Beyond Data Imbalance: A Controlled Study on CLIP for Transferable Insights.

CoRR, 2024

Grounded 3D-LLM with Referent Tokens.

CoRR, 2024

RoboDuet: A Framework Affording Mobile-Manipulation and Cross-Embodiment.

CoRR, 2024

Mixed Gaussian Flow for Diverse Trajectory Prediction.

CoRR, 2024

MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

MGF: Mixed Gaussian Flow for Diverse Trajectory Prediction.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

X-neuron: Interpreting, Locating and Editing of Neurons in Reinforcement Learning Policy.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024

RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation.

Proceedings of the IEEE International Conference on Robotics and Automation, 2024

Multi-Object Tracking by Hierarchical Visual Representations.

Jinkun Cao

Jiangmiao Pang

Kris Kitani

Proceedings of the IEEE International Conference on Robotics and Automation, 2024

Unified Human-Scene Interaction via Prompted Chain-of-Contacts.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Hybrid Internal Model: Learning Agile Legged Locomotion with Simulated Robot Response.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

PointLLM: Empowering Large Language Models to Understand Point Clouds.

Proceedings of the Computer Vision - ECCV 2024, 2024

EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding.

Proceedings of the Conference on Robot Learning, 6-9 November 2024, Munich, Germany., 2024

Learning H-Infinity Locomotion Control.

Proceedings of the Conference on Robot Learning, 6-9 November 2024, Munich, Germany., 2024

2023

QDTrack: Quasi-Dense Similarity Learning for Appearance-Only Multiple Object Tracking.

IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

Self-Adversarial Disentangling for Specific Domain Adaptation.

IEEE Trans. Pattern Anal. Mach. Intell., July, 2023

Context-Aware Mixup for Domain Adaptive Semantic Segmentation.

IEEE Trans. Circuits Syst. Video Technol., February, 2023

Understanding Masked Autoencoders From a Local Contrastive Perspective.

CoRR, 2023

Tube-Link: A Flexible Cross Tube Baseline for Universal Video Segmentation.

CoRR, 2023

OV-PARTS: Towards Open-Vocabulary Part Segmentation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Dense Distinct Query for End-to-End Object Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

DORT: Modeling Dynamic Objects in Recurrent for Multi-Camera 3D Object Detection and Tracking.

Proceedings of the Conference on Robot Learning, 2023

2022

What Are Expected Queries in End-to-End Object Detection?

CoRR, 2022

Dense Siamese Network.

CoRR, 2022

Dense Siamese Network for Dense Unsupervised Learning.

Proceedings of the Computer Vision - ECCV 2022, 2022

Monocular 3D Object Detection with Depth from Motion.

Tai Wang

Jiangmiao Pang

Dahua Lin

Proceedings of the Computer Vision - ECCV 2022, 2022

Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Towards Balanced Learning for Instance Recognition.

Int. J. Comput. Vis., 2021

Self-Adversarial Disentangling for Specific Domain Adaptation.

CoRR, 2021

K-Net: Towards Unified Image Segmentation.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection.

Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2021

Seesaw Loss for Long-Tailed Instance Segmentation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Quasi-Dense Similarity Learning for Multiple Object Tracking.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Probabilistic and Geometric Depth: Detecting Objects in Perspective.

Proceedings of the Conference on Robot Learning, 8-11 November 2021, London, UK., 2021

2020

Quasi-Dense Instance Similarity Learning.

CoRR, 2020

Side-Aware Boundary Localization for More Precise Object Detection.

Proceedings of the Computer Vision - ECCV 2020, 2020

2019

ℛ²-CNN: Fast Tiny Object Detection in Large-Scale Remote Sensing Images.

IEEE Trans. Geosci. Remote. Sens., 2019

MMDetection: Open MMLab Detection Toolbox and Benchmark.

CoRR, 2019

$\mathcal{R}^2$-CNN: Fast Tiny Object Detection in Large-scale Remote Sensing Images.

CoRR, 2019

Adapting Object Detectors via Selective Cross-Domain Alignment.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Libra R-CNN: Towards Balanced Learning for Object Detection.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Hybrid Task Cascade for Instance Segmentation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018
