Xiaolong Wang

ORCID: 0000-0003-3150-778X

  • UC San Diego, CA, USA
  • UC Berkeley, CA, USA (former)
  • Carnegie Mellon University, Robotics Institute, Pittsburgh, PA, USA (former)
  • Sun Yat-Sen University, Guangzhou, China (former)

According to our database, Xiaolong Wang authored at least 149 papers between 2011 and 2024.

Lessons from Learning to Spin "Pens".
CoRR, 2024

Bunny-VisionPro: Real-Time Bimanual Dexterous Teleoperation for Imitation Learning.
CoRR, 2024

Open-TeleVision: Teleoperation with Immersive Active Visual Feedback.
CoRR, 2024

Image Neural Field Diffusion Models.
CoRR, 2024

Cross-Embodiment Robot Manipulation Skill Transfer using Latent Space Alignment.
CoRR, 2024

SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model.
CoRR, 2024

Hierarchical World Models as Visual Whole-Body Humanoid Controllers.
CoRR, 2024

Editable Image Elements for Controllable Synthesis.
CoRR, 2024

Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Monocular Videos.
CoRR, 2024

Feature Splatting: Language-Driven Physics-Based Scene Synthesis and Editing.
CoRR, 2024

Visual Whole-Body Control for Legged Loco-Manipulation.
CoRR, 2024

HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data.
CoRR, 2024

Learning Generalizable Feature Fields for Mobile Manipulation.
CoRR, 2024

DNAct: Diffusion Guided Multi-Task 3D Policy Learning.
CoRR, 2024

Expressive Whole-Body Control for Humanoid Robots.
CoRR, 2024

CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation.
CoRR, 2024

RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos.
CoRR, 2024

DexTouch: Learning to Seek and Manipulate Objects with Tactile Dexterity.
CoRR, 2024

A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose.
Proceedings of the ACM SIGGRAPH 2024 Conference Papers, 2024

Robot Synesthesia: In-Hand Manipulation with Visuotactile Sensing.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

Sim2Real Manipulation on Unknown Objects with Tactile-based Reinforcement Learning.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

GenSim: Generating Robotic Simulation Tasks via Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

3D Reconstruction with Generalizable Neural Fields using Scene Priors.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

TUVF: Learning Generalizable Texture UV Radiance Fields.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

TD-MPC2: Scalable, Robust World Models for Continuous Control.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

ContactArt: Learning 3D Interaction Priors for Category-level Articulated Object and Hand Poses Estimation.
Proceedings of the International Conference on 3D Vision, 2024

Visual Reinforcement Learning With Self-Supervised 3D Representations.
IEEE Robotics Autom. Lett., May, 2023

Learning Continuous Grasping Function With a Dexterous Hand From Human Demonstrations.
IEEE Robotics Autom. Lett., May, 2023

Pixel Aligned Language Models.
CoRR, 2023

Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis.
CoRR, 2023

COLMAP-Free 3D Gaussian Splatting.
CoRR, 2023

IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks.
CoRR, 2023

Generalized Animal Imitator: Agile Locomotion with Versatile Motion Prior.
CoRR, 2023

Test-Time Training on Video Streams.
CoRR, 2023

Rotating without Seeing: Towards In-hand Dexterity through Touch.
Proceedings of the Robotics: Science and Systems XIX, Daegu, 2023

AnyTeleop: A General Vision-Based Dexterous Robot Arm-Hand Teleoperation System.
Proceedings of the Robotics: Science and Systems XIX, Daegu, 2023

Elastic Decision Transformer.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Efficient Bimanual Handover and Rearrangement via Symmetry-Aware Actor-Critic Learning.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023

Learning Dense Correspondences between Photos and Sketches.
Proceedings of the International Conference on Machine Learning, 2023

MonoNeRF: Learning Generalizable NeRFs from Monocular Videos without Camera Poses.
Proceedings of the International Conference on Machine Learning, 2023

On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline.
Proceedings of the International Conference on Machine Learning, 2023

Self-Supervised Geometric Correspondence for Category-Level 6D Object Pose Estimation in the Wild.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

MoDem: Accelerating Visual Model-Based Reinforcement Learning with Demonstrations.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

FeatureNeRF: Learning Generalizable NeRFs by Distilling Foundation Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

ActorsNeRF: Animatable Few-shot Human Rendering with Generalizable NeRFs.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Neural Volumetric Memory for Visual Locomotion Control.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Zero-shot Pose Transfer for Unrigged Stylized 3D Characters.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Policy Adaptation from Foundation Model Feedback.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields.
Proceedings of the Conference on Robot Learning, 2023

Dynamic Handover: Throw and Catch with Bimanual Hands.
Proceedings of the Conference on Robot Learning, 2023

Finetuning Offline World Models in the Real World.
Proceedings of the Conference on Robot Learning, 2023

Online Adaptation for Implicit Object Tracking and Shape Reconstruction in the Wild.
IEEE Robotics Autom. Lett., 2022

From One Hand to Multiple Hands: Imitation Learning for Dexterous Manipulation From Single-Camera Teleoperation.
IEEE Robotics Autom. Lett., 2022

Look Closer: Bridging Egocentric and Third-Person Views With Transformers for Robotic Manipulation.
IEEE Robotics Autom. Lett., 2022

Self-Play and Self-Describe: Policy Adaptation with Vision-Language Foundation Models.
CoRR, 2022

Multiplane NeRF-Supervised Disentanglement of Depth and Camera Pose from Videos.
CoRR, 2022

Inverse Reinforcement Learning from Diverse Third-Person Videos via Graph Abstraction.
CoRR, 2022

Category-Level 6D Object Pose Estimation in the Wild: A Semi-Supervised Learning Approach and A New Dataset.
CoRR, 2022

Category-Level 6D Object Pose Estimation in the Wild: A Semi-Supervised Learning Approach and A New Dataset.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Vision-Guided Quadrupedal Locomotion in the Wild with Multi-Modal Delay Randomization.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2022

Temporal Difference Learning for Model Predictive Control.
Proceedings of the International Conference on Machine Learning, 2022

Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Learning Continuous Environment Fields via Implicit Functions.
Proceedings of the Tenth International Conference on Learning Representations, 2022

DexMV: Imitation Learning for Dexterous Manipulation from Human Videos.
Proceedings of the Computer Vision - ECCV 2022, 2022

Scraping Textures from Natural Images for Synthesis and Editing.
Proceedings of the Computer Vision - ECCV 2022, 2022

Learning Implicit Feature Alignment Function for Semantic Segmentation.
Proceedings of the Computer Vision - ECCV 2022, 2022

Transformers as Meta-learners for Implicit Neural Representations.
Proceedings of the Computer Vision - ECCV 2022, 2022

GIFS: Neural Implicit Function for General Shape Representation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

GroupViT: Semantic Segmentation Emerges from Text Supervision.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Look Outside the Room: Synthesizing A Consistent Long-Term 3D Scene Video from A Single Image.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

CoordGAN: Self-Supervised Dense Correspondences Emerge from GANs.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Learning Generalizable Dexterous Manipulation from Human Grasp Affordance.
Proceedings of the Conference on Robot Learning, 2022

DexPoint: Generalizable Point Cloud Reinforcement Learning for Sim-to-Real Dexterous Manipulation.
Proceedings of the Conference on Robot Learning, 2022

Graph Inverse Reinforcement Learning from Diverse Videos.
Proceedings of the Conference on Robot Learning, 2022

Single RGB-D Camera Teleoperation for General Robotic Manipulation.
CoRR, 2021

Disentangled Attention as Intrinsic Regularization for Bimanual Multi-Object Manipulation.
CoRR, 2021

NovelD: A Simple yet Effective Exploration Criterion.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Multi-Person 3D Motion Prediction with Multi-Range Transformers.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Test-Time Personalization with a Transformer for Human Pose Estimation.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

State-Only Imitation Learning for Dexterous Manipulation.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2021

Generalization in Reinforcement Learning by Soft Data Augmentation.
Proceedings of the IEEE International Conference on Robotics and Automation, 2021

Compositional Video Synthesis with Action Graphs.
Proceedings of the 38th International Conference on Machine Learning, 2021

Learning Cross-Domain Correspondence for Control with Dynamics Cycle-Consistency.
Proceedings of the 9th International Conference on Learning Representations, 2021

What Should Not Be Contrastive in Contrastive Learning.
Proceedings of the 9th International Conference on Learning Representations, 2021

Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization.
Proceedings of the 9th International Conference on Learning Representations, 2021

Learning Long-term Visual Dynamics with Region Proposal Interaction Networks.
Proceedings of the 9th International Conference on Learning Representations, 2021

Solving Compositional Reinforcement Learning Problems via Task Reduction.
Proceedings of the 9th International Conference on Learning Representations, 2021

Self-Supervised Policy Adaptation during Deployment.
Proceedings of the 9th International Conference on Learning Representations, 2021

Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Region Similarity Representation Learning.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Contrastive Learning of Image Representations with Cross-Video Cycle-Consistency.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Rethinking preventing class-collapsing in metric learning with margin-based losses.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Video Autoencoder: self-supervised disentanglement of static 3D structure and motion.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Hand-Object Contact Consistency Reasoning for Human Grasps Generation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Robust Object Detection via Instance-Level Temporal Cycle Confusion.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Synthesizing Long-Term 3D Human Motion and Interaction in 3D Scenes.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Semi-Supervised 3D Hand-Object Poses Estimation With Interactions in Time.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Learning Continuous Image Representation With Local Implicit Image Function.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

BeBold: Exploration Beyond the Boundary of Explored Regions.
CoRR, 2020

Multi-Agent Collaboration via Reward Attribution Decomposition.
CoRR, 2020

Self-Supervised Policy Adaptation during Deployment.
CoRR, 2020

Compositional Video Synthesis with Action Graphs.
CoRR, 2020

Reducing Class Collapse in Metric Learning with Easy Positive Sampling.
CoRR, 2020

A New Meta-Baseline for Few-Shot Learning.
CoRR, 2020

Multi-Task Reinforcement Learning with Soft Modularization.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Online Adaptation for Consistent Mesh Reconstruction in the Wild.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Test-Time Training with Self-Supervision for Generalization under Distribution Shifts.
Proceedings of the 37th International Conference on Machine Learning, 2020

Deep Isometric Learning for Visual Recognition.
Proceedings of the 37th International Conference on Machine Learning, 2020

Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning.
Proceedings of the 8th International Conference on Learning Representations, 2020

Hierarchical Style-Based Networks for Motion Synthesis.
Proceedings of the Computer Vision - ECCV 2020, 2020

Something-Else: Compositional Action Recognition With Spatial-Temporal Interaction Networks.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Learning and Reasoning with Visual Correspondence in Time.
PhD thesis, 2019

Test-Time Training for Out-of-Distribution Generalization.
CoRR, 2019

Joint-task Self-supervised Learning for Temporal Correspondence.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Visual Semantic Navigation using Scene Priors.
Proceedings of the 7th International Conference on Learning Representations, 2019

Spatio-Temporal Action Graph Networks.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

Learning Correspondence From the Cycle-Consistency of Time.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Putting Humans in a Scene: Learning Affordance in 3D Indoor Environments.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Interpretable Intuitive Physics Model.
Proceedings of the Computer Vision - ECCV 2018, 2018

Videos as Space-Time Region Graphs.
Proceedings of the Computer Vision - ECCV 2018, 2018

3D Human Pose Estimation in the Wild by Adversarial Learning.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Zero-Shot Recognition via Semantic Embeddings and Knowledge Graphs.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Non-Local Neural Networks.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Temporal Dynamic Graph LSTM for Action-Driven Video Object Detection.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Transitive Invariance for Self-Supervised Visual Representation Learning.
Proceedings of the IEEE International Conference on Computer Vision, 2017

A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Binge Watching: Scaling Affordance Learning from Sitcoms.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Generative Image Modeling Using Style and Structure Adversarial Networks.
Proceedings of the Computer Vision - ECCV 2016, 2016

Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding.
Proceedings of the Computer Vision - ECCV 2016, 2016

Actions ~ Transformations.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Discriminatively Trained And-Or Graph Models for Object Shape Detection.
IEEE Trans. Pattern Anal. Mach. Intell., 2015

In Defense of the Direct Perception of Affordances.
CoRR, 2015

Unsupervised Learning of Visual Representations Using Videos.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Designing deep networks for surface normal estimation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

Deep Joint Task Learning for Generic Object Extraction.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

3D Human Activity Recognition with Reconfigurable Convolutional Neural Networks.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

An expressive deep model for human action parsing from a single image.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2014

Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection.
Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

Dynamical And-Or Graph Learning for Object Shape Modeling and Detection.
Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012

Learning contour-fragment-based shape model with And-Or tree representation.
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

Interactive CT image segmentation with online discriminative learning.
Proceedings of the 18th IEEE International Conference on Image Processing, 2011
