Xiaowei Chi

ORCID: 0000-0002-6559-4378

According to our database, Xiaowei Chi authored at least 24 papers between 2022 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents.
CoRR, June, 2025

MinD: Unified Visual Imagination and Control via Hierarchical World Models.
CoRR, June, 2025

BEVUDA++: Geometric-Aware Unsupervised Domain Adaptation for Multi-View 3D Object Detection.
IEEE Trans. Circuits Syst. Video Technol., May, 2025

ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance.
CoRR, April, 2025

MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation.
CoRR, March, 2025

YuE: Scaling Open Foundation Models for Long-Form Music Generation.
CoRR, March, 2025

RealVVT: Towards Photorealistic Video Virtual Try-on via Spatio-Temporal Consistency.
CoRR, January, 2025

PSHuman: Photorealistic Single-image 3D Human Reconstruction using Cross-Scale Multiview Diffusion and Explicit Remeshing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments.
IEEE Robotics Autom. Lett., September, 2024

Large Motion Video Autoencoding with Cross-modal Video VAE.
CoRR, 2024

EVA: An Embodied World Model for Future Video Anticipation.
CoRR, 2024

PSHuman: Photorealistic Single-view Human Reconstruction using Cross-Scale Diffusion.
CoRR, 2024

MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions.
CoRR, 2024

M-LRM: Multi-view Large Reconstruction Model.
CoRR, 2024

LLMs Meet Multimodal Generation and Editing: A Survey.
CoRR, 2024

ChatMusician: Understanding and Generating Music Intrinsically with LLM.
CoRR, 2024

BEVUDA: Multi-geometric Space Alignments for Domain Adaptive BEV 3D Object Detection.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-Speech Gesture Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
ChatIllusion: Efficient-Aligning Interleaved Generation ability with Visual Instruction Model.
CoRR, 2023

Unimodal Training-Multimodal Prediction: Cross-modal Federated Learning with Hierarchical Aggregation.
CoRR, 2023

BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Towards efficient full 8-bit integer DNN online training on resource-limited devices without batch normalization.
Neurocomputing, 2022

Multi-latent Space Alignments for Unsupervised Domain Adaptation in Multi-view 3D Object Detection.
CoRR, 2022