Xiaowei Chi

ORCID: 0000-0002-6559-4378

According to our database¹, Xiaowei Chi authored at least 30 papers between 2022 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2025

Unimodal Training-Multimodal Prediction: Cross-Modal Federated Learning With Hierarchical Aggregation.

IEEE Trans. Mob. Comput., October, 2025

WristWorld: Generating Wrist-Views via 4D World Models for Robotic Manipulation.

CoRR, October, 2025

Can World Models Benefit VLMs for World Dynamics?

CoRR, October, 2025

WoW: Towards a World omniscient World model Through Embodied Interaction.

CoRR, September, 2025

ManipDreamer3D: Synthesizing Plausible Robotic Manipulation Video with Occupancy-aware 3D Trajectory.

CoRR, September, 2025

SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents.

CoRR, June, 2025

MinD: Unified Visual Imagination and Control via Hierarchical World Models.

CoRR, June, 2025

BEVUDA++: Geometric-Aware Unsupervised Domain Adaptation for Multi-View 3D Object Detection.

IEEE Trans. Circuits Syst. Video Technol., May, 2025

ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance.

CoRR, April, 2025

MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation.

CoRR, March, 2025

YuE: Scaling Open Foundation Models for Long-Form Music Generation.

CoRR, March, 2025

RealVVT: Towards Photorealistic Video Virtual Try-on via Spatio-Temporal Consistency.

CoRR, January, 2025

Empowering World Models with Reflection for Embodied Video Prediction.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

PSHuman: Photorealistic Single-image 3D Human Reconstruction using Cross-Scale Multiview Diffusion and Explicit Remeshing.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments.

IEEE Robotics Autom. Lett., September, 2024

Large Motion Video Autoencoding with Cross-modal Video VAE.

CoRR, 2024

EVA: An Embodied World Model for Future Video Anticipation.

CoRR, 2024

PSHuman: Photorealistic Single-view Human Reconstruction using Cross-Scale Diffusion.

CoRR, 2024

MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions.

CoRR, 2024

M-LRM: Multi-view Large Reconstruction Model.

CoRR, 2024

LLMs Meet Multimodal Generation and Editing: A Survey.

CoRR, 2024

ChatMusician: Understanding and Generating Music Intrinsically with LLM.

CoRR, 2024

BEVUDA: Multi-geometric Space Alignments for Domain Adaptive BEV 3D Object Detection.

Proceedings of the IEEE International Conference on Robotics and Automation, 2024

Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-Speech Gesture Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

ChatMusician: Understanding and Generating Music Intrinsically with LLM.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

ChatIllusion: Efficient-Aligning Interleaved Generation ability with Visual Instruction Model.

CoRR, 2023

Unimodal Training-Multimodal Prediction: Cross-modal Federated Learning with Hierarchical Aggregation.

CoRR, 2023

BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Towards efficient full 8-bit integer DNN online training on resource-limited devices without batch normalization.

Neurocomputing, 2022

Multi-latent Space Alignments for Unsupervised Domain Adaptation in Multi-view 3D Object Detection.

CoRR, 2022
