Yichen Zhu

Orcid: 0000-0001-5126-838X

Affiliations:
  • Midea Group, AI Lab, Shanghai, Guangdong, China
  • University of Toronto, Department of Statistical Sciences, Canada


According to our database, Yichen Zhu authored at least 54 papers between 2018 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Hi-WM: Human-in-the-World-Model for Scalable Robot Post-Training.
CoRR, April, 2026

PointVLA: Injecting the 3D World Into Vision-Language-Action Models.
IEEE Robotics Autom. Lett., March, 2026

2025
HumanoidExo: Scalable Whole-Body Humanoid Manipulation via Wearable Exoskeleton.
CoRR, October, 2025

ActiveUMI: Robotic Manipulation with Active Perception from Robot-Free Human Demonstrations.
CoRR, October, 2025

dVLA: Diffusion Vision-Language-Action Model with Multimodal Chain-of-Thought.
CoRR, September, 2025

ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge.
CoRR, May, 2025

WorldEval: World Model as Real-World Robot Policies Evaluator.
CoRR, May, 2025

TinyVLA: Toward Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation.
IEEE Robotics Autom. Lett., April, 2025

PointVLA: Injecting the 3D World into Vision-Language-Action Models.
CoRR, March, 2025

ObjectVLA: End-to-End Open-World Object Manipulation Without Demonstration.
CoRR, February, 2025

ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model.
CoRR, February, 2025

DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control.
CoRR, February, 2025

LaTP: LiDAR-aided multimodal token pruning for efficient trajectory prediction of autonomous driving.
Neural Networks, 2025

Let Me Show You: Learning by Retrieving from Egocentric Video for Robotic Manipulation.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2025

Scaling Diffusion Policy in Transformer to 1 Billion Parameters for Robotic Manipulation.
Proceedings of the IEEE International Conference on Robotics and Automation, 2025

Discrete Policy: Learning Disentangled Action Space for Multi-Task Robotic Manipulation.
Proceedings of the IEEE International Conference on Robotics and Automation, 2025

DiffusionVLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

CoA-VLA: Improving Vision-Language-Action Models via Visual-Textual Chain-of-Affordance.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

A Comprehensive Overhaul of Multimodal Assistant with Small Language Models.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
Improving Vision-Language-Action Models via Chain-of-Affordance.
CoRR, 2024

Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression.
CoRR, 2024

Discrete Policy: Learning Disentangled Action Space for Multi-Task Robotic Manipulation.
CoRR, 2024

TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation.
CoRR, 2024

MMRo: Are Multimodal LLMs Eligible as the Brain for In-Home Robotics?
CoRR, 2024

Mipha: A Comprehensive Overhaul of Multimodal Assistant with Small Language Models.
CoRR, 2024

Visual Robotic Manipulation with Depth-Aware Pretraining.
CoRR, 2024

LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model.
CoRR, 2024

Visual Robotic Manipulation with Depth-Aware Pretraining.
Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2024

Any2Policy: Learning Visuomotor Policy with Any-Modality.
Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

EDT: An Efficient Diffusion Transformer Framework Inspired by Human-like Sketching.
Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Safety of Multimodal Large Language Models on Images and Text.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Language-Conditioned Robotic Manipulation with Fast and Slow Thinking.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

Object-Centric Instruction Augmentation for Robotic Manipulation.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models.
Computer Vision - ECCV 2024, 2024

Retrieval-Augmented Embodied Agents.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Exploring Gradient Explosion in Generative Adversarial Imitation Learning: A Probabilistic Perspective.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

EPSD: Early Pruning with Self-Distillation for Efficient Model Compression.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
LogSummary: Unstructured Log Summarization for Software Systems.
IEEE Trans. Netw. Serv. Manag., September, 2023

Query-Relevant Images Jailbreak Large Multi-Modal Models.
CoRR, 2023

Biglog: Unsupervised Large-scale Pre-training for a Unified Log Representation.
Proceedings of the 31st IEEE/ACM International Symposium on Quality of Service, 2023

ScaleKD: Distilling Scale-Aware Knowledge in Small Object Detector.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
LogStamp: Automatic Online Log Parsing Based on Sequence Labelling.
SIGMETRICS Perform. Evaluation Rev., 2022

BNNAS++: Towards Unbiased Neural Architecture Search With Batch Normalization.
IEEE Access, 2022

Teach Less, Learn More: On the Undistillable Classes in Knowledge Distillation.
Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Label-Guided Auxiliary Training Improves 3D Object Detector.
Computer Vision - ECCV 2022, 2022

2021
UniLog: Deploy One Model and Specialize it for All Log Analysis Tasks.
CoRR, 2021

Make A Long Image Short: Adaptive Token Length for Vision Transformers.
CoRR, 2021

Training BatchNorm Only in Neural Architecture Search and Beyond.
CoRR, 2021

Student Customized Knowledge Distillation: Bridging the Gap Between Student and Teacher.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020
Summarizing Unstructured Logs in Online Services.
CoRR, 2020

LogParse: Making Log Parsing Adaptive through Word Classification.
Proceedings of the 29th International Conference on Computer Communications and Networks, 2020

2019
LogAnomaly: Unsupervised Detection of Sequential and Quantitative Anomalies in Unstructured Logs.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

2018
A Multi-scale Pyramid of Fully Convolutional Networks for Automatic Cell Detection.
Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2018
