Zhen Zhang

ORCID: 0000-0002-0164-0849

According to our database, Zhen Zhang authored at least 12 papers between 2020 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Verifying Semantic Equivalence of Large Models with Equality Saturation.
Proceedings of the 5th Workshop on Machine Learning and Systems, 2025

2024
DistMind: Efficient Resource Disaggregation for Deep Learning Workloads.
IEEE/ACM Trans. Netw., June, 2024

SDCC: software-defined collective communication for distributed training.
Sci. China Inf. Sci., 2024

DISTMM: Accelerating Distributed Multimodal Model Training.
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

Distributed Training of Large Language Models on AWS Trainium.
Proceedings of the 2024 ACM Symposium on Cloud Computing, 2024

Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
Decoupled Model Schedule for Deep Learning Training.
CoRR, 2023

GEMINI: Fast Failure Recovery in Distributed Training with In-Memory Checkpoints.
Proceedings of the 29th Symposium on Operating Systems Principles, 2023

Oobleck: Resilient Distributed Training of Large Models Using Pipeline Templates.
Proceedings of the 29th Symposium on Operating Systems Principles, 2023

2022
MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud.
Proc. VLDB Endow., 2022

2020
Is Network the Bottleneck of Distributed Training?
Proceedings of the 2020 Workshop on Network Meets AI & ML, 2020

PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications.
Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020
