Zhen Zhang

ORCID: 0000-0002-0164-0849

According to our database, Zhen Zhang authored at least 12 papers between 2020 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Verifying Semantic Equivalence of Large Models with Equality Saturation.
Proceedings of the 5th Workshop on Machine Learning and Systems, 2025

2024
DistMind: Efficient Resource Disaggregation for Deep Learning Workloads.
IEEE/ACM Trans. Netw., June, 2024

SDCC: software-defined collective communication for distributed training.
Sci. China Inf. Sci., 2024

DISTMM: Accelerating Distributed Multimodal Model Training.
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

Distributed Training of Large Language Models on AWS Trainium.
Proceedings of the 2024 ACM Symposium on Cloud Computing, 2024

Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
Decoupled Model Schedule for Deep Learning Training.
CoRR, 2023

GEMINI: Fast Failure Recovery in Distributed Training with In-Memory Checkpoints.
Proceedings of the 29th Symposium on Operating Systems Principles, 2023

Oobleck: Resilient Distributed Training of Large Models Using Pipeline Templates.
Proceedings of the 29th Symposium on Operating Systems Principles, 2023

2022
MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud.
Proc. VLDB Endow., 2022

2020
Is Network the Bottleneck of Distributed Training?
Proceedings of the 2020 Workshop on Network Meets AI & ML, 2020

PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications.
Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020
