Yanghua Peng

Orcid: 0000-0003-3989-4358

According to our database, Yanghua Peng authored at least 39 papers between 2017 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

veScale-FSDP: Flexible and High-Performance FSDP at Scale.

CoRR, February, 2026

MegaScale-Data: Scaling DataLoader for Multisource Large Foundation Model Training.

Proceedings of the 21st European Conference on Computer Systems, 2026

MegaScale-Omni: A Hyper-Scale, Workload-Resilient System for MultiModal LLM Training in Production.

Proceedings of the 21st European Conference on Computer Systems, 2026

Laminar: A Scalable Asynchronous RL Post-Training Framework.

Proceedings of the 21st European Conference on Computer Systems, 2026

MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production.

Proceedings of the 21st European Conference on Computer Systems, 2026

OmniScale: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Mesh-Attention: A New Communication-Efficient Distributed Attention with Improved Data Locality.

CoRR, December, 2025

veScale: Consistent and Efficient Tensor Programming with Eager-Mode SPMD.

CoRR, September, 2025

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo.

CoRR, August, 2025

MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production.

CoRR, May, 2025

OVERLORD: Ultimate Scaling of DataLoader for Multi-Source Large Foundation Model Training.

CoRR, April, 2025

Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation.

Proceedings of the 2025 USENIX Annual Technical Conference, 2025

Robust LLM Training Infrastructure at ByteDance.

Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles, 2025

ByteCheckpoint: A Unified Checkpointing System for Large Foundation Model Development.

Proceedings of the 22nd USENIX Symposium on Networked Systems Design and Implementation, 2025

HybridFlow: A Flexible and Efficient RLHF Framework.

Proceedings of the Twentieth European Conference on Computer Systems, 2025

Goku: Flow Based Video Generative Foundation Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SplitQuant: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and Adaptive Quantization.

Proceedings of the IEEE International Conference on Cluster Computing, 2025

2024

ByteCheckpoint: A Unified Checkpointing System for LLM Development.

CoRR, 2024

LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization.

CoRR, 2024

POSTER: LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization.

Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs.

Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs.

Proceedings of the Nineteenth European Conference on Computer Systems, 2024

2023

Deep Learning-Based Job Placement in Distributed Machine Learning Clusters With Heterogeneous Workloads.

Yixin Bao, Yanghua Peng, Chuan Wu

IEEE/ACM Trans. Netw., April, 2023

SP-GNN: Learning structure and position information from graphs.

Neural Networks, April, 2023

BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing.

Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation, 2023

2022

dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training.

CoRR, 2022

Multi-resource interleaving for deep learning training.

Proceedings of the SIGCOMM '22: ACM SIGCOMM 2022 Conference, Amsterdam, The Netherlands, August 22, 2022

SAPipe: Staleness-Aware Pipeline for Data Parallel DNN Training.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

dPRO: A Generic Performance Diagnosis and Optimization Toolkit for Expediting Distributed DNN Training.

Proceedings of the Fifth Conference on Machine Learning and Systems, 2022

2021

DL2: A Deep Learning-Driven Scheduler for Deep Learning Clusters.

IEEE Trans. Parallel Distributed Syst., 2021

2020

Preemptive All-reduce Scheduling for Expediting Distributed DNN Training.

Proceedings of the 39th IEEE Conference on Computer Communications, 2020

Elastic parameter server load distribution in deep learning clusters.

Proceedings of the SoCC '20: ACM Symposium on Cloud Computing, 2020

2019

A generic communication scheduler for distributed DNN training acceleration.

Proceedings of the 27th ACM Symposium on Operating Systems Principles, 2019

Deep Learning-based Job Placement in Distributed Machine Learning Clusters.

Yixin Bao, Yanghua Peng, Chuan Wu

Proceedings of the 2019 IEEE Conference on Computer Communications, 2019

2018

Online Job Scheduling in Distributed Machine Learning Clusters.

Proceedings of the 2018 IEEE Conference on Computer Communications, 2018

Optimus: an efficient dynamic resource scheduler for deep learning clusters.

Proceedings of the Thirteenth EuroSys Conference, 2018

2017

Dynamic Scaling of Virtualized, Distributed Service Chains: A Case Study of IMS.

IEEE J. Sel. Areas Commun., 2017

deTector: a Topology-aware Monitoring System for Data Center Networks.

Proceedings of the 2017 USENIX Annual Technical Conference, 2017
