Liping Zhang

Orcid: 0000-0003-2334-3471

Affiliations:
  • Alibaba Group, Hangzhou, China


According to our database, Liping Zhang authored at least 25 papers between 2018 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
GFS: A Preemption-aware Scheduling Framework for GPU Clusters with Predictive Spot Instance Management.
CoRR, September, 2025

InstGenIE: Generative Image Editing Made Efficient with Mask-aware Caching and Scheduling.
CoRR, May, 2025

Adaptra: Straggler-Resilient Hybrid-Parallel Training with Pipeline Adaptation.
CoRR, April, 2025

GREYHOUND: Hunting Fail-Slows in Hybrid-Parallel Training at Scale.
Proceedings of the 2025 USENIX Annual Technical Conference, 2025

Katz: Efficient Workflow Serving for Diffusion Models with Many Adapters.
Proceedings of the 2025 USENIX Annual Technical Conference, 2025

GPU-Disaggregated Serving for Deep Learning Recommendation Models at Scale.
Proceedings of the 22nd USENIX Symposium on Networked Systems Design and Implementation, 2025

Reducing the End-to-End Latency of DNN-Based Recommendation Systems in GPU Pools.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2025

EXIST: Enabling Extremely Efficient Intra-Service Tracing Observability in Datacenters.
Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2025

2024
Optimizing Resource Management for Shared Microservices: A Scalable System Design.
ACM Trans. Comput. Syst., May, 2024

FALCON: Pinpointing and Mitigating Stragglers for Large-Scale Hybrid-Parallel Training.
CoRR, 2024

SwiftDiffusion: Efficient Diffusion Model Serving with Add-on Modules.
CoRR, 2024

DeployFix: Dynamic Repair of Software Deployment Failures via Constraint Solving.
Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 2024

2023
Practice of Alibaba Cloud on Elastic Resource Provisioning for Large-scale Microservices Cluster.
CoRR, 2023

Beware of Fragmentation: Scheduling GPU-Sharing Workloads with Fragmentation Gradient Descent.
Proceedings of the 2023 USENIX Annual Technical Conference, 2023

Understanding and Optimizing Workloads for Unified Resource Management in Large Cloud Platforms.
Proceedings of the Eighteenth European Conference on Computer Systems, 2023

Erms: Efficient Resource Management for Shared Microservices with SLA Guarantees.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
An In-Depth Study of Microservice Call Graph and Runtime Performance.
IEEE Trans. Parallel Distributed Syst., 2022

MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters.
Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, 2022

Cache Antagonists Identification: A Practice from Alibaba Colocation Datacenter.
Proceedings of the IEEE International Symposium on Software Reliability Engineering Workshops, 2022

Characterizing Job Microarchitectural Profiles at Scale: Dataset and Analysis.
Proceedings of the 51st International Conference on Parallel Processing, 2022

Workload Consolidation in Alibaba Clusters: The Good, the Bad, and the Ugly.
Proceedings of the 13th Symposium on Cloud Computing, SoCC 2022, 2022

The Power of Prediction: Microservice Auto Scaling via Workload Learning.
Proceedings of the 13th Symposium on Cloud Computing, SoCC 2022, 2022

2021
Morphling: Fast, Near-Optimal Auto-Configuration for Cloud-Native Model Serving.
Proceedings of the SoCC '21: ACM Symposium on Cloud Computing, 2021

Characterizing Microservice Dependency and Performance: Alibaba Trace Analysis.
Proceedings of the SoCC '21: ACM Symposium on Cloud Computing, 2021

2018
All-Spark: Using Simulation Tests Directly in Production Environments to Detect System Bottlenecks in Large-Scale Systems.
Proceedings of the 19th International Middleware Conference, 2018

