Yinghao Yu

Orcid: 0000-0002-2744-845X

According to our database, Yinghao Yu authored at least 38 papers between 2016 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
RTP-LLM: High-Performance Alibaba LLM Inference Engine.
CoRR, May, 2026

LegoDiffusion: Micro-Serving Text-to-Image Diffusion Workflows.
CoRR, April, 2026

Dissecting Outlier Dynamics in LLM NVFP4 Pretraining.
CoRR, February, 2026

Defrag: Reducing Resource Fragmentation in Large-Scale Heterogeneous GPU Clusters.
IEEE Trans. Netw., 2026

Attack of the Bubbles: Straggler-Resilient Pipeline Parallelism for Large Model Training.
Proceedings of the 23rd USENIX Symposium on Networked Systems Design and Implementation, 2026

Medley: Optimizing Midgress Bandwidth for Commercial Live Streaming CDNs.
Proceedings of the 23rd USENIX Symposium on Networked Systems Design and Implementation, 2026

eGPU: Production-Scale Elastic Sharing Over 10,000 GPUs.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2026

FlashPS: Efficient Generative Image Editing with Mask-aware Caching and Scheduling.
Proceedings of the 21st European Conference on Computer Systems, 2026

GFS: A Preemption-aware Scheduling Framework for GPU Clusters with Predictive Spot Instance Management.
Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2026

2025
Diving into 3D Parallelism with Heterogeneous Spot Instance GPUs: Design and Implications.
CoRR, December, 2025

RollMux: Phase-Level Multiplexing for Disaggregated RL Post-Training.
CoRR, December, 2025

Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation.
CoRR, November, 2025

EDGC: Entropy-driven Dynamic Gradient Compression for Efficient LLM Training.
CoRR, November, 2025

InstGenIE: Generative Image Editing Made Efficient with Mask-aware Caching and Scheduling.
CoRR, May, 2025

Adaptra: Straggler-Resilient Hybrid-Parallel Training with Pipeline Adaptation.
CoRR, April, 2025

GREYHOUND: Hunting Fail-Slows in Hybrid-Parallel Training at Scale.
Proceedings of the 2025 USENIX Annual Technical Conference, 2025

Katz: Efficient Workflow Serving for Diffusion Models with Many Adapters.
Proceedings of the 2025 USENIX Annual Technical Conference, 2025

GPU-Disaggregated Serving for Deep Learning Recommendation Models at Scale.
Proceedings of the 22nd USENIX Symposium on Networked Systems Design and Implementation, 2025

Reducing the End-to-End Latency of DNN-Based Recommendation Systems in GPU Pools.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2025

2024
FALCON: Pinpointing and Mitigating Stragglers for Large-Scale Hybrid-Parallel Training.
CoRR, 2024

SwiftDiffusion: Efficient Diffusion Model Serving with Add-on Modules.
CoRR, 2024

2023
Beware of Fragmentation: Scheduling GPU-Sharing Workloads with Fragmentation Gradient Descent.
Proceedings of the 2023 USENIX Annual Technical Conference, 2023

2022
Missing Data Repairs for Traffic Flow With Self-Attention Generative Adversarial Imputation Net.
IEEE Trans. Intell. Transp. Syst., 2022

Towards Dependency-Aware Cache Management for Data Analytics Applications.
IEEE Trans. Cloud Comput., 2022

MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters.
Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, 2022

Workload consolidation in Alibaba clusters: the good, the bad, and the ugly.
Proceedings of the 13th Symposium on Cloud Computing, SoCC 2022, 2022

2021
Morphling: Fast, Near-Optimal Auto-Configuration for Cloud-Native Model Serving.
Proceedings of the SoCC '21: ACM Symposium on Cloud Computing, 2021

George: Learning to Place Long-Lived Containers in Large Clusters with Operation Constraints.
Proceedings of the SoCC '21: ACM Symposium on Cloud Computing, 2021

2020
Achieving Load-Balanced, Redundancy-Free Cluster Caching with Selective Partition.
IEEE Trans. Parallel Distributed Syst., 2020

A Wireless Magnetic Resonance Device for Optogenetic Applications in an Animal Model.
Sensors, 2020

RepBun: Load-Balanced, Shuffle-Free Cluster Caching for Structured Data.
Proceedings of the 39th IEEE Conference on Computer Communications, 2020

2019
LACS: Load-Aware Cache Sharing with Isolation Guarantee.
Proceedings of the 39th IEEE International Conference on Distributed Computing Systems, 2019

2018
SP-cache: load-balanced, redundancy-free cluster caching with selective partition.
Proceedings of the International Conference for High Performance Computing, 2018

OpuS: Fair and Efficient Cache Sharing for In-Memory Data Analytics.
Proceedings of the 38th IEEE International Conference on Distributed Computing Systems, 2018

2017
LRC: Dependency-aware cache management for data analytics clusters.
Proceedings of the 2017 IEEE Conference on Computer Communications, 2017

LERC: Coordinated Cache Management for Data-Parallel Systems.
Proceedings of the 2017 IEEE Global Communications Conference, 2017

2016
Flow-Level QoE of Video Streaming in Wireless Networks.
IEEE Trans. Mob. Comput., 2016

Joint Subcarrier and CPU Time Allocation for Mobile Edge Computing.
Proceedings of the 2016 IEEE Global Communications Conference, 2016

