Yinghao Yu

Orcid: 0000-0002-2744-845X

According to our database, Yinghao Yu authored at least 38 papers between 2016 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

RTP-LLM: High-Performance Alibaba LLM Inference Engine.

CoRR, May, 2026

LegoDiffusion: Micro-Serving Text-to-Image Diffusion Workflows.

CoRR, April, 2026

Dissecting Outlier Dynamics in LLM NVFP4 Pretraining.

CoRR, February, 2026

Defrag: Reducing Resource Fragmentation in Large-Scale Heterogeneous GPU Clusters.

IEEE Trans. Netw., 2026

Attack of the Bubbles: Straggler-Resilient Pipeline Parallelism for Large Model Training.

Proceedings of the 23rd USENIX Symposium on Networked Systems Design and Implementation, 2026

Medley: Optimizing Midgress Bandwidth for Commercial Live Streaming CDNs.

Haiping Wang

Wanxin Shi

Sandesh Dhawaskar Sathyanarayana

Proceedings of the 23rd USENIX Symposium on Networked Systems Design and Implementation, 2026

eGPU: Production-Scale Elastic Sharing Over 10,000 GPUs.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2026

FlashPS: Efficient Generative Image Editing with Mask-aware Caching and Scheduling.

Proceedings of the 21st European Conference on Computer Systems, 2026

GFS: A Preemption-aware Scheduling Framework for GPU Clusters with Predictive Spot Instance Management.

Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2026

2025

Diving into 3D Parallelism with Heterogeneous Spot Instance GPUs: Design and Implications.

CoRR, December, 2025

RollMux: Phase-Level Multiplexing for Disaggregated RL Post-Training.

CoRR, December, 2025

Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation.

CoRR, November, 2025

EDGC: Entropy-driven Dynamic Gradient Compression for Efficient LLM Training.

CoRR, November, 2025

InstGenIE: Generative Image Editing Made Efficient with Mask-aware Caching and Scheduling.

CoRR, May, 2025

Adaptra: Straggler-Resilient Hybrid-Parallel Training with Pipeline Adaptation.

CoRR, April, 2025

GREYHOUND: Hunting Fail-Slows in Hybrid-Parallel Training at Scale.

Proceedings of the 2025 USENIX Annual Technical Conference, 2025

Katz: Efficient Workflow Serving for Diffusion Models with Many Adapters.

Proceedings of the 2025 USENIX Annual Technical Conference, 2025

GPU-Disaggregated Serving for Deep Learning Recommendation Models at Scale.

Proceedings of the 22nd USENIX Symposium on Networked Systems Design and Implementation, 2025

Reducing the End-to-End Latency of DNN-Based Recommendation Systems in GPU Pools.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2025

2024

FALCON: Pinpointing and Mitigating Stragglers for Large-Scale Hybrid-Parallel Training.

CoRR, 2024

SwiftDiffusion: Efficient Diffusion Model Serving with Add-on Modules.

CoRR, 2024

2023

Beware of Fragmentation: Scheduling GPU-Sharing Workloads with Fragmentation Gradient Descent.

Proceedings of the 2023 USENIX Annual Technical Conference, 2023

2022

Missing Data Repairs for Traffic Flow With Self-Attention Generative Adversarial Imputation Net.

Salvatore Antonio Biancardo

Junyi Zhang

IEEE Trans. Intell. Transp. Syst., 2022

Towards Dependency-Aware Cache Management for Data Analytics Applications.

IEEE Trans. Cloud Comput., 2022

MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters.

Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, 2022

Workload consolidation in Alibaba clusters: the good, the bad, and the ugly.

Proceedings of the 13th Symposium on Cloud Computing, SoCC 2022, 2022

2021

Morphling: Fast, Near-Optimal Auto-Configuration for Cloud-Native Model Serving.

Proceedings of the SoCC '21: ACM Symposium on Cloud Computing, 2021

George: Learning to Place Long-Lived Containers in Large Clusters with Operation Constraints.

Proceedings of the SoCC '21: ACM Symposium on Cloud Computing, 2021

2020

Achieving Load-Balanced, Redundancy-Free Cluster Caching with Selective Partition.

IEEE Trans. Parallel Distributed Syst., 2020

A Wireless Magnetic Resonance Device for Optogenetic Applications in an Animal Model.

Arthur C. Tsai

Andrew Chih Wei Huang

Sensors, 2020

RepBun: Load-Balanced, Shuffle-Free Cluster Caching for Structured Data.

Proceedings of the 39th IEEE Conference on Computer Communications, 2020

2019

LACS: Load-Aware Cache Sharing with Isolation Guarantee.

Proceedings of the 39th IEEE International Conference on Distributed Computing Systems, 2019

2018

SP-cache: load-balanced, redundancy-free cluster caching with selective partition.

Proceedings of the International Conference for High Performance Computing, 2018

OpuS: Fair and Efficient Cache Sharing for In-Memory Data Analytics.

Proceedings of the 38th IEEE International Conference on Distributed Computing Systems, 2018

2017

LRC: Dependency-aware cache management for data analytics clusters.

Proceedings of the 2017 IEEE Conference on Computer Communications, 2017

LERC: Coordinated Cache Management for Data-Parallel Systems.

Proceedings of the 2017 IEEE Global Communications Conference, 2017

2016

Flow-Level QoE of Video Streaming in Wireless Networks.

Yuedong Xu

Salah-Eddine Elayoubi

Eitan Altman

Rachid El Azouzi

Yinghao Yu

IEEE Trans. Mob. Comput., 2016

Joint Subcarrier and CPU Time Allocation for Mobile Edge Computing.

Yinghao Yu

Jun Zhang

Khaled Ben Letaief

Proceedings of the 2016 IEEE Global Communications Conference, 2016
