Minchen Yu

ORCID: 0000-0002-6797-9028

According to our database¹, Minchen Yu authored at least 23 papers between 2018 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

UniScale: Adaptive Unified Inference Scaling via Online Joint Optimization of Model Routing and Test-Time Scaling.

CoRR, May, 2026

GoodServe: Towards High-Goodput Serving of Agentic LLM Inferences over Heterogeneous Resources.

CoRR, May, 2026

Efficient Data Passing for Serverless Inference Workflows: A GPU-Centric Approach.

Proceedings of the 21st European Conference on Computer Systems, 2026

2025

Janus: Disaggregating Attention and Experts for Scalable MoE Inference.

CoRR, December, 2025

MemShare: Memory Efficient Inference for Large Reasoning Models through KV Cache Reuse.

CoRR, July, 2025

Making Serverless Computing Extensible: A Case Study of Serverless Data Analytics.

CoRR, July, 2025

SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding.

CoRR, March, 2025

λScale: Enabling Fast Scaling for Serverless Large Language Model Inference.

CoRR, February, 2025

Pheromone: Restructuring Serverless Computing With Data-Centric Function Orchestration.

IEEE Trans. Netw., 2025

Torpor: GPU-Enabled Serverless Computing for Low-Latency, Resource-Efficient Inference.

Proceedings of the 2025 USENIX Annual Technical Conference, 2025

Toppings: CPU-Assisted, Rank-Aware Adapter Serving for LLM Inference.

Proceedings of the 2025 USENIX Annual Technical Conference, 2025

AdaSpec: Adaptive Speculative Decoding for Fast, SLO-Aware Large Language Model Serving.

Proceedings of the 2025 ACM Symposium on Cloud Computing, 2025

2024

FaaSTube: Optimizing GPU-oriented Data Transfer for Serverless Computing.

CoRR, 2024

CaraServe: CPU-Assisted and Rank-Aware LoRA Serving for Generative LLM Inference.

CoRR, 2024

2023

FaaSwap: SLO-Aware, GPU-Efficient Serverless Inference via Model Swapping.

CoRR, 2023

Following the Data, Not the Function: Rethinking Function Orchestration in Serverless Computing.

Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation, 2023

2022

Enabling Cost-Effective, SLO-Aware Machine Learning Inference Serving on Public Cloud.

IEEE Trans. Cloud Comput., 2022

2021

Restructuring Serverless Computing with Data-Centric Function Orchestration.

CoRR, 2021

CrystalPerf: Learning to Characterize the Performance of Dataflow Computation through Code Analysis.

Huangshi Tian, Minchen Yu, Wei Wang

Proceedings of the 2021 USENIX Annual Technical Conference, 2021

Gillis: Serving Large Neural Networks in Serverless Functions with Automatic Model Partitioning.

Proceedings of the 41st IEEE International Conference on Distributed Computing Systems, 2021

2020

RepBun: Load-Balanced, Shuffle-Free Cluster Caching for Structured Data.

Proceedings of the 39th IEEE Conference on Computer Communications, 2020

2019

MArk: Exploiting Cloud Services for Cost-Effective, SLO-Aware Machine Learning Inference Serving.

Proceedings of the 2019 USENIX Annual Technical Conference, 2019

2018

Continuum: A Platform for Cost-Aware, Low-Latency Continual Learning.

Huangshi Tian, Minchen Yu, Wei Wang

Proceedings of the ACM Symposium on Cloud Computing, 2018
