Youwei Zhuo

Orcid: 0000-0002-1557-2613

According to our database¹, Youwei Zhuo authored at least 27 papers between 2017 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

TokenStack: A Heterogeneous HBM-PIM Architecture and Runtime for Efficient LLM Inference.

CoRR, May, 2026

TokenDance: Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing.

CoRR, April, 2026

CoCoTree: A Computation-Capable Architecture for Collective Communication in Scalable PIM.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2026

2025

SlowPoke: Understanding and Detecting On-Chip Fail-Slow Failures in Many-Core Systems.

CoRR, October, 2025

Tokencake: A KV-Cache-centric Serving Framework for LLM-based Multi-Agent Applications.

CoRR, October, 2025

Klotski v2: Improved DNN Model Orchestration Framework for Dataflow Architecture Accelerators.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., March, 2025

Tasa: Thermal-aware 3D-Stacked Architecture Design with Bandwidth Sharing for LLM Inference.

Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2025

TokenSim: Enabling Hardware and Software Exploration for Large Language Model Inference Systems.

Proceedings of the Advanced Parallel Processing Technologies, 2025

2024

HydraRPC: RPC in the CXL Era.

Proceedings of the 2024 USENIX Annual Technical Conference, 2024

2023

Klotski: DNN Model Orchestration Framework for Dataflow Architecture Accelerators.

Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

2020

SympleGraph: distributed graph processing with precise loop-carried dependency guarantee.

Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2020

AccPar: Tensor Partitioning for Heterogeneous Deep Learning Accelerators.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

Prague: High-Performance Heterogeneity-Aware Asynchronous Decentralized Training.

Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019

Distributed Graph Processing System and Processing-in-memory Architecture with Precise Loop-carried Dependency Guarantee.

ACM Trans. Comput. Syst., 2019

Heterogeneity-Aware Asynchronous Decentralized Training.

CoRR, 2019

GraphQ: Scalable PIM-Based Graph Processing.

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array.

Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs.

Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

Hop: Heterogeneity-aware Decentralized Training.

Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018

CSE: Parallel Finite State Machines with Convergence Set Enumeration.

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

GraphR: Accelerating Graph Processing Using ReRAM.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

Wonderland: A Novel Abstraction-Based Out-Of-Core Graph Processing System.

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

2017

CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices.

CoRR, 2017

CirCNN: accelerating and compressing deep neural networks using block-circulant weight matrices.

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Scalable Graph Traversal on Sunway TaihuLight with Ten Million Cores.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Performance evaluation and optimization of HBM-Enabled GPU for data-intensive applications.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2017
