Youwei Zhuo

Orcid: 0000-0002-1557-2613

According to our database1, Youwei Zhuo authored at least 21 papers between 2017 and 2025.

Collaborative distances:
  • Dijkstra number2 of four.
  • Erdős number3 of four.

Timeline

Legend:

Book 
In proceedings 
Article 
PhD thesis 
Dataset
Other 

Links

On csauthors.net:

Bibliography

2025
Klotski v2: Improved DNN Model Orchestration Framework for Dataflow Architecture Accelerators.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., March, 2025

TokenSim: Enabling Hardware and Software Exploration for Large Language Model Inference Systems.
CoRR, March, 2025

2024
HydraRPC: RPC in the CXL Era.
Proceedings of the 2024 USENIX Annual Technical Conference, 2024

2023
Klotski: DNN Model Orchestration Framework for Dataflow Architecture Accelerators.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

2020
SympleGraph: distributed graph processing with precise loop-carried dependency guarantee.
Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2020

AccPar: Tensor Partitioning for Heterogeneous Deep Learning Accelerators.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

Prague: High-Performance Heterogeneity-Aware Asynchronous Decentralized Training.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019
Distributed Graph Processing System and Processing-in-memory Architecture with Precise Loop-carried Dependency Guarantee.
ACM Trans. Comput. Syst., 2019

Heterogeneity-Aware Asynchronous Decentralized Training.
CoRR, 2019

GraphQ: Scalable PIM-Based Graph Processing.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

Hop: Heterogeneity-aware Decentralized Training.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018
CSE: Parallel Finite State Machines with Convergence Set Enumeration.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

GraphR: Accelerating Graph Processing Using ReRAM.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

Wonderland: A Novel Abstraction-Based Out-Of-Core Graph Processing System.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

2017
CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-CirculantWeight Matrices.
CoRR, 2017

CirCNN: accelerating and compressing deep neural networks using block-circulant weight matrices.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Scalable Graph Traversal on Sunway TaihuLight with Ten Million Cores.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Performance evaluation and optimization of HBM-Enabled GPU for data-intensive applications.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2017


  Loading...