Xuechao Wei

Orcid: 0000-0002-0996-2260

According to our database, Xuechao Wei authored at least 25 papers between 2012 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

Klotski v2: Improved DNN Model Orchestration Framework for Dataflow Architecture Accelerators.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., March, 2025

2024

POSTER: RadiK: Scalable Radix Top-K Selection on GPUs.

Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

RadiK: Scalable and Optimized GPU-Parallel Radix Top-K Selection.

Proceedings of the 38th ACM International Conference on Supercomputing, 2024

PT-Map: Efficient Program Transformation Optimization for CGRA Mapping.

Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

2023

An Intermediate-Centric Dataflow for Transposed Convolution Acceleration on FPGA.

ACM Trans. Embed. Comput. Syst., November, 2023

Efficient Super-Resolution System With Block-Wise Hybridization and Quantized Winograd on FPGA.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., November, 2023

ArchExplorer: Microarchitecture Exploration Via Bottleneck Analysis.

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Klotski: DNN Model Orchestration Framework for Dataflow Architecture Accelerators.

Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

2022

PetS: A Unified Framework for Parameter-Efficient Transformers Serving.

Proceedings of the 2022 USENIX Annual Technical Conference, 2022

2022 ICCAD CAD Contest Problem C: Microarchitecture Design Space Exploration.

Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, 2022

GNNear: Accelerating Full-Batch Training of Graph Neural Networks with near-Memory Processing.

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021

GCNear: A Hybrid Architecture for Efficient GCN Training with Near-Memory Processing.

CoRR, 2021

2020

Generating Systolic Array Accelerators With Reusable Blocks.

IEEE Micro, 2020

FTDL: An FPGA-tailored Architecture for Deep Learning Systems.

Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

FTDL: A Tailored FPGA-Overlay for Deep Learning with High Scalability.

Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

2019

Frequency Improvement of Systolic Array-Based CNNs on FPGAs.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2019

Overcoming Data Transfer Bottlenecks in DNN Accelerators via Layer-Conscious Memory Management.

Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

Overcoming Data Transfer Bottlenecks in FPGA-based DNN Accelerators via Layer Conscious Memory Management.

Xuechao Wei, Yun Liang, Jason Cong

Proceedings of the 56th Annual Design Automation Conference 2019, 2019

2018

TGPA: tile-grained pipeline architecture for low latency CNN inference.

Proceedings of the International Conference on Computer-Aided Design, 2018

2017

Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs.

Proceedings of the 54th Annual Design Automation Conference, 2017

Throughput optimization for streaming applications on CPU-FPGA heterogeneous systems.

Proceedings of the 22nd Asia and South Pacific Design Automation Conference, 2017

2012

FlexBFS: a parallelism-aware implementation of breadth-first search on GPU.

Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

Distributed replay protocol for distributed uniprocessors.

Proceedings of the International Conference on Supercomputing, 2012

Distributed Control Independence for Composable Multi-processors.

Proceedings of the 2012 IEEE/ACIS 11th International Conference on Computer and Information Science, Shanghai, China, May 30, 2012
