Xuechao Wei

Orcid: 0000-0002-0996-2260

According to our database, Xuechao Wei authored at least 21 papers between 2012 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
POSTER: RadiK: Scalable Radix Top-K Selection on GPUs.
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

2023
An Intermediate-Centric Dataflow for Transposed Convolution Acceleration on FPGA.
ACM Trans. Embed. Comput. Syst., November, 2023

Efficient Super-Resolution System With Block-Wise Hybridization and Quantized Winograd on FPGA.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., November, 2023

ArchExplorer: Microarchitecture Exploration Via Bottleneck Analysis.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Klotski: DNN Model Orchestration Framework for Dataflow Architecture Accelerators.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

2022
PetS: A Unified Framework for Parameter-Efficient Transformers Serving.
Proceedings of the 2022 USENIX Annual Technical Conference, 2022

2022 ICCAD CAD Contest Problem C: Microarchitecture Design Space Exploration.
Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, 2022

GNNear: Accelerating Full-Batch Training of Graph Neural Networks with Near-Memory Processing.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021
GCNear: A Hybrid Architecture for Efficient GCN Training with Near-Memory Processing.
CoRR, 2021

2020
Generating Systolic Array Accelerators With Reusable Blocks.
IEEE Micro, 2020

FTDL: An FPGA-tailored Architecture for Deep Learning Systems.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

FTDL: A Tailored FPGA-Overlay for Deep Learning with High Scalability.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

2019
Frequency Improvement of Systolic Array-Based CNNs on FPGAs.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2019

Overcoming Data Transfer Bottlenecks in DNN Accelerators via Layer-Conscious Memory Management.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

Overcoming Data Transfer Bottlenecks in FPGA-based DNN Accelerators via Layer Conscious Memory Management.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

2018
TGPA: tile-grained pipeline architecture for low latency CNN inference.
Proceedings of the International Conference on Computer-Aided Design, 2018

2017
Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs.
Proceedings of the 54th Annual Design Automation Conference, 2017

Throughput optimization for streaming applications on CPU-FPGA heterogeneous systems.
Proceedings of the 22nd Asia and South Pacific Design Automation Conference, 2017

2012
FlexBFS: a parallelism-aware implementation of breadth-first search on GPU.
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

Distributed replay protocol for distributed uniprocessors.
Proceedings of the International Conference on Supercomputing, 2012

Distributed Control Independence for Composable Multi-processors.
Proceedings of the 2012 IEEE/ACIS 11th International Conference on Computer and Information Science, Shanghai, China, May 30, 2012

