Haikuo Shao

ORCID: 0009-0008-6965-3436

According to our database, Haikuo Shao authored at least 12 papers between 2021 and 2025.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2025
ASTRA: Reconfigurable Training Architecture Design for Nonlinear Softmax and Activation Functions in Transformers.
IEEE Trans. Very Large Scale Integr. Syst., July, 2025

FastMamba: A High-Speed and Efficient Mamba Accelerator on FPGA with Accurate Quantization.
CoRR, May, 2025

AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design.
CoRR, May, 2025

Trio-ViT: Post-Training Quantization and Acceleration for Softmax-Free Efficient Vision Transformer.
IEEE Trans. Circuits Syst. I Regul. Pap., March, 2025

An Efficient Training Architecture for Nonlinear Softmax Function in Transformers.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2025

Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores.
Proceedings of the 30th Asia and South Pacific Design Automation Conference, 2025

2024
A Low Complexity Online Learning Approximate Message Passing Detector for Massive MIMO.
IEEE Trans. Very Large Scale Integr. Syst., July, 2024

An FPGA-Based Reconfigurable Accelerator for Convolution-Transformer Hybrid EfficientViT.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2024

A Flexible FPGA-Based Accelerator for Efficient Inference of Multi-Precision CNNs.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2024

Co-Designing Binarized Transformer and Hardware Accelerator for Efficient End-to-End Edge Deployment.
Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design, 2024

2023
An Efficient Training Accelerator for Transformers With Hardware-Algorithm Co-Optimization.
IEEE Trans. Very Large Scale Integr. Syst., November, 2023

2021
An FPGA-Based Reconfigurable Accelerator for Low-Bit DNN Training.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2021

