Haikuo Shao

ORCID: 0009-0008-6965-3436

According to our database, Haikuo Shao authored at least 14 papers between 2021 and 2026.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2026
AccLLM: Accelerating Long-Context LLM Inference via Algorithm-Hardware Co-Design.
IEEE Trans. Very Large Scale Integr. Syst., April, 2026

APT-LLM: Exploiting Arbitrary-Precision Tensor Core Computing for LLM Acceleration.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., April, 2026

2025
An Efficient Layer Normalization Training Module With Dynamic Quantization for Transformers.
IEEE Trans. Circuits Syst. II Express Briefs, September, 2025

ASTRA: Reconfigurable Training Architecture Design for Nonlinear Softmax and Activation Functions in Transformers.
IEEE Trans. Very Large Scale Integr. Syst., July, 2025

Trio-ViT: Post-Training Quantization and Acceleration for Softmax-Free Efficient Vision Transformer.
IEEE Trans. Circuits Syst. I Regul. Pap., March, 2025

FastMamba: A High-Speed and Efficient Mamba Accelerator on FPGA with Accurate Quantization.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2025

An Efficient Training Architecture for Nonlinear Softmax Function in Transformers.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2025

Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores.
Proceedings of the 30th Asia and South Pacific Design Automation Conference, 2025

2024
A Low Complexity Online Learning Approximate Message Passing Detector for Massive MIMO.
IEEE Trans. Very Large Scale Integr. Syst., July, 2024

An FPGA-Based Reconfigurable Accelerator for Convolution-Transformer Hybrid EfficientViT.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2024

A Flexible FPGA-Based Accelerator for Efficient Inference of Multi-Precision CNNs.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2024

Co-Designing Binarized Transformer and Hardware Accelerator for Efficient End-to-End Edge Deployment.
Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design, 2024

2023
An Efficient Training Accelerator for Transformers With Hardware-Algorithm Co-Optimization.
IEEE Trans. Very Large Scale Integr. Syst., November, 2023

2021
An FPGA-Based Reconfigurable Accelerator for Low-Bit DNN Training.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2021
