Gunho Park

ORCID: 0000-0002-8078-4356

According to our database, Gunho Park authored at least 14 papers between 2021 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
AnyBCQ: Hardware Efficient Flexible Binary-Coded Quantization for Multi-Precision LLMs.
CoRR, October 2025

Faster Inference of LLMs using FP8 on the Intel Gaudi.
CoRR, March 2025

An Investigation of FP8 Across Accelerators for LLM Inference.
CoRR, February 2025

FIGLUT: An Energy-Efficient Accelerator Design for FP-INT GEMM Using Look-Up Tables.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

2024
Domain knowledge free cloud-IDS with lightweight embedding method.
J. Cloud Comput., December 2024

No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization.
CoRR, 2024

Low-Power Encoder and Compressor Design for Approximate Radix-8 Booth Multiplier.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2024

LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Simplified Compressor and Encoder Designs for Low-Cost Approximate Radix-4 Booth Multiplier.
IEEE Trans. Circuits Syst. II Express Briefs, March 2023

Sparsity-Aware Memory Interface Architecture using Stacked XORNet Compression for Accelerating Pruned-DNN Models.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

Energy-Efficient RISC-V-Based Vector Processor for Cache-Aware Structurally-Pruned Transformers.
Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2023

TF-MVP: Novel Sparsity-Aware Transformer Accelerator with Mixed-Length Vector Pruning.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

2022
nuQmm: Quantized MatMul for Efficient Inference of Large-Scale Generative Language Models.
CoRR, 2022

2021
Design and Analysis of Approximate Compressors for Balanced Error Accumulation in MAC Operator.
IEEE Trans. Circuits Syst. I Regul. Pap., 2021

