Kwanhee Kyung

Orcid: 0000-0003-4243-2111

According to our database, Kwanhee Kyung authored at least 8 papers between 2023 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.


Bibliography

2025
The New LLM Bottleneck: A Systems Perspective on Latent Attention and Mixture-of-Experts.
CoRR, July, 2025

SSD Offloading for LLM Mixture-of-Experts Weights Considered Harmful in Energy Efficiency.
IEEE Comput. Archit. Lett., 2025

PET: Proactive Demotion for Efficient Tiered Memory Management.
Proceedings of the Twentieth European Conference on Computer Systems, 2025

2024
Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

CLAY: CXL-based Scalable NDP Architecture Accelerating Embedding Layers.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024

AttAcc! Unleashing the Power of PIM for Batched Transformer-based Generative Model Inference.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
ADT: Aggressive Demotion and Promotion for Tiered Memory.
IEEE Comput. Archit. Lett., 2023

Unleashing the Potential of PIM: Accelerating Large Batched Inference of Transformer-Based Generative Models.
IEEE Comput. Archit. Lett., 2023
