Kwanhee Kyung
ORCID: 0000-0003-4243-2111
According to our database, Kwanhee Kyung authored at least 8 papers between 2023 and 2025.
Bibliography
2025
The New LLM Bottleneck: A Systems Perspective on Latent Attention and Mixture-of-Experts.
CoRR, July 2025
SSD Offloading for LLM Mixture-of-Experts Weights Considered Harmful in Energy Efficiency.
IEEE Comput. Archit. Lett., 2025
Proceedings of the Twentieth European Conference on Computer Systems, 2025
2024
Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024
Proceedings of the 38th ACM International Conference on Supercomputing, 2024
Unleashing the Potential of PIM: Accelerating Large Batched Inference of Transformer-Based Generative Models.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024
AttAcc! Unleashing the Power of PIM for Batched Transformer-based Generative Model Inference.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
2023
IEEE Comput. Archit. Lett., 2023