Minsoo Rhu

IEEE Comput. Archit. Lett., 2024

Pathfinding Future PIM Architectures by Demystifying a Commercial PIM Technology.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

LazyDP: Co-Designing Algorithm-Software for Scalable Training of Differentially Private Recommendation Models.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

GPU-based Private Information Retrieval for On-Device Machine Learning Inference.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023

vTrain: A Simulation Framework for Evaluating Cost-effective and Compute-optimal Large Language Model Training.

CoRR, 2023

Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference.

CoRR, 2023

Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations.

John Kim

CoRR, 2023

HAMMER: Hardware-Friendly Approximate Computing for Self-Attention With Mean-Redistribution And Linearization.

IEEE Comput. Archit. Lett., 2023

GROW: A Row-Stationary Sparse-Dense GEMM Accelerator for Memory-Efficient Graph Convolutional Neural Networks.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

2022

DiVa: An Accelerator for Differentially Private Machine Learning.

Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

ARK: Fully Homomorphic Encryption Accelerator with Runtime Data Generation and Inter-Operation Key Reuse.

Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

SmartSAGE: training large-scale graph neural networks using in-storage processing architectures.

Jinha Chung

Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Training personalized recommendation systems from (GPU) scratch: look forward not backwards.

Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

BTS: an accelerator for bootstrappable fully homomorphic encryption.

Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

PARIS and ELSA: an elastic scheduling algorithm for reconfigurable multi-GPU inference servers.

Yunseong Kim

Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

2021

Understanding the Implication of Non-Volatile Memory for Large-Scale Graph Neural Network Training.

IEEE Comput. Archit. Lett., 2021

TRiM: Tensor Reduction in Memory.

IEEE Comput. Archit. Lett., 2021

Characterization and Analysis of Deep Learning for 3D Point Cloud Analytics.

Bongjoon Hyun

Jiwon Lee

IEEE Comput. Archit. Lett., 2021

TRiM: Enhancing Processor-Memory Interfaces with Scalable Tensor Reduction in Memory.

Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

Lazy Batching: An SLA-aware Batching System for Cloud Machine Learning Inference.

Yunseong Kim

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

Trident: A Hybrid Correlation-Collision GPU Cache Timing Attack for AES Key Recovery.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

2020

LazyBatching: An SLA-aware Batching System for Cloud Machine Learning Inference.

Yunseong Kim

CoRR, 2020

Centaur: A Chiplet-based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations.

Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

PREMA: A Predictive Multi-Task Scheduling Algorithm For Preemptible Neural Processing Units.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units.

Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

Bandwidth Bottleneck in Network-on-Chip for High-Throughput Processors.

Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019

A Disaggregated Memory System for Deep Learning.

IEEE Micro, 2019

TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning.

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

2018

Structurally Sparsified Backward Propagation for Faster Long Short-Term Memory Training.

CoRR, 2018

A Case for Memory-Centric HPC System Architecture for Training Deep Neural Networks.

IEEE Comput. Archit. Lett., 2018

Beyond the Memory Wall: A Case for Memory-Centric HPC System for Deep Learning.

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

Accelerator-centric deep learning systems for enhanced scalability, energy-efficiency, and programmability.

Proceedings of the 23rd Asia and South Pacific Design Automation Conference, 2018

2017

Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks.

CoRR, 2017

GPUpd: a fast and scalable multi-GPU architecture using cooperative projection and distribution.

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks.

Rangharajan Venkatesan

Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Architecting an Energy-Efficient DRAM System for GPUs.

Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

2016

Virtualizing Deep Neural Networks for Memory-Efficient Neural Network Design.

CoRR, 2016

vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design.

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

2015

CLEAN-ECC: high reliability ECC for adaptive granularity memory system.

Proceedings of the 48th International Symposium on Microarchitecture, 2015

Priority-based cache allocation in throughput processors.

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

2014

GPUVolt: modeling and characterizing voltage noise in GPU architectures.

Proceedings of the International Symposium on Low Power Electronics and Design, 2014

2013

A locality-aware memory hierarchy for energy-efficient GPU architectures.

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

Maximizing SIMD resource utilization in GPGPUs with SIMD lane permutation.

Mattan Erez

Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

The dual-path execution model for efficient GPU control flow.

Mattan Erez

Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

2012

CAPRI: Prediction of compaction-adequacy for handling control-divergence in GPGPU architectures.

Mattan Erez

Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

2010

Optimization of Arithmetic Coding for JPEG2000.

IEEE Trans. Circuits Syst. Video Technol., 2010

2009

A novel trace-pipelined binary arithmetic coder architecture for JPEG2000.

Proceedings of the IEEE Workshop on Signal Processing Systems, 2009

Memory-less bit-plane coder architecture for JPEG2000 with concurrent column-stripe coding.

Proceedings of the International Conference on Image Processing, 2009

Architecture design of a high-performance dual-symbol binary arithmetic coder for JPEG2000.
