Minsoo Rhu

Orcid: 0000-0003-3303-8681

According to our database, Minsoo Rhu authored at least 51 papers between 2009 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
FPGA-Accelerated Data Preprocessing for Personalized Recommendation Systems.
IEEE Comput. Archit. Lett., 2024

Pathfinding Future PIM Architectures by Demystifying a Commercial PIM Technology.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

LazyDP: Co-Designing Algorithm-Software for Scalable Training of Differentially Private Recommendation Models.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

GPU-based Private Information Retrieval for On-Device Machine Learning Inference.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
vTrain: A Simulation Framework for Evaluating Cost-effective and Compute-optimal Large Language Model Training.
CoRR, 2023

Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference.
CoRR, 2023

Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations.
CoRR, 2023

HAMMER: Hardware-Friendly Approximate Computing for Self-Attention With Mean-Redistribution And Linearization.
IEEE Comput. Archit. Lett., 2023

GROW: A Row-Stationary Sparse-Dense GEMM Accelerator for Memory-Efficient Graph Convolutional Neural Networks.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

2022
DiVa: An Accelerator for Differentially Private Machine Learning.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

ARK: Fully Homomorphic Encryption Accelerator with Runtime Data Generation and Inter-Operation Key Reuse.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

SmartSAGE: training large-scale graph neural networks using in-storage processing architectures.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Training personalized recommendation systems from (GPU) scratch: look forward not backwards.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

BTS: an accelerator for bootstrappable fully homomorphic encryption.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

PARIS and ELSA: an elastic scheduling algorithm for reconfigurable multi-GPU inference servers.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

2021
Understanding the Implication of Non-Volatile Memory for Large-Scale Graph Neural Network Training.
IEEE Comput. Archit. Lett., 2021

TRiM: Tensor Reduction in Memory.
IEEE Comput. Archit. Lett., 2021

Characterization and Analysis of Deep Learning for 3D Point Cloud Analytics.
IEEE Comput. Archit. Lett., 2021

TRiM: Enhancing Processor-Memory Interfaces with Scalable Tensor Reduction in Memory.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

Lazy Batching: An SLA-aware Batching System for Cloud Machine Learning Inference.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

Trident: A Hybrid Correlation-Collision GPU Cache Timing Attack for AES Key Recovery.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

2020
LazyBatching: An SLA-aware Batching System for Cloud Machine Learning Inference.
CoRR, 2020

Centaur: A Chiplet-based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

PREMA: A Predictive Multi-Task Scheduling Algorithm For Preemptible Neural Processing Units.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

Bandwidth Bottleneck in Network-on-Chip for High-Throughput Processors.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
A Disaggregated Memory System for Deep Learning.
IEEE Micro, 2019

TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

2018
Structurally Sparsified Backward Propagation for Faster Long Short-Term Memory Training.
CoRR, 2018

A Case for Memory-Centric HPC System Architecture for Training Deep Neural Networks.
IEEE Comput. Archit. Lett., 2018

Beyond the Memory Wall: A Case for Memory-Centric HPC System for Deep Learning.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

Accelerator-centric deep learning systems for enhanced scalability, energy-efficiency, and programmability.
Proceedings of the 23rd Asia and South Pacific Design Automation Conference, 2018

2017
Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks.
CoRR, 2017

GPUpd: a fast and scalable multi-GPU architecture using cooperative projection and distribution.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Architecting an Energy-Efficient DRAM System for GPUs.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

2016
Virtualizing Deep Neural Networks for Memory-Efficient Neural Network Design.
CoRR, 2016

vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

2015
CLEAN-ECC: high reliability ECC for adaptive granularity memory system.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

Priority-based cache allocation in throughput processors.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

2014
GPUVolt: modeling and characterizing voltage noise in GPU architectures.
Proceedings of the International Symposium on Low Power Electronics and Design, 2014

2013
A locality-aware memory hierarchy for energy-efficient GPU architectures.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

Maximizing SIMD resource utilization in GPGPUs with SIMD lane permutation.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

The dual-path execution model for efficient GPU control flow.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

2012
CAPRI: Prediction of compaction-adequacy for handling control-divergence in GPGPU architectures.
Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

2010
Optimization of Arithmetic Coding for JPEG2000.
IEEE Trans. Circuits Syst. Video Technol., 2010

2009
A novel trace-pipelined binary arithmetic coder architecture for JPEG2000.
Proceedings of the IEEE Workshop on Signal Processing Systems, 2009

Memory-less bit-plane coder architecture for JPEG2000 with concurrent column-stripe coding.
Proceedings of the International Conference on Image Processing, 2009

Architecture design of a high-performance dual-symbol binary arithmetic coder for JPEG2000.
Proceedings of the International Conference on Image Processing, 2009
