Jung Ho Ahn

CoRR, 2023

HyPHEN: A Hybrid Packing Method and Optimizations for Homomorphic Encryption-Based Neural Networks.

CoRR, 2023

X-ray: Discovering DRAM Internal Structure and Error Characteristics by Issuing Memory Commands.

IEEE Comput. Archit. Lett., 2023

ADT: Aggressive Demotion and Promotion for Tiered Memory.

IEEE Comput. Archit. Lett., 2023

A Hardware-Friendly Tiled Singular-Value Decomposition-Based Matrix Multiplication for Transformer-Based Models.

IEEE Comput. Archit. Lett., 2023

Unleashing the Potential of PIM: Accelerating Large Batched Inference of Transformer-Based Generative Models.

IEEE Comput. Archit. Lett., 2023

Demystifying CXL Memory with Genuine CXL-Ready Systems and Devices.

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

How to Kill the Second Bird with One ECC: The Pursuit of Row Hammer Resilient DRAM.

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

SHARP: A Short-Word Hierarchical Accelerator for Robust and Practical Fully Homomorphic Encryption.

Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

SHADOW: Preventing Row Hammer in DRAM with Intra-Subarray Row Shuffling.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

2022

MVP: An Efficient CNN Accelerator with Matrix, Vector, and Processing-Near-Memory Units.

ACM Trans. Design Autom. Electr. Syst., 2022

Future Scaling of Memory Hierarchy for Tensor Cores and Eliminating Redundant Shared Memory Traffic Using Inter-Warp Multicasting.

IEEE Trans. Computers, 2022

AESPA: Accuracy Preserving Low-degree Polynomial Activation for Fast Private Inference.

CoRR, 2022

GraNDe: Near-Data Processing Architecture With Adaptive Matrix Mapping for Graph Convolutional Networks.

IEEE Comput. Archit. Lett., 2022

ARK: Fully Homomorphic Encryption Accelerator with Runtime Data Generation and Inter-Operation Key Reuse.

Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

BTS: an accelerator for bootstrappable fully homomorphic encryption.

Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

A Slice and Dice Approach to Accelerate Compound Sparse Attention on GPU.

Hailong Li

Jaewan Choi

Proceedings of the IEEE International Symposium on Workload Characterization, 2022

Accelerating Transformer Networks through Recomposing Softmax Layers.

Proceedings of the IEEE International Symposium on Workload Characterization, 2022

Mithril: Cooperative Row Hammer Protection on Commodity DRAM Leveraging Managed Refresh.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

2021

Over 100x Faster Bootstrapping in Fully Homomorphic Encryption through Memory-centric Optimization with GPUs.

IACR Trans. Cryptogr. Hardw. Embed. Syst., 2021

TRiM: Tensor Reduction in Memory.

IEEE Comput. Archit. Lett., 2021

Row-Streaming Dataflow Using a Chaining Buffer and Systolic Array+ Structure.

IEEE Comput. Archit. Lett., 2021

Accelerating Fully Homomorphic Encryption Through Architecture-Centric Analysis and Optimization.

IEEE Access, 2021

TRiM: Enhancing Processor-Memory Interfaces with Scalable Tensor Reduction in Memory.

Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

MaPHeA: a lightweight memory hierarchy-aware profile-guided heap allocation framework.

Proceedings of the LCTES '21: 22nd ACM SIGPLAN/SIGBED International Conference on Languages, 2021

Accelerating Fully Homomorphic Encryption Through Microarchitecture-Aware Analysis and Optimization.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

BCD deduplication: effective memory compression using partial cache-line deduplication.

Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

2020

MViD: Sparse Matrix-Vector Multiplication in Mobile DRAM for Accelerating Recurrent Neural Networks.

IEEE Trans. Computers, 2020

HEAAN Demystified: Accelerating Fully Homomorphic Encryption Through Architecture-centric Analysis and Optimization.

CoRR, 2020

CAT-TWO: Counter-Based Adaptive Tree, Time Window Optimized for DRAM Row-Hammer Prevention.

Ingab Kang

Eojin Lee

IEEE Access, 2020

Graphene: Strong yet Lightweight Row Hammer Protection.

Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Accelerating Number Theoretic Transformations for Bootstrappable Homomorphic Encryption on GPUs.

Proceedings of the IEEE International Symposium on Workload Characterization, 2020

2019

Restructuring Batch Normalization to Accelerate CNN Training.

Proceedings of Machine Learning and Systems 2019, 2019

TWiCe: preventing row-hammering by exploiting time window counters.

Proceedings of the 46th International Symposium on Computer Architecture, 2019

Enforcing Last-Level Cache Partitioning through Memory Virtual Channels.

Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018

TWiCe: Time Window Counter Based Row Refresh to Prevent Row-Hammering.

IEEE Comput. Archit. Lett., 2018

Partitioning Compute Units in CNN Acceleration for Statistical Memory Traffic Shaping.

IEEE Comput. Archit. Lett., 2018

Leveraging Power-Performance Relationship of Energy-Efficient Modern DRAM Devices.

IEEE Access, 2018

Memory Hierarchy for Web Search.

Grant Ayers

Parthasarathy Ranganathan

Christos Kozyrakis

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

3D-Xpath: high-density managed DRAM architecture with cost-effective alternative paths for memory transactions.

Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017

Excavating the Hidden Parallelism Inside DRAM Architectures With Buffered Compares.

IEEE Trans. Very Large Scale Integr. Syst., 2017

Selective DRAM cache bypassing for improving bandwidth on DRAM/NVM hybrid main memory systems.

IEICE Electron. Express, 2017

Evaluation of Performance Unfairness in NUMA System Architecture.

IEEE Comput. Archit. Lett., 2017

SALAD: Achieving Symmetric Access Latency with Asymmetric DRAM Architecture.

IEEE Comput. Archit. Lett., 2017

Understanding power-performance relationship of energy-efficient modern DRAM devices.

Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

Work as a team or individual: Characterizing the system-level impacts of main memory partitioning.

Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

SOUP-N-SALAD: Allocation-Oblivious Access Latency Reduction with Asymmetric DRAM Microarchitectures.

Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

Defect Analysis and Cost-Effective Resilience Architecture for Future DRAM Devices.

Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

History-Based Arbitration for Fairness in Processor-Interconnect of NUMA Servers.

Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

2016

Full-Stack Architecting to Achieve a Billion-Requests-Per-Second Throughput on a Single Key-Value Store Server Platform.

ACM Trans. Comput. Syst., 2016

Near-DRAM Acceleration with Single-ISA Heterogeneous Processing in Standard Memory Modules.

Hadi Asghari Moghaddam

Amin Farmahini Farahani

Katherine Morrow

IEEE Micro, 2016

Achieving One Billion Key-Value Requests per Second on a Single Server.

IEEE Micro, 2016

Exploring new features of high-bandwidth memory for GPUs.

IEICE Electron. Express, 2016

Large Pages on Steroids: Small Ideas to Accelerate Big Memory Applications.

Daejin Jung

Sheng Li

IEEE Comput. Archit. Lett., 2016

Chameleon: Versatile and practical near-DRAM acceleration architecture for large memory systems.

Hadi Asghari Moghaddam

Young Hoon Son

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Adaptive and flexible key-value stores through soft data partitioning.

Proceedings of the 34th IEEE International Conference on Computer Design, 2016

Buffered compares: Excavating the hidden parallelism inside DRAM architectures with lightweight logic.

Jinho Lee

Kiyoung Choi

Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

Accelerating Linked-list Traversal Through Near-Data Processing.

Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015

CIDR: A Cache Inspired Area-Efficient DRAM Resilience Architecture against Permanent Faults.

IEEE Comput. Archit. Lett., 2015

DRAMA: An Architecture for Accelerated Processing Near Memory.

Amin Farmahini Farahani

Katherine Morrow

IEEE Comput. Archit. Lett., 2015

Architecting to achieve a billion requests per second throughput on a single key-value store server platform.

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

History-Assisted Adaptive-Granularity Caches (HAAG$) for High Performance 3D DRAM Architectures.

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Alloy: Parallel-serial memory channel architecture for single-chip heterogeneous processor systems.

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

CiDRA: A cache-inspired DRAM resilience architecture.

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules.

Amin Farmahini Farahani

Katherine Morrow

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

2014

Microbank: Architecting Through-Silicon Interposer-Based Main Memory Systems.

Proceedings of the International Conference for High Performance Computing, 2014

Row-buffer decoupling: A case for low-latency DRAM microarchitecture.

Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

2013

Exploiting Replicated Cache Blocks to Reduce L2 Cache Leakage in CMPs.

Hyunhee Kim

Jihong Kim

IEEE Trans. Very Large Scale Integr. Syst., 2013

MAEPER: Matching Access and Error Patterns With Error-Free Resource for Low Vcc L1 Cache.

IEEE Trans. Very Large Scale Integr. Syst., 2013

Mapping and Scheduling of Tasks and Communications on Many-Core SoC Under Local Memory Constraint.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2013

The McPAT Framework for Multicore and Manycore Architectures: Simultaneously Modeling Power, Area, and Timing.

ACM Trans. Archit. Code Optim., 2013

Scalable high-radix router microarchitecture using a network switch organization.

Young Hoon Son

John Kim

ACM Trans. Archit. Code Optim., 2013

McSimA+: A manycore simulator with application-level+ simulation and detailed microarchitecture modeling.

Proceedings of the 2013 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

Reducing memory access latency with asymmetric DRAM bank organizations.

Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

Dynamic bandwidth scaling for embedded DSPs with 3D-stacked DRAM and wide I/Os.

Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2013

Memory-centric system interconnect design with Hybrid Memory Cubes.

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012

Improving System Energy Efficiency with Memory Rank Subsetting.

ACM Trans. Archit. Code Optim., 2012

Optical High Radix Switch Design.

IEEE Micro, 2012

MAGE: adaptive granularity and ECC for resilient and power efficient memory systems.

Proceedings of the SC Conference on High Performance Computing Networking, 2012

Network within a network approach to create a scalable high-radix router microarchitecture.

Sungwoo Choo

John Kim

Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

CACTI-3DD: Architecture-level modeling for 3D die-stacked DRAM main memory.

Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

2011

3D network-on-chip with wireless links through inductive coupling.

Proceedings of the International SoC Design Conference, 2011

The role of optics in future high radix switch design.

Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

CACTI-P: Architecture-level modeling for SRAM-based structures with advanced leakage reduction techniques.

Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, 2011

A quantitative analysis of performance benefits of 3D die stacking on mobile and embedded SoC.

Proceedings of the Design, Automation and Test in Europe, 2011

Matching cache access behavior and bit error pattern for high performance low Vcc L1 cache.

Proceedings of the 48th Design Automation Conference, 2011

CMOS Nanophotonics: Technology, System Implications, and a CMP Case Study.

Proceedings of the Low Power Networks-on-Chip, 2011

2010

Replication-aware leakage management in chip multiprocessors with private L2 cache.

Hyunhee Kim

Jihong Kim

Proceedings of the 2010 International Symposium on Low Power Electronics and Design, 2010

2009

How to simulate 1000 cores.

SIGARCH Comput. Archit. News, 2009

Multicore DIMM: an Energy Efficient Memory Module with Independently Controlled DRAMs.

IEEE Comput. Archit. Lett., 2009

Future scaling of processor-memory interfaces.

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

HyperX: topology, routing, and packaging of efficient large-scale networks.

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures.

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

2008

Corona: System Implications of Emerging Nanophotonic Technology.

Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

A Comprehensive Memory Modeling Tool and Its Application to the Design and Analysis of Future Memory Hierarchies.

Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

A Nanophotonic Interconnect for High-Performance Many-Core Computation.

Proceedings of the 16th Annual IEEE Symposium on High Performance Interconnects (HOTI 2008), 2008

2007

Executing irregular scientific applications on stream architectures.

Proceedings of the 21st Annual International Conference on Supercomputing, 2007

Tradeoff between data-, instruction-, and thread-level parallelism in stream processors.

Mattan Erez

Proceedings of the 21st Annual International Conference on Supercomputing, 2007

2006

Data parallel address architecture.

IEEE Comput. Archit. Lett., 2006

Architecture - The design space of data-parallel memory systems.

Mattan Erez

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

2005

Scatter-Add in Data Parallel Architectures.

Mattan Erez