Aamer Jaleel

CoRR, February, 2025

Teaching an Old Dog New Tricks: Verifiable FHE Using Commodity Hardware.

Proc. Priv. Enhancing Technol., 2025

QPRAC: Towards Secure and Practical PRAC-based Rowhammer Mitigation using Priority Queues.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

OASIS: Object-Aware Page Management for Multi-GPU Systems.

Yueqi Wang

Bingyao Li

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

2024

Improving Multi-Instance GPU Efficiency via Sub-Entry Sharing TLB Design.

CoRR, 2024

Probabilistic Tracker Management Policies for Low-Cost and Scalable Rowhammer Mitigation.

Stephen W. Keckler

Gururaj Saileshwar

CoRR, 2024

ImPress: Securing DRAM Against Data-Disturbance Errors via Implicit Row-Press Mitigation.

Anish Saxena

Moinuddin Qureshi

Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

MINT: Securely Mitigating Rowhammer with a Minimalist in-DRAM Tracker.

Moinuddin Qureshi

Salman Qazi

Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

STAR: Sub-Entry Sharing-Aware TLB for Multi-Instance GPU.

Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

PrIDE: Achieving Secure Rowhammer Mitigation with Low-Cost In-DRAM Trackers.

Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

GRIT: Enhancing Multi-GPU Performance with Fine-Grained Dynamic Page Placement.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

2023

Symphony: Orchestrating Sparse and Dense Tensors with Hierarchical Heterogeneous Processing.

ACM Trans. Comput. Syst., 2023

cuCatch: A Debugging Tool for Efficiently Catching Memory Safety Violations in CUDA Applications.

Proc. ACM Program. Lang., 2023

AutoScratch: ML-Optimized Cache Management for Inference-Oriented GPUs.

Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

IDYLL: Enhancing Page Translation in Multi-GPUs via Light Weight PTE Invalidations.

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Community-based Matrix Reordering for Sparse Linear Algebra Optimization.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023

Implicit Memory Tagging: No-Overhead Memory Safety Using Alias-Free Tagged ECC.

Michael B. Sullivan

Stephen W. Keckler

Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

Accelerating Sparse Data Orchestration via Dynamic Reflexive Tiling (Extended Abstract).

Toluwanimi O. Odemuyiwa

Hadi Asghari Moghaddam

Christopher W. Fletcher

Proceedings of the 2023 ACM Workshop on Highlights of Parallel Computing, 2023

Accelerating Sparse Data Orchestration via Dynamic Reflexive Tiling.

Toluwanimi O. Odemuyiwa

Hadi Asghari Moghaddam

Christopher W. Fletcher

Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2021

P-OPT: Practical Optimal Cache Replacement for Graph Analytics.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

2020

HMG: Extending Cache Coherence Protocols Across Modern Hierarchical Multi-GPU Systems.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

2019

DUCATI: High-performance Address Translation by Extending TLB Reach of GPU-accelerated Systems.

Eiman Ebrahimi

Sam Duncan

ACM Trans. Archit. Code Optim., 2019

ExTensor: An Accelerator for Sparse Tensor Algebra.

Kartik Hegde

Hadi Asghari Moghaddam

Christopher W. Fletcher

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Adaptive memory-side last-level GPU caching.

Proceedings of the 46th International Symposium on Computer Architecture, 2019

2018

Combining HW/SW Mechanisms to Improve NUMA Performance of Multi-GPU Systems.

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

ACCORD: Enabling Associativity for Gigascale DRAM Caches by Coordinating Way-Install and Way-Prediction.

Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

2017

Top Picks from the 2016 Computer Architecture Conferences.

IEEE Micro, 2017

Mind The Power Holes: Sifting Operating Points in Power-Limited Heterogeneous Multicores.

IEEE Comput. Archit. Lett., 2017

Beyond the socket: NUMA-aware GPUs.

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

BATMAN: techniques for maximizing system bandwidth of memory systems with stacked-DRAM.

Proceedings of the International Symposium on Memory Systems, 2017

MCM-GPU: Multi-Chip-Module GPUs for Continued Performance Scalability.

Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

RIC: Relaxed Inclusion Caches for Mitigating LLC Side-Channel Attacks.

Mehmet Kayaalp

Khaled N. Khasawneh

Hodjat Asghari Esfeden

Proceedings of the 54th Annual Design Automation Conference, 2017

Using Personality Metrics to Improve Cache Interference Management in Multicore Processors.

Mwaffaq Otoom

Natalie D. Enright Jerger

Pedro Trancoso

Proceedings of the Computing Frontiers Conference, 2017

2016

Maximizing Heterogeneous Processor Performance Under Power Constraints.

ACM Trans. Archit. Code Optim., 2016

The Bunker Cache for spatio-value approximation.

Joshua San Miguel

Jorge Albericio

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

CANDY: Enabling coherent DRAM caches for multi-node systems.

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

DReAM: Dynamic Re-arrangement of Address Mapping to Improve the Performance of DRAMs.

Proceedings of the Second International Symposium on Memory Systems, 2016

HAPPY: Hybrid Address-based Page Policy in DRAMs.

Proceedings of the Second International Symposium on Memory Systems, 2016

LAP: Loop-Block Aware Inclusion Properties for Energy-Efficient Asymmetric Last Level Caches.

Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

A high-resolution side-channel attack on last-level cache.

Proceedings of the 53rd Annual Design Automation Conference, 2016

2015

Efficient Control and Communication Paradigms for Coarse-Grained Spatial Architectures.

ACM Trans. Comput. Syst., 2015

Wavelet-Based Trace Alignment Algorithms for Heterogeneous Architectures.

Muhammet Mustafa Ozdal

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2015

BEAR: techniques for mitigating bandwidth bloat in gigascale DRAM caches.

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

High performing cache hierarchies for server workloads: Relaxing inclusion to capture the latency benefits of exclusive caches.

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

2014

Efficient Spatial Processing Element Control via Triggered Instructions.

IEEE Micro, 2014

CAMEO: A Two-Level Memory Organization with Capacity of Main Memory and Flexibility of Hardware-Managed Cache.

Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Automatic SMT threading for OpenMP applications on the Intel Xeon Phi co-processor.

Proceedings of the 4th International Workshop on Runtime and Operating Systems for Supercomputers, 2014

Sandbox Prefetching: Safe run-time evaluation of aggressive prefetchers.

Rajeev Balasubramonian

Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Undersubscribed threading on clustered cache architectures.

Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

2013

Using in-flight chains to build a scalable cache coherence protocol.

Samantika Subramaniam

ACM Trans. Archit. Code Optim., 2013

Triggered instructions: a control paradigm for spatially-programmed architectures.

Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

Trace alignment algorithms for offline workload analysis of heterogeneous architectures.

Muhammet Mustafa Ozdal

Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2013

Fairness-aware scheduling on single-ISA heterogeneous multi-cores.

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012

The gradient-based cache partitioning algorithm.

ACM Trans. Archit. Code Optim., 2012

Non-monopolizable caches: Low-complexity mitigation of cache side channel attacks.

ACM Trans. Archit. Code Optim., 2012

CoLT: Coalesced Large-Reach TLBs.

Binh Pham

Viswanathan Vaidyanathan

Abhishek Bhattacharjee

Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Scheduling heterogeneous multi-cores through performance impact estimation (PIE).

Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

CRUISE: cache replacement and utility-aware scheduling.

Hashem Hashemi Najaf-abadi

Samantika Subramaniam

Simon C. Steely Jr.

Joel S. Emer

Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, 2012

2011

PACMan: prefetch-aware cache management for high performance caching.

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

SHiP: signature-based hit predictor for high performance caching.

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

2010

Analyzing Parallel Programs with Pin.

Computer, 2010

Achieving Non-Inclusive Cache Performance with Inclusive Caches: Temporal Locality Aware (TLA) Cache Management Policies.

Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

High performance cache replacement using re-reference interval prediction (RRIP).

Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

Explaining cache SER anomaly using DUE AVF measurement.

Arijit Biswas

Charles Recchia

Shubhendu S. Mukherjee

Vinod Ambrose

Leo Chan

Athanasios E. Papathanasiou

Mike Plaster

Norbert Seifert

Proceedings of the 16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 2010

2009

Understanding the Memory Behavior of Emerging Multi-core Workloads.

Proceedings of the Eighth International Symposium on Parallel and Distributed Computing, 2009

CMPSched$im: Evaluating OS/CMP interaction on shared cache management.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009

2008

Set-Dueling-Controlled Adaptive Insertion for High-Performance Caching.

IEEE Micro, 2008

Data Sharing Analysis of Emerging Parallel Media Mining Workloads.

Proceedings of the High Performance Computing, 2008

Adaptive insertion policies for managing shared caches.

Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

2007

Cross Binary Simulation Points.

Proceedings of the 2007 IEEE International Symposium on Performance Analysis of Systems and Software, 2007

Understanding the Memory Performance of Data-Mining Workloads on Small, Medium, and Large-Scale CMPs Using Hardware-Software Co-simulation.

Proceedings of the 2007 IEEE International Symposium on Performance Analysis of Systems and Software, 2007

Adaptive insertion policies for high performance caching.

Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling.

Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007

2006

The effects of Aggressive out-of-order Mechanisms on the Memory sub-System.

PhD thesis, 2006

In-Line Interrupt Handling and Lock-Up Free Translation Lookaside Buffers (TLBs).

IEEE Trans. Computers, 2006

Last level cache (LLC) performance of data mining workloads on a CMP - a case study of parallel bioinformatics workloads.

Matthew Mattina

Proceedings of the 12th International Symposium on High-Performance Computer Architecture, 2006

2005

DRAMsim: a memory system simulator.

David Wang

Brinda Ganesh

Nuengwong Tuaycharoen

Kathleen Baynes

SIGARCH Comput. Archit. News, 2005

BioBench: A Benchmark Suite of Bioinformatics Applications.

Kursad Albayraktaroglu

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

Using Virtual Load/Store Queues (VLSQs) to Reduce the Negative Effects of Reordered Memory Instructions.

Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005

2001

In-Line Interrupt Handling for Software-Managed TLBs.

Proceedings of the 19th International Conference on Computer Design (ICCD 2001), 2001

Improving the Precise Interrupt Mechanism of Software-Managed TLB Miss Handlers.
