Aamer Jaleel

Orcid: 0000-0002-5709-2992

According to our database, Aamer Jaleel authored at least 69 papers between 2001 and 2024.

Bibliography

2024
GRIT: Enhancing Multi-GPU Performance with Fine-Grained Dynamic Page Placement.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

2023
Symphony: Orchestrating Sparse and Dense Tensors with Hierarchical Heterogeneous Processing.
ACM Trans. Comput. Syst., 2023

cuCatch: A Debugging Tool for Efficiently Catching Memory Safety Violations in CUDA Applications.
Proc. ACM Program. Lang., 2023

IDYLL: Enhancing Page Translation in Multi-GPUs via Light Weight PTE Invalidations.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Community-based Matrix Reordering for Sparse Linear Algebra Optimization.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023

Implicit Memory Tagging: No-Overhead Memory Safety Using Alias-Free Tagged ECC.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

Accelerating Sparse Data Orchestration via Dynamic Reflexive Tiling (Extended Abstract).
Proceedings of the 2023 ACM Workshop on Highlights of Parallel Computing, 2023

Accelerating Sparse Data Orchestration via Dynamic Reflexive Tiling.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2021
P-OPT: Practical Optimal Cache Replacement for Graph Analytics.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

2020
HMG: Extending Cache Coherence Protocols Across Modern Hierarchical Multi-GPU Systems.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

2019
DUCATI: High-performance Address Translation by Extending TLB Reach of GPU-accelerated Systems.
ACM Trans. Archit. Code Optim., 2019

ExTensor: An Accelerator for Sparse Tensor Algebra.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Adaptive memory-side last-level GPU caching.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

2018
Combining HW/SW Mechanisms to Improve NUMA Performance of Multi-GPU Systems.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

ACCORD: Enabling Associativity for Gigascale DRAM Caches by Coordinating Way-Install and Way-Prediction.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

2017
Top Picks from the 2016 Computer Architecture Conferences.
IEEE Micro, 2017

Mind The Power Holes: Sifting Operating Points in Power-Limited Heterogeneous Multicores.
IEEE Comput. Archit. Lett., 2017

Beyond the socket: NUMA-aware GPUs.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

BATMAN: techniques for maximizing system bandwidth of memory systems with stacked-DRAM.
Proceedings of the International Symposium on Memory Systems, 2017

MCM-GPU: Multi-Chip-Module GPUs for Continued Performance Scalability.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

RIC: Relaxed Inclusion Caches for Mitigating LLC Side-Channel Attacks.
Proceedings of the 54th Annual Design Automation Conference, 2017

Using Personality Metrics to Improve Cache Interference Management in Multicore Processors.
Proceedings of the Computing Frontiers Conference, 2017

2016
Maximizing Heterogeneous Processor Performance Under Power Constraints.
ACM Trans. Archit. Code Optim., 2016

The Bunker Cache for spatio-value approximation.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

CANDY: Enabling coherent DRAM caches for multi-node systems.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

DReAM: Dynamic Re-arrangement of Address Mapping to Improve the Performance of DRAMs.
Proceedings of the Second International Symposium on Memory Systems, 2016

HAPPY: Hybrid Address-based Page Policy in DRAMs.
Proceedings of the Second International Symposium on Memory Systems, 2016

LAP: Loop-Block Aware Inclusion Properties for Energy-Efficient Asymmetric Last Level Caches.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

A high-resolution side-channel attack on last-level cache.
Proceedings of the 53rd Annual Design Automation Conference, 2016

2015
Efficient Control and Communication Paradigms for Coarse-Grained Spatial Architectures.
ACM Trans. Comput. Syst., 2015

Wavelet-Based Trace Alignment Algorithms for Heterogeneous Architectures.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2015

BEAR: techniques for mitigating bandwidth bloat in gigascale DRAM caches.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

High performing cache hierarchies for server workloads: Relaxing inclusion to capture the latency benefits of exclusive caches.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

2014
Efficient Spatial Processing Element Control via Triggered Instructions.
IEEE Micro, 2014

CAMEO: A Two-Level Memory Organization with Capacity of Main Memory and Flexibility of Hardware-Managed Cache.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Automatic SMT threading for OpenMP applications on the Intel Xeon Phi co-processor.
Proceedings of the 4th International Workshop on Runtime and Operating Systems for Supercomputers, 2014

Sandbox Prefetching: Safe run-time evaluation of aggressive prefetchers.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Undersubscribed threading on clustered cache architectures.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

2013
Using in-flight chains to build a scalable cache coherence protocol.
ACM Trans. Archit. Code Optim., 2013

Triggered instructions: a control paradigm for spatially-programmed architectures.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

Trace alignment algorithms for offline workload analysis of heterogeneous architectures.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2013

Fairness-aware scheduling on single-ISA heterogeneous multi-cores.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012
The gradient-based cache partitioning algorithm.
ACM Trans. Archit. Code Optim., 2012

Non-monopolizable caches: Low-complexity mitigation of cache side channel attacks.
ACM Trans. Archit. Code Optim., 2012

CoLT: Coalesced Large-Reach TLBs.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Scheduling heterogeneous multi-cores through performance impact estimation (PIE).
Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

CRUISE: cache replacement and utility-aware scheduling.
Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, 2012

2011
PACMan: prefetch-aware cache management for high performance caching.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

SHiP: signature-based hit predictor for high performance caching.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

2010
Analyzing Parallel Programs with Pin.
Computer, 2010

Achieving Non-Inclusive Cache Performance with Inclusive Caches: Temporal Locality Aware (TLA) Cache Management Policies.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

High performance cache replacement using re-reference interval prediction (RRIP).
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

Explaining cache SER anomaly using DUE AVF measurement.
Proceedings of the 16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 2010

2009
Understanding the Memory Behavior of Emerging Multi-core Workloads.
Proceedings of the Eighth International Symposium on Parallel and Distributed Computing, 2009

CMPSched$im: Evaluating OS/CMP interaction on shared cache management.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009

2008
Set-Dueling-Controlled Adaptive Insertion for High-Performance Caching.
IEEE Micro, 2008

Data Sharing Analysis of Emerging Parallel Media Mining Workloads.
Proceedings of the High Performance Computing, 2008

Adaptive insertion policies for managing shared caches.
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

2007
Cross Binary Simulation Points.
Proceedings of the 2007 IEEE International Symposium on Performance Analysis of Systems and Software, 2007

Understanding the Memory Performance of Data-Mining Workloads on Small, Medium, and Large-Scale CMPs Using Hardware-Software Co-simulation.
Proceedings of the 2007 IEEE International Symposium on Performance Analysis of Systems and Software, 2007

Adaptive insertion policies for high performance caching.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling.
Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007

2006
In-Line Interrupt Handling and Lock-Up Free Translation Lookaside Buffers (TLBs).
IEEE Trans. Computers, 2006

Last level cache (LLC) performance of data mining workloads on a CMP - a case study of parallel bioinformatics workloads.
Proceedings of the 12th International Symposium on High-Performance Computer Architecture, 2006

2005
DRAMsim: a memory system simulator.
SIGARCH Comput. Archit. News, 2005

BioBench: A Benchmark Suite of Bioinformatics Applications.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

Using Virtual Load/Store Queues (VLSQs) to Reduce the Negative Effects of Reordered Memory Instructions.
Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005

2001
In-Line Interrupt Handling for Software-Managed TLBs.
Proceedings of the 19th International Conference on Computer Design (ICCD 2001), 2001

Improving the Precise Interrupt Mechanism of Software-Managed TLB Miss Handlers.
Proceedings of the High Performance Computing - HiPC 2001, 8th International Conference, 2001

