David Black-Schaffer

ACM Trans. Archit. Code Optim., 2021

Early Address Prediction: Efficient Pipeline Prefetch and Reuse.

Ricardo Alves

ACM Trans. Archit. Code Optim., 2021

2020

Page Tables: Keeping them Flat and Hot (Cached).

CoRR, 2020

Architecturally-Independent and Time-Based Characterization of SPEC CPU 2017.

Muhammad Hassan

Chang Hyun Park

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

Perforated Page: Supporting Fragmented Memory Allocation for Large Pages.

Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

Modeling and optimizing NUMA effects and prefetching with machine learning.

Isaac Sánchez Barrera

Marc Casas

Miquel Moretó

Anastasiia Stupnikova

Mihail Popov

Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

Delay and Bypass: Ready and Criticality Aware Instruction Scheduling in Out-of-Order Processors.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

2019

Maximizing Limited Resources: a Limit-Based Study and Taxonomy of Out-of-Order Commit.

J. Signal Process. Syst., 2019

Filter caching for free: the untapped potential of the store-buffer.

Proceedings of the 46th International Symposium on Computer Architecture, 2019

Efficient thread/page/parallelism autotuning for NUMA systems.

Mihail Popov

Alexandra Jimborean

Proceedings of the ACM International Conference on Supercomputing, 2019

Freeway: Maximizing MLP for Slice-Out-of-Order Execution.

Rakesh Kumar

Mehdi Alipour

Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

FIFOrder MicroArchitecture: Ready-Aware Instruction Scheduling for OoO Processors.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2019

2018

Analyzing performance variation of task schedulers with TaskInsight.

Parallel Comput., 2018

Behind the Scenes: Memory Analysis of Graphical Workloads on Tile-Based GPUs.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2018

Tail-PASS: Resource-Based Cache Management for Tiled Graphics Rendering Hardware.

Proceedings of the IEEE International Conference on Parallel & Distributed Processing with Applications, 2018

Dynamically Disabling Way-prediction to Reduce Instruction Replay.

Ricardo Alves

Proceedings of the 36th IEEE International Conference on Computer Design, 2018

2017

Exploring Scheduling Effects on Task Performance with TaskInsight.

Supercomput. Front. Innov., 2017

Addressing Energy Challenges in Filter Caches.

Proceedings of the 29th International Symposium on Computer Architecture and High Performance Computing, 2017

Adaptive Cache Warming for Faster Simulations.

Gustaf Borgström

Proceedings of the 9th Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, 2017

TaskInsight: Understanding Task Schedules Effects on Memory and Performance.

Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, 2017

Understanding the interplay between task scheduling, memory and performance.

Proceedings Companion of the 2017 ACM SIGPLAN International Conference on Systems, 2017

A graphics tracing framework for exploring CPU+GPU memory systems.

Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

Analyzing graphics workloads on tile-based GPUs.

Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

A Split Cache Hierarchy for Enabling Data-Oriented Optimizations.

Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

POSTER: Putting the G back into GPU/CPU Systems Research.

Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016

Analytical Processor Performance and Power Modeling Using Micro-Architecture Independent Characteristics.

IEEE Trans. Computers, 2016

Partitioning GPUs for Improved Scalability.

Johan Janzen

Andra Hugo

Proceedings of the 28th International Symposium on Computer Architecture and High Performance Computing, 2016

Data placement across the cache hierarchy: Minimizing data movement with reuse-aware placement.

Proceedings of the 34th IEEE International Conference on Computer Design, 2016

Formalizing Data Locality in Task Parallel Applications.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2016

2015

Long term parking (LTP): criticality-aware resource allocation in OOO processors.

Proceedings of the 48th International Symposium on Microarchitecture, 2015

Micro-architecture independent analytical processor performance and power modeling.

Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software, 2015

Full Speed Ahead: Detailed Architectural Simulation at Near-Native Speed.

Proceedings of the 2015 IEEE International Symposium on Workload Characterization, 2015

StatTask: reuse distance analysis for task-based applications.

Proceedings of the 2015 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, 2015

AREP: Adaptive Resource Efficient Prefetching for Maximizing Multicore Performance.

Muneeb Khan

Michael A. Laurenzano

Jason Mars

Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014

Navigating the cache hierarchy with a single lookup.

Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Fix the code. Don't tweak the hardware: A new compiler approach to Voltage-Frequency scaling.

Alexandra Jimborean

Konstantinos Koukos

Vasileios Spiliopoulos

Proceedings of the 12th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2014

2013

TLC: a tag-less cache for reducing dynamic first level cache energy.

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

Towards more efficient execution: a decoupled access-execute approach.

Konstantinos Koukos

Vasileios Spiliopoulos

Proceedings of the International Conference on Supercomputing, 2013

Modeling performance variation due to cache sharing.

Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

Bandwidth Bandit: Quantitative characterization of memory contention.

Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

2012

Bandwidth bandit: Understanding memory contention.

Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2012

Phase behavior in serial and parallel applications.

Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012

Phase guided profiling for fast cache modeling.

Proceedings of the 10th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2012

Efficient techniques for predicting cache sharing and throughput.

Andreas Sandberg

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011

Using Hardware Transactional Memory for High-Performance Computing.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Cache Pirating: Measuring the Curse of the Shared Cache.

Proceedings of the International Conference on Parallel Processing, 2011

Fast modeling of shared caches in multicore systems.

David Eklov

Proceedings of the High Performance Embedded Architectures and Compilers, 2011

2010

Block-Parallel Programming for Real-Time Embedded Applications.

William J. Dally

Proceedings of the 39th International Conference on Parallel Processing, 2010

StatCC: a statistical cache contention model.

David Eklov