David Black-Schaffer

ORCID: 0000-0001-5375-4058

Affiliations:
  • Uppsala University, Department of Information Technology, Sweden
  • Stanford University, Computer Systems Laboratory, CA, USA


According to our database, David Black-Schaffer authored at least 60 papers between 2007 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
Exploring the Latency Sensitivity of Cache Replacement Policies.
IEEE Comput. Archit. Lett., 2023

Large-scale Graph Processing on Commodity Systems: Understanding and Mitigating the Impact of Swapping.
Proceedings of the International Symposium on Memory Systems, 2023

Protean: Resource-efficient Instruction Prefetching.
Proceedings of the International Symposium on Memory Systems, 2023

Faster Functional Warming with Cache Merging.
Proceedings of the DroneSE and RAPIDO: System Engineering for constrained embedded systems, 2023

2022
Dependence-aware Slice Execution to Boost MLP in Slice-out-of-order Cores.
ACM Trans. Archit. Code Optim., 2022

Freeway to Memory Level Parallelism in Slice-Out-of-Order Cores.
CoRR, 2022

Every walk's a hit: making page walks single-access cache hits.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

2021
A Reusable Characterization of the Memory System Behavior of SPEC2017 and SPEC2006.
ACM Trans. Archit. Code Optim., 2021

Early Address Prediction: Efficient Pipeline Prefetch and Reuse.
ACM Trans. Archit. Code Optim., 2021

2020
Page Tables: Keeping them Flat and Hot (Cached).
CoRR, 2020

Architecturally-Independent and Time-Based Characterization of SPEC CPU 2017.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

Perforated Page: Supporting Fragmented Memory Allocation for Large Pages.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

Modeling and optimizing NUMA effects and prefetching with machine learning.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

Delay and Bypass: Ready and Criticality Aware Instruction Scheduling in Out-of-Order Processors.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

2019
Maximizing Limited Resources: a Limit-Based Study and Taxonomy of Out-of-Order Commit.
J. Signal Process. Syst., 2019

Filter caching for free: the untapped potential of the store-buffer.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

Efficient thread/page/parallelism autotuning for NUMA systems.
Proceedings of the ACM International Conference on Supercomputing, 2019

Freeway: Maximizing MLP for Slice-Out-of-Order Execution.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

FIFOrder MicroArchitecture: Ready-Aware Instruction Scheduling for OoO Processors.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2019

2018
Analyzing performance variation of task schedulers with TaskInsight.
Parallel Comput., 2018

Behind the Scenes: Memory Analysis of Graphical Workloads on Tile-Based GPUs.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2018

Tail-PASS: Resource-Based Cache Management for Tiled Graphics Rendering Hardware.
Proceedings of the IEEE International Conference on Parallel & Distributed Processing with Applications, 2018

Dynamically Disabling Way-prediction to Reduce Instruction Replay.
Proceedings of the 36th IEEE International Conference on Computer Design, 2018

2017
Exploring Scheduling Effects on Task Performance with TaskInsight.
Supercomput. Front. Innov., 2017

Addressing Energy Challenges in Filter Caches.
Proceedings of the 29th International Symposium on Computer Architecture and High Performance Computing, 2017

Adaptive Cache Warming for Faster Simulations.
Proceedings of the 9th Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, 2017

TaskInsight: Understanding Task Schedules Effects on Memory and Performance.
Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, 2017

Understanding the interplay between task scheduling, memory and performance.
Proceedings Companion of the 2017 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity, 2017

A graphics tracing framework for exploring CPU+GPU memory systems.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

Analyzing graphics workloads on tile-based GPUs.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

A Split Cache Hierarchy for Enabling Data-Oriented Optimizations.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

POSTER: Putting the G back into GPU/CPU Systems Research.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016
Analytical Processor Performance and Power Modeling Using Micro-Architecture Independent Characteristics.
IEEE Trans. Computers, 2016

Partitioning GPUs for Improved Scalability.
Proceedings of the 28th International Symposium on Computer Architecture and High Performance Computing, 2016

Data placement across the cache hierarchy: Minimizing data movement with reuse-aware placement.
Proceedings of the 34th IEEE International Conference on Computer Design, 2016

Formalizing Data Locality in Task Parallel Applications.
Proceedings of the International Conference on Algorithms and Architectures for Parallel Processing, 2016

2015
Long term parking (LTP): criticality-aware resource allocation in OOO processors.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

Micro-architecture independent analytical processor performance and power modeling.
Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software, 2015

Full Speed Ahead: Detailed Architectural Simulation at Near-Native Speed.
Proceedings of the 2015 IEEE International Symposium on Workload Characterization, 2015

StatTask: reuse distance analysis for task-based applications.
Proceedings of the 2015 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, 2015

AREP: Adaptive Resource Efficient Prefetching for Maximizing Multicore Performance.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014
Navigating the cache hierarchy with a single lookup.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Fix the code. Don't tweak the hardware: A new compiler approach to Voltage-Frequency scaling.
Proceedings of the 12th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2014

2013
TLC: a tag-less cache for reducing dynamic first level cache energy.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

Towards more efficient execution: a decoupled access-execute approach.
Proceedings of the International Conference on Supercomputing, 2013

Modeling performance variation due to cache sharing.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

2012
Bandwidth bandit: Understanding memory contention.
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2012

Phase behavior in serial and parallel applications.
Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012

Phase guided profiling for fast cache modeling.
Proceedings of the 10th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2012

Efficient techniques for predicting cache sharing and throughput.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Bandwidth bandit: quantitative characterization of memory contention.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
Using Hardware Transactional Memory for High-Performance Computing.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Cache Pirating: Measuring the Curse of the Shared Cache.
Proceedings of the International Conference on Parallel Processing, 2011

Fast modeling of shared caches in multicore systems.
Proceedings of High Performance Embedded Architectures and Compilers, 2011

2010
Block-Parallel Programming for Real-Time Embedded Applications.
Proceedings of the 39th International Conference on Parallel Processing, 2010

StatCC: a statistical cache contention model.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2008
Efficient Embedded Computing.
Computer, 2008

Hierarchical Instruction Register Organization.
IEEE Comput. Archit. Lett., 2008

An Energy-Efficient Processor Architecture for Embedded Systems.
IEEE Comput. Archit. Lett., 2008

2007
Register pointer architecture for efficient embedded processors.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

