Erik Hagersten

Proceedings of the IEEE International Conference on Parallel & Distributed Processing with Applications, 2018

2017

Exploring Scheduling Effects on Task Performance with TaskInsight.

Supercomput. Front. Innov., 2017

Understanding the interplay between task scheduling, memory and performance.

Proceedings Companion of the 2017 ACM SIGPLAN International Conference on Systems, 2017

A graphics tracing framework for exploring CPU+GPU memory systems.

Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

A Split Cache Hierarchy for Enabling Data-Oriented Optimizations.

Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

POSTER: Putting the G back into GPU/CPU Systems Research.

Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016

Analytical Processor Performance and Power Modeling Using Micro-Architecture Independent Characteristics.

IEEE Trans. Computers, 2016

Building Heterogeneous Unified Virtual Memories (UVMs) without the Overhead.

ACM Trans. Archit. Code Optim., 2016

CoolSim: Statistical techniques to replace cache warming with efficient, virtualized profiling.

Proceedings of the International Conference on Embedded Computer Systems: Architectures, 2016

CoolSim: Eliminating traditional cache warming with fast, virtualized profiling.

Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

Message from the general chair.

Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

Data placement across the cache hierarchy: Minimizing data movement with reuse-aware placement.

Proceedings of the 34th IEEE International Conference on Computer Design, 2016

Formalizing Data Locality in Task Parallel Applications.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2016

2015

The Effects of Granularity and Adaptivity on Private/Shared Classification for Coherence.

ACM Trans. Archit. Code Optim., 2015

Long term parking (LTP): criticality-aware resource allocation in OOO processors.

Proceedings of the 48th International Symposium on Microarchitecture, 2015

Micro-architecture independent analytical processor performance and power modeling.

Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software, 2015

Cost-effective speculative scheduling in high performance processors.

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Full Speed Ahead: Detailed Architectural Simulation at Near-Native Speed.

Proceedings of the 2015 IEEE International Symposium on Workload Characterization, 2015

StatTask: reuse distance analysis for task-based applications.

Proceedings of the 2015 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, 2015

AREP: Adaptive Resource Efficient Prefetching for Maximizing Multicore Performance.

Michael A. Laurenzano

Jason Mars

Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

An Efficient, Self-Contained, On-chip Directory: DIR1-SISD.

Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014

Resource conscious prefetching for irregular applications in multicores.

Proceedings of the XIVth International Conference on Embedded Computer Systems: Architectures, 2014

Extending statistical cache models to support detailed pipeline simulators.

Nikos Nikoleris

Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software, 2014

A case for resource efficient prefetching in multicores.

Andreas Sandberg

Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software, 2014

A software based profiling method for obtaining speedup stacks on commodity multi-cores.

Nikos Nikoleris

Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software, 2014

Navigating the cache hierarchy with a single lookup.

Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

2013

TLC: a tag-less cache for reducing dynamic first level cache energy.

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

Modeling performance variation due to cache sharing.

Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

Bandwidth Bandit: Quantitative characterization of memory contention.

Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

2012

Low Overhead Instruction-Cache Modeling Using Instruction Reuse Profiles.

Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

The HOPSA Workflow and Tools.

Proceedings of the Tools for High Performance Computing 2012, 2012

Bandwidth bandit: Understanding memory contention.

Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2012

Phase behavior in serial and parallel applications.

Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012

Phase guided profiling for fast cache modeling.

Proceedings of the 10th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2012

Efficient techniques for predicting cache sharing and throughput.

Andreas Sandberg

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011

Efficient software-based online phase classification.

Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011

Cache Pirating: Measuring the Curse of the Shared Cache.

Proceedings of the International Conference on Parallel Processing, 2011

Fast modeling of shared caches in multicore systems.

Proceedings of the High Performance Embedded Architectures and Compilers, 2011

2010

Reducing Cache Pollution Through Detection and Elimination of Non-Temporal Memory Accesses.

Andreas Sandberg

Proceedings of the Conference on High Performance Computing Networking, 2010

StatStack: Efficient modeling of LRU caches.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2010

StatCC: a statistical cache contention model.

Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009

Reconsidering algorithms for iterative solvers in the multicore era.

Int. J. Comput. Sci. Eng., 2009

2008

Improving Cache Utilization Using Acumem VPE.

Mats Nilsson

Magnus Vesterlund

Proceedings of the Tools for High Performance Computing, 2008

2007

A case for low-complexity MP architectures.

Håkan Zeffer

Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

Conserving Memory Bandwidth in Chip Multiprocessors with Runahead Execution.

Martin Karlsson

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

2006

A statistical multiprocessor cache model.

Håkan Zeffer

Proceedings of the 2006 IEEE International Symposium on Performance Analysis of Systems and Software, 2006

Exploiting locality: a flexible DSM approach.

Håkan Zeffer

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Modeling Cache Sharing on Chip Multiprocessor Architectures.

Proceedings of the 2006 IEEE International Symposium on Workload Characterization, 2006

TMA: a trap-based memory architecture.

Proceedings of the 20th Annual International Conference on Supercomputing, 2006

Multigrid and Gauss-Seidel smoothers revisited: parallelization on chip multiprocessors.

Proceedings of the 20th Annual International Conference on Supercomputing, 2006

2005

Fast data-locality profiling of native execution.

Proceedings of the International Conference on Measurements and Modeling of Computer Systems, 2005

VASA: A Simulator Infrastructure with Adjustable Fidelity.

Proceedings of the International Conference on Parallel and Distributed Computing Systems, 2005

Exploring Processor Design Options for Java-Based Middleware.

Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005

Skewed caches from a low-power perspective.

Mathias Spjuth

Martin Karlsson

Proceedings of the Second Conference on Computing Frontiers, 2005

2004

StatCache: a probabilistic approach to efficient and accurate data locality analysis.

Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software, 2004

Bundling: Reducing the Overhead of Multiprocessor Prefetchers.

Dan Wallin

Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Exploiting Spatial Store Locality Through Permission Caching in Software DSMs.

Proceedings of the Euro-Par 2004 Parallel Processing, 2004

2003

Miss Penalty Reduction Using Bundled Capacity Prefetching in Multiprocessors.

Dan Wallin

Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Hierarchical Backoff Locks for Nonuniform Communication Architectures.

Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003

Memory System Behavior of Java-Based Middleware.

Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003

THROOM - Supporting POSIX Multithreaded Binaries on a Cluster.

Henrik Löf

Proceedings of the Euro-Par 2003. Parallel Processing, 2003

2002

Efficient synchronization for nonuniform communication architectures.

Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

SIP: Performance Tuning through Source Code Interdependence.

Proceedings of the Euro-Par 2002, 2002

2001

Removing the overhead from software-based shared memory.

Proceedings of the 2001 ACM/IEEE conference on Supercomputing, 2001

2000

Shared-memory multiprocessing: Current state and future directions.

Adv. Comput., 2000

High-Performance Computers: Yesterday, Today, and Tomorrow.

Proceedings of the Applied Parallel Computing, 2000

1999

Parallel computing in the commercial marketplace: research and innovation at work.

Greg Papadopoulos

Proc. IEEE, 1999

WildFire: A Scalable Path for SMPs.

Michael Koster

Proceedings of the Fifth International Symposium on High-Performance Computer Architecture, 1999

1997

Trends in Shared Memory Multiprocessing.

Computer, 1997

1994

Queue Locks on Cache Coherent Multiprocessors.

Peter S. Magnusson

Proceedings of the 8th International Symposium on Parallel Processing, 1994

Simple COMA Node Implementations.

Ashley Saulsbury

Proceedings of the 27th Annual Hawaii International Conference on System Sciences (HICSS-27), 1994

1993

Simulating the Data Diffusion Machine.

Proceedings of the PARLE '93, 1993

1992

DDM - A Cache-Only Memory Architecture.

Seif Haridi

Computer, 1992

1991

Race-Free Interconnection Networks and Multiprocessor Consistency.

Seif Haridi

Proceedings of the 18th Annual International Symposium on Computer Architecture. Toronto, 1991

1989

The Cache Coherence Protocol of the Data Diffusion Machine.

Seif Haridi