Georg Hager

Dataset, November, 2024

Microarchitectural comparison and in-core modeling of state-of-the-art CPUs: Grace, Sapphire Rapids, and Genoa - PMBS@SC24 Artifact.

Jan Laukemann

Dataset, September, 2024

Artifact description: CloverLeaf on Intel Multi-Core CPUs: A Case Study in Write-Allocate Evasion.

Dataset, May, 2024

Energy-aware operation of HPC systems in Germany.

CoRR, 2024

Microarchitectural comparison and in-core modeling of state-of-the-art CPUs: Grace, Sapphire Rapids, and Genoa.

Jan Laukemann

Rafael Ravedutti Lucio Machado

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

CloverLeaf on Intel Multi-Core CPUs: A Case Study in Write-Allocate Evasion.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

2023

MD-Bench: A performance-focused prototyping harness for state-of-the-art short-range molecular dynamics algorithms.

Future Gener. Comput. Syst., December, 2023

Making applications faster by asynchronous execution: Slowing down processes or relaxing MPI collectives.

Future Gener. Comput. Syst., November, 2023

LIKWID.

Dataset, November, 2023

LIKWID.

Dataset, November, 2023

LIKWID.

Dataset, November, 2023

Artifact description: CloverLeaf on Intel Multi-Core CPUs: A Case Study in Write-Allocate Evasion.

Dataset, October, 2023

Orthogonal Layers of Parallelism in Large-Scale Eigenvalue Computations.

Andreas Alvermann

ACM Trans. Parallel Comput., September, 2023

SPEChpc 2021 Benchmarks: A Performance and Energy Case Study.

Dataset, September, 2023

Artifact description for PMBS'23 paper: Calculating Primes the Expensive Way: A Case Study in Write-Allocate Evasion on Intel Ice Lake SP.

Dataset, August, 2023

Artifact description for PMBS'23 paper: Calculating Primes the Expensive Way: A Case Study in Write-Allocate Evasion on Intel Ice Lake SP.

Dataset, August, 2023

Artifact Description/Artifact Evaluation/Computational Artifact for "SPEChpc 2021 Benchmarks on Ice Lake and Sapphire Rapids Infiniband Clusters: A Performance and Energy Case Study".

Dataset, August, 2023

SPEChpc 2021 Benchmarks: A Performance and Energy Case Study.

Dataset, August, 2023

Analytical performance estimation during code generation on modern GPUs.

J. Parallel Distributed Comput., March, 2023

Level-Based Blocking for Sparse Matrices: Sparse Matrix-Power-Vector Multiplication.

IEEE Trans. Parallel Distributed Syst., February, 2023

The Role of Idle Waves, Desynchronization, and Bottleneck Evasion in the Performance of Parallel Programs.

Rafael Ravedutti Lucio Machado

IEEE Trans. Parallel Distributed Syst., February, 2023

MD-Bench: Engineering the in-core performance of short-range molecular dynamics kernels from state-of-the-art simulation packages.

CoRR, 2023

Core-Level Performance Engineering with the Open-Source Architecture Code Analyzer (OSACA) and the Compiler Explorer.

Jan Laukemann

Proceedings of the Companion of the 2023 ACM/SPEC International Conference on Performance Engineering, 2023

Application Knowledge Required: Performance Modeling for Fun and Profit.

Proceedings of the 2023 ACM/SPEC International Conference on Performance Engineering, 2023

SPEChpc 2021 Benchmarks on Ice Lake and Sapphire Rapids Infiniband Clusters: A Performance and Energy Case Study.

Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Physical Oscillator Model for Supercomputing.

Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

2022

LIKWID.

Dataset, December, 2022

LIKWID.

Dataset, December, 2022

LIKWID.

Dataset, December, 2022

LIKWID.

Dataset, December, 2022

LIKWID.

Dataset, August, 2022

Execution-Cache-Memory modeling and performance tuning of sparse matrix-vector multiplication and Lattice quantum chromodynamics on A64FX.

Concurr. Comput. Pract. Exp., 2022

Analytic performance model for parallel overlapping memory-bound kernels.

Concurr. Comput. Pract. Exp., 2022

Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications.

Proceedings of the Parallel Processing and Applied Mathematics, 2022

Addressing White-box Modeling and Simulation Challenges in Parallel Computing.

Proceedings of the SIGSIM-PADS '22: SIGSIM Conference on Principles of Advanced Discrete Simulation, Atlanta, GA, USA, June 8, 2022

2021

LIKWID.

Dataset, December, 2021

LIKWID.

Dataset, November, 2021

LIKWID.

Dataset, June, 2021

LIKWID.

Dataset, June, 2021

RRZE-HPC/likwid: likwid-5.1.1.

Dataset, March, 2021

A domain-specific language and matrix-free stencil code for investigating electronic properties of Dirac and topological materials.

Andreas Pieper

Int. J. High Perform. Comput. Appl., 2021

Performance engineering for real and complex tall & skinny matrix multiplication kernels on GPUs.

Int. J. High Perform. Comput. Appl., 2021

ECM modeling and performance tuning of SpMV and Lattice QCD on A64FX.

CoRR, 2021

Analytic Modeling of Idle Waves in Parallel Programs: Communication, Cluster Topology, and Noise Impact.

Proceedings of the High Performance Computing - 36th International Conference, 2021

Opening the Black Box: Performance Estimation during Code Generation for GPUs.

Proceedings of the 33rd IEEE International Symposium on Computer Architecture and High Performance Computing, 2021

YaskSite: Stencil Optimization Techniques Applied to Explicit ODE Methods on Modern Architectures.

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

2020

RRZE-HPC/likwid: likwid-5.1.0.

Dataset, November, 2020

ESSEX: Equipping Sparse Solvers For Exascale.

Proceedings of the Software for Exascale Computing - SPPEXA 2016-2019, 2020

A Recursive Algebraic Coloring Technique for Hardware-efficient Symmetric Sparse Matrix-vector Multiplication.

ACM Trans. Parallel Comput., 2020

PHIST: A Pipelined, Hybrid-Parallel Iterative Solver Toolkit.

Jonas Thies

ACM Trans. Math. Softw., 2020

Bridging the Architecture Gap: Abstracting Performance-Relevant Properties of Modern Server Processors.

Supercomput. Front. Innov., 2020

Analytic performance modeling and analysis of detailed neuron simulations.

Int. J. High Perform. Comput. Appl., 2020

An analytic performance model for overlapping execution of memory-bound loop kernels on multicore CPUs.

CoRR, 2020

Understanding HPC Benchmark Performance on Intel Broadwell and Cascade Lake Processors.

Proceedings of the High Performance Computing - 35th International Conference, 2020

Desynchronization and Wave Pattern Formation in MPI-Parallel and Hybrid Memory-Bound Programs.

Proceedings of the High Performance Computing - 35th International Conference, 2020

Performance Modeling of Streaming Kernels and Sparse Matrix-Vector Multiplication on A64FX.

Proceedings of the 2020 IEEE/ACM Performance Modeling, 2020

2019

CRAFT: A Library for Easier Application-Level Checkpoint/Restart and Automatic Fault Tolerance.

IEEE Trans. Parallel Distributed Syst., 2019

Collecting and Presenting Reproducible Intranode Stencil Performance: INSPECT.

Supercomput. Front. Innov., 2019

Delay Propagation and Overlapping Mechanisms on Clusters: A Case Study of Idle Periods based on Workload, Communication, and Delay Granularity.

CoRR, 2019

Performance Engineering for a Tall & Skinny Matrix Multiplication Kernel on GPUs.

CoRR, 2019

Automatic Throughput and Critical Path Analysis of x86 and ARM Assembly Kernels.

Proceedings of the 2019 IEEE/ACM Performance Modeling, 2019

Performance Engineering for a Tall & Skinny Matrix Multiplication Kernels on GPUs.

Proceedings of the Parallel Processing and Applied Mathematics, 2019

Propagation and Decay of Injected One-Off Delays on Clusters: A Case Study.

Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019

2018

Multidimensional Intratile Parallelization for Memory-Starved Stencil Computations.

ACM Trans. Parallel Comput., 2018

Performance Engineering.

Inform. Spektrum, 2018

Building and utilizing fault tolerance support tools for the GASPI applications.

Int. J. High Perform. Comput. Appl., 2018

Chebyshev Filter Diagonalization on Modern Manycore Processors and GPGPUs.

CoRR, 2018

Chebyshev Filter Diagonalization on Modern Manycore Processors and GPGPUs.

Proceedings of the High Performance Computing - 33rd International Conference, 2018

On the Accuracy and Usefulness of Analytic Energy Models for Contemporary Multicore Processors.

Johannes Hofmann

Dietmar Fey

Proceedings of the High Performance Computing - 33rd International Conference, 2018

Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures.

Proceedings of the 2018 IEEE/ACM Performance Modeling, 2018

Multicore Performance Engineering of Sparse Triangular Solves Using a Modified Roofline Model.

Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018

2017

GHOST: Building Blocks for High Performance Sparse Linear Algebra on Heterogeneous Systems.

Moritz Kreutzer

Jonas Thies

Int. J. Parallel Program., 2017

Validation of hardware events for successful performance pattern identification in High Performance Computing.

CoRR, 2017

PVSC-DTM: A domain-specific language and matrix-free stencil code for investigating electronic properties of Dirac and topological materials.

Andreas Pieper

CoRR, 2017

Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels.

CoRR, 2017

Performance analysis of the Kahan-enhanced scalar product on current multi-core and many-core processors.

Concurr. Comput. Pract. Exp., 2017

An Analysis of Core- and Chip-Level Architectural Features in Four Generations of Intel Server Processors.

Proceedings of the High Performance Computing - 32nd International Conference, 2017

LIKWID Monitoring Stack: A Flexible Framework Enabling Job Specific Performance monitoring for the masses.

Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2016

Towards an Exascale Enabled Sparse Solver Repository.

Proceedings of the Software for Exascale Computing - SPPEXA 2013-2015, 2016

Performance Engineering and Energy Efficiency of Building Blocks for Large, Sparse Eigenvalue Computations on Heterogeneous Supercomputers.

Proceedings of the Software for Exascale Computing - SPPEXA 2013-2015, 2016

High-performance implementation of Chebyshev filter diagonalization for interior eigenvalue computations.

J. Comput. Phys., 2016

Performance analysis of the Kahan-enhanced scalar product on current multi- and manycore processors.

CoRR, 2016

Chip-level and multi-node analysis of energy-optimized lattice Boltzmann CFD simulations.

Concurr. Comput. Pract. Exp., 2016

Exploring performance and power properties of modern multi-core chips via simple machine models.

Concurr. Comput. Pract. Exp., 2016

Performance and power for highly parallel systems.

Concurr. Comput. Pract. Exp., 2016

Optimization of an Electromagnetics Code with Multicore Wavefront Diamond Blocking and Multi-dimensional Intra-Tile Parallelization.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Analysis of Intel's Haswell Microarchitecture Using the ECM Model and Microbenchmarks.

Proceedings of the Architecture of Computing Systems - ARCS 2016, 2016

2015

Increasing the Performance of the Jacobi-Davidson Method by Blocking.

SIAM J. Sci. Comput., 2015

Multicore-Optimized Wavefront Diamond Blocking for Optimizing Stencil Updates.

SIAM J. Sci. Comput., 2015

Short Note on Costs of Floating Point Operations on current x86-64 Architectures: Denormals, Overflow, Underflow, and Division by Zero.

CoRR, 2015

Multi-dimensional intra-tile parallelization for memory-starved stencil computations.

CoRR, 2015

Performance analysis of the Kahan-enhanced scalar product on current multicore processors.

CoRR, 2015

Automatic loop kernel analysis and performance modeling with Kerncraft.

Proceedings of the 6th International Workshop on Performance Modeling, 2015

Performance Analysis of the Kahan-Enhanced Scalar Product on Current Multicore Processors.

Proceedings of the Parallel Processing and Applied Mathematics, 2015

Performance Engineering of the Kernel Polynomial Method on Large-Scale CPU-GPU Systems.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Quantifying Performance Bottlenecks of Stencil Computations Using the Execution-Cache-Memory Model.

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Building a Fault Tolerant Application Using the GASPI Communication Layer.

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2014

A Unified Sparse Matrix Data Format for Efficient General Sparse Matrix-Vector Multiplication on Modern Processors with Wide SIMD Units.

SIAM J. Sci. Comput., 2014

Domain-Specific Optimization of Two Jacobi Smoother Kernels and Their Evaluation in the ECM Performance Model.

Parallel Process. Lett., 2014

Modeling and analyzing performance for highly optimized propagation steps of the lattice Boltzmann method on sparse lattices.

CoRR, 2014

Towards energy efficiency and maximum computational intensity for stencil algorithms using wavefront diamond temporal blocking.

CoRR, 2014

Performance Engineering of the Kernel Polynomial Method on Large-Scale CPU-GPU Systems.

CoRR, 2014

Comparing the performance of different x86 SIMD instruction sets for a medical imaging application on modern multi- and manycore chips.

Proceedings of the 2014 Workshop on Programming models for SIMD/Vector processing, 2014

Overhead Analysis of Performance Counter Measurements.

Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014

ESSEX: Equipping Sparse Solvers for Exascale.

Faisal Shahzad

Jonas Thies

Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

Performance Engineering for a Medical Imaging Application on the Intel Xeon Phi Accelerator.

Proceedings of the ARCS 2014, 2014

2013

A Survey of Checkpoint/Restart Techniques on Distributed Memory Systems.

Parallel Process. Lett., 2013

Pushing the limits for medical image reconstruction on recent standard multicore processors.

Int. J. High Perform. Comput. Appl., 2013

An analysis of energy-optimized lattice-Boltzmann CFD simulations from the chip to the highly parallel level.

CoRR, 2013

Optimization of FASTEST-3D for Modern Multicore Systems.

CoRR, 2013

Asynchronous MPI for the Masses.

CoRR, 2013

A unified sparse matrix data format for modern processors with wide SIMD units.

CoRR, 2013

Comparison of different propagation steps for lattice Boltzmann methods.

Comput. Math. Appl., 2013

An Evaluation of Different I/O Techniques for Checkpoint/Restart.

Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Model-guided performance analysis of the sparse matrix-matrix multiplication.

Proceedings of the International Conference on High Performance Computing & Simulation, 2013

2012

Expression Templates Revisited: A Performance Analysis of Current Methodologies.

SIAM J. Sci. Comput., 2012

Exploring performance and power properties of modern multicore chips via simple machine models.

CoRR, 2012

Best practices for HPM-assisted performance engineering on modern multicore processors.

CoRR, 2012

Sparse Matrix-vector Multiplication on GPGPU Clusters: A New Storage Format and a Scalable Implementation.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

High performance smart expression template math libraries.

Proceedings of the 2012 International Conference on High Performance Computing & Simulation, 2012

Performance Patterns and Hardware Metrics on Modern Multicore Processors: Best Practices for Performance Engineering.

Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

Performance Engineering: From Numbers to Insight.

Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

2011

Hybrid-Parallel Sparse Matrix-Vector Multiplication with Explicit Communication Overlap on Current Multicore-Based Systems.

Parallel Process. Lett., 2011

A flexible Patch-based lattice Boltzmann parallelization approach for heterogeneous GPU-CPU clusters.

Christian Feichtinger

Parallel Comput., 2011

Efficient multicore-aware parallelization strategies for iterative stencil computations.

J. Comput. Sci., 2011

Performance engineering for the Lattice Boltzmann method on GPGPUs: Architectural requirements and performance results.

Johannes Habich

Christian Feichtinger

Harald Köstler

CoRR, 2011

Domain decomposition and locality optimization for large-scale lattice Boltzmann simulations.

CoRR, 2011

Comparison of different Propagation Steps for the Lattice Boltzmann Method.

CoRR, 2011

Expression Templates Revisited: A Performance Analysis of the Current ET Methodology.

CoRR, 2011

Optimizing ccNUMA locality for task-parallel execution under OpenMP and TBB on multicore-based systems.

Markus Wittmann

CoRR, 2011

Performance analysis and optimization strategies for a D3Q19 lattice Boltzmann kernel on nVIDIA GPUs using CUDA.

Adv. Eng. Softw., 2011

Poster: LIKWID: lightweight performance tools.

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

likwid-bench: An Extensible Microbenchmarking Platform for x86 Multicore Compute Nodes.

Proceedings of the Tools for High Performance Computing 2011, 2011

Parallel Sparse Matrix-Vector Multiplication as a Test Case for Hybrid MPI+OpenMP Programming.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Introduction to High Performance Computing for Scientists and Engineers.

Chapman and Hall / CRC computational science series, CRC Press, ISBN: 978-1-439-81192-4, 2011

2010

Leveraging Shared Caches for Parallel Temporal Blocking of Stencil Codes on Multicore Processors and Clusters.

Parallel Process. Lett., 2010

Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory.

Markus Wittmann

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments.

Proceedings of the 39th International Conference on Parallel Processing, 2010

LIKWID: Lightweight Performance Tools.

Proceedings of the Competence in High Performance Computing 2010, 2010

2009

Benchmark Analysis and Application Results for Lattice Boltzmann Simulations on NEC SX Vector and Intel Nehalem Systems.

Parallel Process. Lett., 2009

Multi-core architectures: Complexities of performance prediction and the impact of cache topology.

CoRR, 2009

Performance limitations for sparse matrix-vector multiplications on current multicore environments.

Gerald Schubert

CoRR, 2009

A Proof of Concept for Optimizing Task Parallelism by Locality Queues.

Markus Wittmann

CoRR, 2009

Introducing a Performance Model for Bandwidth-Limited Loop Kernels.

Proceedings of the Parallel Processing and Applied Mathematics, 2009

Hybrid MPI/OpenMP Parallel Programming on Clusters of Multi-Core SMP Nodes.

Rolf Rabenseifner

Gabriele Jost

Proceedings of the 17th Euromicro International Conference on Parallel, 2009

The world's fastest CPU and SMP node: Some performance results from the NEC SX-9.

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Efficient Temporal Blocking for Stencil Computations by Multicore-Aware Wavefront Parallelization.

Proceedings of the 33rd Annual IEEE International Computer Software and Applications Conference, 2009

2008

Data Access Characteristics and Optimizations for Sun UltraSPARC T2 and T2+ Systems.

Parallel Process. Lett., 2008

Data access optimizations for highly threaded multi-core CPUs with multiple memory controllers.

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Vector Computers in a World of Commodity Clusters, Massively Parallel Systems and Many-Core Many-Threaded CPUs: Recent Experience Based on an Advanced Lattice Boltzmann Flow Solver.

Proceedings of the High Performance Computing in Science and Engineering '08, 2008

2007

RZBENCH: Performance evaluation of current HPC architectures using low-level and application benchmarks.
