Simon McIntosh-Smith

CoRR, 2024

Assessing the GPU Offload Threshold of GEMM and GEMV Kernels on Modern Heterogeneous HPC Systems.

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

AI-Assisted Design-Space Analysis of High-Performance Arm Processors.

Joseph Moore

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

A Metric for HPC Programming Model Productivity.

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

Federated Single Sign-On and Zero Trust Co-design for AI and HPC Digital Research Infrastructures.

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

Optimisation and Evaluation of Breadth First Search with oneAPI/SYCL on Intel FPGAs: from Describing Algorithms to Describing Architectures.

Proceedings of the 12th International Workshop on OpenCL and SYCL, 2024

Isambard-AI: a leadership-class supercomputer optimised specifically for Artificial Intelligence.

Sadaf R. Alam

Christopher J. Woods

Proceedings of the Cray User Group, 2024

2023

An Empirical Comparison of the RISC-V and AArch64 Instruction Sets.

Daniel Weaver

Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Time Machine: Generative Real-Time Model for Failure (and Lead Time) Prediction in HPC Systems.

Khalid Ayedh Alharthi

Proceedings of the 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Network, 2023

2022

An Initial Evaluation of Arm's Scalable Matrix Extension.

Finn Wilkinson

Proceedings of the IEEE/ACM International Workshop on Performance Modeling, 2022

Evaluating ISO C++ Parallel Algorithms on Heterogeneous HPC Systems.

Proceedings of the IEEE/ACM International Workshop on Performance Modeling, 2022

Benchmarking Fortran DO CONCURRENT on CPUs and GPUs Using BabelStream.

Proceedings of the IEEE/ACM International Workshop on Performance Modeling, 2022

Heterogeneous Programming for the Homogeneous Majority.

Proceedings of the IEEE/ACM International Workshop on Performance, 2022

2021

Navigating Performance, Portability, and Productivity.

Comput. Sci. Eng., 2021

A Performance Analysis of Modern Parallel Programming Models Using a Compute-Bound Application.

Andrei Poenaru

Proceedings of the High Performance Computing - 36th International Conference, 2021

Applying Recent Machine Learning Approaches to Accelerate the Algebraic Multigrid Method for Fluid Simulations.

Thorben Louw

Proceedings of the Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation, 2021

Benchmarking and Extending SYCL Hierarchical Parallelism.

Proceedings of the IEEE/ACM International Workshop on Hierarchical Parallelism for Exascale Computing, 2021

Comparing Julia to Performance Portable Parallel Programming Models for HPC.

Proceedings of the 2021 International Workshop on Performance Modeling, 2021

Mamba: Portable Array-based Abstractions for Heterogeneous High-Performance Systems.

Proceedings of the International Workshop on Performance, 2021

Analyzing Reduction Abstraction Capabilities.

Proceedings of the International Workshop on Performance, 2021

On measuring the maturity of SYCL implementations by tracking historical performance improvements.

Proceedings of the IWOCL'21: International Workshop on OpenCL, Munich, Germany, April 2021

2020

Benchmarking the first generation of production quality Arm-based supercomputers.

Concurr. Comput. Pract. Exp., 2020

On the Use of BLAS Libraries in Modern Scientific Codes at Scale.

Harry Waugh

Richard P. Smedley-Stevenson

Proceedings of the Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI, 2020

Interpreting and Visualizing Performance Portability Metrics.

Proceedings of the IEEE/ACM International Workshop on Performance, 2020

Tracking Performance Portability on the Yellow Brick Road to Exascale.

Proceedings of the IEEE/ACM International Workshop on Performance, 2020

Hostile Cache Implications for Small, Dense Linear Solves.

Proceedings of the IEEE/ACM Workshop on Memory Centric High Performance Computing, 2020

Enabling System Wide Shared Memory for Performance Improvement in PyCOMPSs Applications.

Proceedings of the 9th IEEE/ACM Workshop on Python for High-Performance and Scientific Computing, 2020

Evaluating the performance of HPC-style SYCL applications.

Proceedings of the IWOCL '20: International Workshop on OpenCL, 2020

Evaluating the Effectiveness of a Vector-Length-Agnostic Instruction Set.

Andrei Poenaru

Proceedings of the Euro-Par 2020: Parallel Processing, 2020

The Effects of Wide Vector Operations on Processor Caches.

Andrei Poenaru

Proceedings of the IEEE International Conference on Cluster Computing, 2020

2019

Exploiting Task Parallelism with OpenCL: A Case Study.

J. Signal Process. Syst., 2019

A performance analysis of the first generation of HPC-optimized Arm processors.

Concurr. Comput. Pract. Exp., 2019

Exploiting Hardware-Accelerated Ray Tracing for Monte Carlo Particle Transport with OpenMC.

Justin Salmon

Proceedings of the 2019 IEEE/ACM Performance Modeling, 2019

Performance Portability across Diverse Computer Architectures.

Proceedings of the 2019 IEEE/ACM International Workshop on Performance, 2019

2018

Application-based fault tolerance techniques for sparse matrix solvers.

Rob Hunt

Alex Warwick Vesztrocy

Int. J. High Perform. Comput. Appl., 2018

An improved parallelism scheme for deterministic discrete ordinates transport.

Int. J. High Perform. Comput. Appl., 2018

Evaluating attainable memory bandwidth of parallel programming models via BabelStream.

Int. J. Comput. Sci. Eng., 2018

Benchmarking the NVIDIA V100 GPU and Tensor Cores.

Patrick Atkinson

Proceedings of the Euro-Par 2018: Parallel Processing Workshops, 2018

ASPEN: An Efficient Algorithm for Data Redistribution Between Producer and Consumer Grids.

Clément Foyer

Adrian Tate

Proceedings of the Euro-Par 2018: Parallel Processing Workshops, 2018

Multi-precision convolutional neural networks on heterogeneous hardware.

Sam Amiri

Mohammad Hosseinabady

José L. Núñez-Yáñez

Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018

UnSNAP: A Mini-App for Exploring the Performance of Deterministic Discrete Ordinates Transport on Unstructured Meshes.

Richard P. Smedley-Stevenson

Justin Lovegrove

Andrew Hagues

Proceedings of the IEEE International Conference on Cluster Computing, 2018

2017

Assessing the performance portability of modern parallel programming models using TeaLeaf.

Concurr. Comput. Pract. Exp., 2017

Exploiting Auto-tuning to Analyze and Improve Performance Portability on Many-Core Architectures.

Proceedings of the High Performance Computing, 2017

On the Mitigation of Cache Hostile Memory Access Patterns on Many-Core CPU Architectures.

Proceedings of the High Performance Computing, 2017

A Survey of Application Memory Usage on a National Supercomputer: An Analysis of Memory Requirements on ARCHER.

Andy Turner

Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, 2017

The Productivity, Portability and Performance of OpenMP 4.5 for Scientific Applications Targeting Intel CPUs, IBM CPUs, and NVIDIA GPUs.

Proceedings of the Scaling OpenMP for Exascale Performance and Portability, 2017

On the Performance of Parallel Tasking Runtimes for an Irregular Fast Multipole Method Application.

Patrick Atkinson

Proceedings of the Scaling OpenMP for Exascale Performance and Portability, 2017

Analyzing and improving performance portability of OpenCL applications via auto-tuning.

Richard P. Smedley-Stevenson

Proceedings of the 5th International Workshop on OpenCL, 2017

Application-Based Fault Tolerance Techniques for Fully Protecting Sparse Matrix Solvers.

Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

TeaLeaf: A Mini-Application to Enable Design-Space Explorations for Iterative Sparse Linear Solvers.

David Beckingsale

Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

The Arch Project: Physics Mini-Apps for Algorithmic Exploration and Evaluating Programming Environments on HPC Architectures.

Matthew Martineau

Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

Exploring On-Node Parallelism with Neutral, a Monte Carlo Neutral Particle Transport Mini-App.

Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2016

GPU-STREAM v2.0: Benchmarking the Achievable Memory Bandwidth of Many-Core Processors Across Diverse Parallel Programming Models.

Proceedings of the High Performance Computing, 2016

Many-Core Acceleration of a Discrete Ordinates Transport Mini-App at Extreme Scale.

Alexandre E. Eichenberger

Proceedings of the High Performance Computing - 31st International Conference, 2016

Performance Analysis and Optimization of Clang's OpenMP 4.5 GPU Support.

Gheorghe-Teodor Bercea

Proceedings of the 7th International Workshop on Performance Modeling, 2016

Unprotected computing: a large-scale study of DRAM raw error rate on a supercomputer.

Leonardo Bautista-Gomez

Ferad Zyulkyarov

Osman S. Unsal

Proceedings of the International Conference for High Performance Computing, 2016

An Evaluation of Emerging Many-Core Parallel Programming Models.

Proceedings of the 7th International Workshop on Programming Models and Applications for Multicores and Manycores, 2016

Pragmatic Performance Portability with OpenMP 4.x.

Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

Evaluating OpenMP 4.0's Effectiveness as a Heterogeneous Parallel Programming Model.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

2015

High performance in silico virtual drug screening on many-core processors.

Int. J. High Perform. Comput. Appl., 2015

Symposium on Experiences of Porting and Optimising Code for Xeon Phi Processors.

Proceedings of the Parallel Computing: On the Road to Exascale, 2015

Improving Auto-Tuning Convergence Times with Dynamically Generated Predictive Performance Models.

Proceedings of the IEEE 9th International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2015

Oclgrind: an extensible OpenCL device simulator.

Proceedings of the 3rd International Workshop on OpenCL, 2015

Nano Simbox: an OpenCL-accelerated framework for interactive molecular dynamics.

Proceedings of the 3rd International Workshop on OpenCL, 2015

IWOCL: International Workshop on OpenCL.

Proceedings of the 3rd International Workshop on OpenCL, 2015

Exploiting Spatial Information in Datasets to Enable Fault Tolerant Sparse Matrix Solvers.

Rob Hunt

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Expressing Parallelism on Many-Core for Deterministic Discrete Ordinates Transport.

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2014

On the Performance Portability of Structured Grid Codes on Many-Core Computer Architectures.

Proceedings of the Supercomputing - 29th International Conference, 2014

The OPS domain specific abstraction for multi-block structured grid computations.

Proceedings of the Fourth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing, 2014

Evaluation of a performance portable lattice Boltzmann code using OpenCL.
