Simon McIntosh-Smith

ORCID: 0000-0002-5312-0378

Affiliations:
  • University of Bristol


According to our database, Simon McIntosh-Smith authored at least 73 papers between 1994 and 2024.

Bibliography

2024
Preliminary report: Initial evaluation of StdPar implementations on AMD GPUs for HPC.
CoRR, 2024

Optimisation and Evaluation of Breadth First Search with oneAPI/SYCL on Intel FPGAs: from Describing Algorithms to Describing Architectures.
Proceedings of the 12th International Workshop on OpenCL and SYCL, 2024

2023
An Empirical Comparison of the RISC-V and AArch64 Instruction Sets.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Time Machine: Generative Real-Time Model for Failure (and Lead Time) Prediction in HPC Systems.
Proceedings of the 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Network, 2023

2022
An Initial Evaluation of Arm's Scalable Matrix Extension.
Proceedings of the IEEE/ACM International Workshop on Performance Modeling, 2022

Evaluating ISO C++ Parallel Algorithms on Heterogeneous HPC Systems.
Proceedings of the IEEE/ACM International Workshop on Performance Modeling, 2022

Benchmarking Fortran DO CONCURRENT on CPUs and GPUs Using BabelStream.
Proceedings of the IEEE/ACM International Workshop on Performance Modeling, 2022

Heterogeneous Programming for the Homogeneous Majority.
Proceedings of the IEEE/ACM International Workshop on Performance, 2022

2021
Navigating Performance, Portability, and Productivity.
Comput. Sci. Eng., 2021

A Performance Analysis of Modern Parallel Programming Models Using a Compute-Bound Application.
Proceedings of the High Performance Computing - 36th International Conference, 2021

Applying Recent Machine Learning Approaches to Accelerate the Algebraic Multigrid Method for Fluid Simulations.
Proceedings of the Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation, 2021

Benchmarking and Extending SYCL Hierarchical Parallelism.
Proceedings of the IEEE/ACM International Workshop on Hierarchical Parallelism for Exascale Computing, 2021

Comparing Julia to Performance Portable Parallel Programming Models for HPC.
Proceedings of the 2021 International Workshop on Performance Modeling, 2021

Mamba: Portable Array-based Abstractions for Heterogeneous High-Performance Systems.
Proceedings of the International Workshop on Performance, 2021

Analyzing Reduction Abstraction Capabilities.
Proceedings of the International Workshop on Performance, 2021

On measuring the maturity of SYCL implementations by tracking historical performance improvements.
Proceedings of the IWOCL'21: International Workshop on OpenCL, Munich, Germany, April 2021, 2021

2020
Benchmarking the first generation of production quality Arm-based supercomputers.
Concurr. Comput. Pract. Exp., 2020

On the Use of BLAS Libraries in Modern Scientific Codes at Scale.
Proceedings of the Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI, 2020

Interpreting and Visualizing Performance Portability Metrics.
Proceedings of the IEEE/ACM International Workshop on Performance, 2020

Tracking Performance Portability on the Yellow Brick Road to Exascale.
Proceedings of the IEEE/ACM International Workshop on Performance, 2020

Hostile Cache Implications for Small, Dense Linear Solves.
Proceedings of the IEEE/ACM Workshop on Memory Centric High Performance Computing, 2020

Enabling System Wide Shared Memory for Performance Improvement in PyCOMPSs Applications.
Proceedings of the 9th IEEE/ACM Workshop on Python for High-Performance and Scientific Computing, 2020

Evaluating the performance of HPC-style SYCL applications.
Proceedings of the IWOCL '20: International Workshop on OpenCL, 2020

Evaluating the Effectiveness of a Vector-Length-Agnostic Instruction Set.
Proceedings of the Euro-Par 2020: Parallel Processing, 2020

The Effects of Wide Vector Operations on Processor Caches.
Proceedings of the IEEE International Conference on Cluster Computing, 2020

2019
Exploiting Task Parallelism with OpenCL: A Case Study.
J. Signal Process. Syst., 2019

A performance analysis of the first generation of HPC-optimized Arm processors.
Concurr. Comput. Pract. Exp., 2019

Exploiting Hardware-Accelerated Ray Tracing for Monte Carlo Particle Transport with OpenMC.
Proceedings of the 2019 IEEE/ACM Performance Modeling, 2019

Performance Portability across Diverse Computer Architectures.
Proceedings of the 2019 IEEE/ACM International Workshop on Performance, 2019

2018
Application-based fault tolerance techniques for sparse matrix solvers.
Int. J. High Perform. Comput. Appl., 2018

An improved parallelism scheme for deterministic discrete ordinates transport.
Int. J. High Perform. Comput. Appl., 2018

Evaluating attainable memory bandwidth of parallel programming models via BabelStream.
Int. J. Comput. Sci. Eng., 2018

Benchmarking the NVIDIA V100 GPU and Tensor Cores.
Proceedings of the Euro-Par 2018: Parallel Processing Workshops, 2018

ASPEN: An Efficient Algorithm for Data Redistribution Between Producer and Consumer Grids.
Proceedings of the Euro-Par 2018: Parallel Processing Workshops, 2018

Multi-precision convolutional neural networks on heterogeneous hardware.
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018

UnSNAP: A Mini-App for Exploring the Performance of Deterministic Discrete Ordinates Transport on Unstructured Meshes.
Proceedings of the IEEE International Conference on Cluster Computing, 2018

2017
Assessing the performance portability of modern parallel programming models using TeaLeaf.
Concurr. Comput. Pract. Exp., 2017

Exploiting Auto-tuning to Analyze and Improve Performance Portability on Many-Core Architectures.
Proceedings of the High Performance Computing, 2017

On the Mitigation of Cache Hostile Memory Access Patterns on Many-Core CPU Architectures.
Proceedings of the High Performance Computing, 2017

A Survey of Application Memory Usage on a National Supercomputer: An Analysis of Memory Requirements on ARCHER.
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, 2017

The Productivity, Portability and Performance of OpenMP 4.5 for Scientific Applications Targeting Intel CPUs, IBM CPUs, and NVIDIA GPUs.
Proceedings of the Scaling OpenMP for Exascale Performance and Portability, 2017

On the Performance of Parallel Tasking Runtimes for an Irregular Fast Multipole Method Application.
Proceedings of the Scaling OpenMP for Exascale Performance and Portability, 2017

Analyzing and improving performance portability of OpenCL applications via auto-tuning.
Proceedings of the 5th International Workshop on OpenCL, 2017

Application-Based Fault Tolerance Techniques for Fully Protecting Sparse Matrix Solvers.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

TeaLeaf: A Mini-Application to Enable Design-Space Explorations for Iterative Sparse Linear Solvers.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

The Arch Project: Physics Mini-Apps for Algorithmic Exploration and Evaluating Programming Environments on HPC Architectures.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

Exploring On-Node Parallelism with Neutral, a Monte Carlo Neutral Particle Transport Mini-App.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2016
GPU-STREAM v2.0: Benchmarking the Achievable Memory Bandwidth of Many-Core Processors Across Diverse Parallel Programming Models.
Proceedings of the High Performance Computing, 2016

Many-Core Acceleration of a Discrete Ordinates Transport Mini-App at Extreme Scale.
Proceedings of the High Performance Computing - 31st International Conference, 2016

Performance Analysis and Optimization of Clang's OpenMP 4.5 GPU Support.
Proceedings of the 7th International Workshop on Performance Modeling, 2016

Unprotected computing: a large-scale study of DRAM raw error rate on a supercomputer.
Proceedings of the International Conference for High Performance Computing, 2016

An Evaluation of Emerging Many-Core Parallel Programming Models.
Proceedings of the 7th International Workshop on Programming Models and Applications for Multicores and Manycores, 2016

Pragmatic Performance Portability with OpenMP 4.x.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

Evaluating OpenMP 4.0's Effectiveness as a Heterogeneous Parallel Programming Model.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

2015
High performance in silico virtual drug screening on many-core processors.
Int. J. High Perform. Comput. Appl., 2015

Symposium on Experiences of Porting and Optimising Code for Xeon Phi Processors.
Proceedings of the Parallel Computing: On the Road to Exascale, 2015

Improving Auto-Tuning Convergence Times with Dynamically Generated Predictive Performance Models.
Proceedings of the IEEE 9th International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2015

Oclgrind: an extensible OpenCL device simulator.
Proceedings of the 3rd International Workshop on OpenCL, 2015

Nano Simbox: an OpenCL-accelerated framework for interactive molecular dynamics.
Proceedings of the 3rd International Workshop on OpenCL, 2015

IWOCL: International Workshop on OpenCL.
Proceedings of the 3rd International Workshop on OpenCL, 2015

Exploiting Spatial Information in Datasets to Enable Fault Tolerant Sparse Matrix Solvers.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Expressing Parallelism on Many-Core for Deterministic Discrete Ordinates Transport.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2014
On the Performance Portability of Structured Grid Codes on Many-Core Computer Architectures.
Proceedings of the Supercomputing - 29th International Conference, 2014

The OPS domain specific abstraction for multi-block structured grid computations.
Proceedings of the Fourth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing, 2014

Evaluation of a performance portable lattice Boltzmann code using OpenCL.
Proceedings of the International Workshop on OpenCL, 2014

Porting a commercial application to OpenCL: a case study.
Proceedings of the International Workshop on OpenCL, 2014

2013
Special issue of the Journal of Parallel and Distributed Computing (JDPC) on novel architectures for high-performance computing.
J. Parallel Distributed Comput., 2013

2012
Benchmarking Energy Efficiency, Power Costs and Carbon Emissions on Heterogeneous Systems.
Comput. J., 2012

Accelerating Hydrocodes with OpenACC, OpenCL and CUDA.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

2011
Energy-aware metrics for benchmarking heterogeneous systems.
SIGMETRICS Perform. Evaluation Rev., 2011

2010
A massively multicore parallelization of the Kohn-Sham energy gradients.
J. Comput. Chem., 2010

2008
Parallel path tracing using incoherent path-atom binning.
Proceedings of the Spring Conference on Computer Graphics, 2008

1994
Intelligent Algorithm Decomposition for Parallelism.
Proceedings of the Massively Parallel Processing Applications and Development, 1994

