Mohamed Wahib

Doga Dikbayir

Mehmet Esat Belviranli

Didem Unat

Parallel Comput., 2021

Structured Adaptive Mesh Refinement Adaptations to Retain Performance Portability With Increasing Heterogeneity.

Comput. Sci. Eng., 2021

MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems.

CoRR, 2021

Efficient MPI-AllReduce for large-scale deep learning on GPU-clusters.

Concurr. Comput. Pract. Exp., 2021

Scalable FBP decomposition for cone-beam CT reconstruction.

Proceedings of the International Conference for High Performance Computing, 2021

MLPerf™ HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems.

Proceedings of the IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, 2021

Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?

Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Performance portable back-projection algorithms on CPUs: agnostic data locality and vectorization optimizations.

Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

Intra-page Cache Update in SLC-mode with Partial Programming in High Density SSDs.

Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

An Oracle for Guiding Large-Scale Model/Hybrid Parallel Training of Convolutional Neural Networks.

Albert Njoroge Kahira

Leonardo Bautista-Gomez

Rosa M. Badia

Proceedings of the HPDC '21: The 30th International Symposium on High-Performance Parallel and Distributed Computing, 2021

Domain-Specific Runtime to Orchestrate Computation on Heterogeneous Platforms.

Proceedings of the Euro-Par 2021: Parallel Processing Workshops, 2021

An Allreduce Algorithm and Network Co-design for Large-Scale Training of Distributed Deep Learning.

Proceedings of the 21st IEEE/ACM International Symposium on Cluster, 2021

2020

AIMES: Advanced Computation and I/O Methods for Earth-System Simulations.

Proceedings of the Software for Exascale Computing - SPPEXA 2016-2019, 2020

Scaling distributed deep learning workloads beyond the memory capacity with KARMA.

Proceedings of the International Conference for High Performance Computing, 2020

A Study of Single and Multi-device Synchronization Methods in Nvidia GPUs.

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

AN5D: automated stencil framework for high-degree temporal blocking on GPUs.

Proceedings of the CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020

2019

iFDK: a scalable framework for instant high-resolution image reconstruction.

Proceedings of the International Conference for High Performance Computing, 2019

A versatile software systolic execution model for GPU memory-bound kernels.

Proceedings of the International Conference for High Performance Computing, 2019

Double-Precision FPUs in High-Performance Computing: An Embarrassment of Riches?

Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Topology-aware Sparse Allreduce for Large-scale Deep Learning.

Proceedings of the 38th IEEE International Performance Computing and Communications Conference, 2019

2018

Hierarchical Distributed-Memory Multi-Leader MPI-Allreduce for Deep Learning Workloads.

Proceedings of the Sixth International Symposium on Computing and Networking, 2018

Efficient Algorithms for the Summed Area Tables Primitive on GPUs.

Proceedings of the IEEE International Conference on Cluster Computing, 2018

2017

Numerical Optimization of ESA's Messenger Space Mission Benchmark.

Martin Schlueter

Masaharu Munetomo

Proceedings of the Applications of Evolutionary Computation - 20th European Conference, 2017

2016

Daino: a high-level framework for parallel and efficient AMR on GPUs.

Takayuki Aoki

Proceedings of the International Conference for High Performance Computing, 2016

2015

Data-centric GPU-based adaptive mesh refinement.

Proceedings of the 5th Workshop on Irregular Applications - Architectures and Algorithms, 2015

Automated GPU Kernel Transformations in Large-Scale Production Stencil Applications.

Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015

2014

Scalable Kernel Fusion for Memory-Bound GPU Applications.

Proceedings of the International Conference for High Performance Computing, 2014

2013

arGA: Adaptive Resolution Micro-genetic Algorithm with Tabu Search to Solve MINLP Problems Using GPU.

Proceedings of the Massively Parallel Evolutionary Computation on GPGPUs, 2013

Highly optimized full GPU-acceleration of non-hydrostatic weather model SCALE-LES.
