Mohamed Wahib

Orcid: 0000-0002-7165-2095

According to our database, Mohamed Wahib authored at least 67 papers between 2007 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Learning from the Past Training Trajectories: Regularization by Validation.
J. Adv. Comput. Intell. Intell. Informatics, January, 2024

CG-Kit: Code Generation Toolkit for Performant and Maintainable Variants of Source Code Applied to Flash-X Hydrodynamics Simulations.
CoRR, 2024

2023
At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads.
ACM Trans. Archit. Code Optim., December, 2023

Simeuro: A Hybrid CPU-GPU Parallel Simulator for Neuromorphic Computing Chips.
IEEE Trans. Parallel Distributed Syst., October, 2023

Myths and legends in high-performance computing.
Int. J. High Perform. Comput. Appl., July, 2023

Ultra-Long Sequence Distributed Transformer.
CoRR, 2023

Exploiting Scratchpad Memory for Deep Temporal Blocking: A case study for 2D Jacobian 5-point iterative stencil kernel (j2d5pt).
CoRR, 2023

Training Knowledge Inheritance Through Deep Q-Net.
Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2023

KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Revisiting Temporal Blocking Stencil Optimizations.
Proceedings of the 37th International Conference on Supercomputing, 2023

PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications.
Proceedings of the 37th International Conference on Supercomputing, 2023

Multi-GPU Communication Schemes for Iterative Solvers: When CPUs are Not in Charge.
Proceedings of the 37th International Conference on Supercomputing, 2023

2022
Automatic Generation of High-Performance Convolution Kernels on ARM CPUs for Deep Learning.
IEEE Trans. Parallel Distributed Syst., 2022

Flash-X: A multiphysics simulation software instrument.
SoftwareX, 2022

Preparing for the Future - Rethinking Proxy Applications.
Comput. Sci. Eng., 2022

Preparing for the Future - Rethinking Proxy Apps.
CoRR, 2022

At the Locus of Performance: A Case Study in Enhancing CPUs with Copious 3D-Stacked Cache.
CoRR, 2022

Persistent Kernels for Iterative Memory-bound GPU Applications.
CoRR, 2022

Learning from the Past: Regularization by Validation.
Proceedings of the Joint 12th International Conference on Soft Computing and Intelligent Systems and 23rd International Symposium on Advanced Intelligent Systems, 2022

Image Gradient Decomposition for Parallel and Memory-Efficient Ptychographic Reconstruction.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Why Globally Re-shuffle? Revisiting Data Shuffling in Large Scale Deep Learning.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

2021
GTOPX space mission benchmarks.
SoftwareX, 2021

A computational-graph partitioning method for training memory-constrained DNNs.
Parallel Comput., 2021

Structured Adaptive Mesh Refinement Adaptations to Retain Performance Portability With Increasing Heterogeneity.
Comput. Sci. Eng., 2021

MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems.
CoRR, 2021

Efficient MPI-AllReduce for large-scale deep learning on GPU-clusters.
Concurr. Comput. Pract. Exp., 2021

Scalable FBP decomposition for cone-beam CT reconstruction.
Proceedings of the International Conference for High Performance Computing, 2021


Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Performance portable back-projection algorithms on CPUs: agnostic data locality and vectorization optimizations.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

Intra-page Cache Update in SLC-mode with Partial Programming in High Density SSDs.
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

An Oracle for Guiding Large-Scale Model/Hybrid Parallel Training of Convolutional Neural Networks.
Proceedings of the HPDC '21: The 30th International Symposium on High-Performance Parallel and Distributed Computing, 2021

Domain-Specific Runtime to Orchestrate Computation on Heterogeneous Platforms.
Proceedings of the Euro-Par 2021: Parallel Processing Workshops, 2021

An Allreduce Algorithm and Network Co-design for Large-Scale Training of Distributed Deep Learning.
Proceedings of the 21st IEEE/ACM International Symposium on Cluster, 2021

2020
AIMES: Advanced Computation and I/O Methods for Earth-System Simulations.
Proceedings of the Software for Exascale Computing - SPPEXA 2016-2019, 2020

Scaling distributed deep learning workloads beyond the memory capacity with KARMA.
Proceedings of the International Conference for High Performance Computing, 2020

A Study of Single and Multi-device Synchronization Methods in Nvidia GPUs.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

AN5D: automated stencil framework for high-degree temporal blocking on GPUs.
Proceedings of the CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020

2019
iFDK: a scalable framework for instant high-resolution image reconstruction.
Proceedings of the International Conference for High Performance Computing, 2019

A versatile software systolic execution model for GPU memory-bound kernels.
Proceedings of the International Conference for High Performance Computing, 2019

Double-Precision FPUs in High-Performance Computing: An Embarrassment of Riches?
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Topology-aware Sparse Allreduce for Large-scale Deep Learning.
Proceedings of the 38th IEEE International Performance Computing and Communications Conference, 2019

2018
Hierarchical Distributed-Memory Multi-Leader MPI-Allreduce for Deep Learning Workloads.
Proceedings of the Sixth International Symposium on Computing and Networking, 2018

Efficient Algorithms for the Summed Area Tables Primitive on GPUs.
Proceedings of the IEEE International Conference on Cluster Computing, 2018

2017
Numerical Optimization of ESA's Messenger Space Mission Benchmark.
Proceedings of the Applications of Evolutionary Computation - 20th European Conference, 2017

2016
Daino: a high-level framework for parallel and efficient AMR on GPUs.
Proceedings of the International Conference for High Performance Computing, 2016

2015
Data-centric GPU-based adaptive mesh refinement.
Proceedings of the 5th Workshop on Irregular Applications - Architectures and Algorithms, 2015

Automated GPU Kernel Transformations in Large-Scale Production Stencil Applications.
Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015

2014
Scalable Kernel Fusion for Memory-Bound GPU Applications.
Proceedings of the International Conference for High Performance Computing, 2014

2013
arGA: Adaptive Resolution Micro-genetic Algorithm with Tabu Search to Solve MINLP Problems Using GPU.
Proceedings of the Massively Parallel Evolutionary Computation on GPGPUs, 2013

Highly optimized full GPU-acceleration of non-hydrostatic weather model SCALE-LES.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

2011
A Framework for Cloud Embedded Web Services Utilized by Cloud Applications.
Proceedings of the World Congress on Services, 2011

Solving Extremely Difficult MINLP Problems Using Adaptive Resolution Micro-GA with Tabu Search.
Proceedings of the Learning and Intelligent Optimization - 5th International Conference, 2011

Optimization of parallel Genetic Algorithms for nVidia GPUs.
Proceedings of the IEEE Congress on Evolutionary Computation, 2011

Advanced genetic algorithm to solve MINLP problems over GPU.
Proceedings of the IEEE Congress on Evolutionary Computation, 2011

2010
The design, usage, and performance of GridUFO: A Grid based Unified Framework for Optimization.
Future Gener. Comput. Syst., 2010

A Light Framework for the Unified Representation and Execution of Variant Tasks in a Grid Based Environment.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2010

A Bayesian Optimization Algorithm for De Novo ligand design based docking running over GPU.
Proceedings of the IEEE Congress on Evolutionary Computation, 2010

2009
Hybrid of genetic algorithm and local search to solve MAX-SAT problem using nVidia CUDA framework.
Genet. Program. Evolvable Mach., 2009

Theoretical and Empirical Analysis of a GPU Based Parallel Bayesian Optimization Algorithm.
Proceedings of the 2009 International Conference on Parallel and Distributed Computing, 2009

2008
Parallel GEAs with Linkage Analysis over Grid.
Proceedings of the Linkage in Evolutionary Computation, 2008

A Survey: Genetic Algorithms and the Fast Evolving World of Parallel Computing.
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications, 2008

Solving Large Instances of Capacitated Vehicle Routing Problem over Cell BE.
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications, 2008

SOAG: Service Oriented Architectured Grids and adoption of application specific QoS attributes.
Proceedings of the 9th IEEE/ACM International Conference on Grid Computing (Grid 2008), Tsukuba, Japan, September 29, 2008

Model for dynamic grain sizing through compound parallelization for an optimization problem solving grid application.
Proceedings of the 9th IEEE/ACM International Conference on Grid Computing (Grid 2008), Tsukuba, Japan, September 29, 2008

A General Service-Oriented Grid Computing Framework for Global Optimization Problem Solving.
Proceedings of the 2008 IEEE International Conference on Services Computing (SCC 2008), 2008

2007
MHGrid: Towards an Ideal Optimization Environment for Global Optimization Problems Using Grid Computing.
Proceedings of the Eighth International Conference on Parallel and Distributed Computing, 2007

