Hatem Ltaief

Francisco E. Hernández Pérez

Minh Bau Luong

Hong G. Im

Proceedings of the Euro-Par 2022: Parallel Processing, 2022

2021

High Performance Multivariate Geospatial Statistics on Manycore Systems.

IEEE Trans. Parallel Distributed Syst., 2021

Accelerating Seismic Redatuming Using Tile Low-Rank Approximations on NEC SX-Aurora TSUBASA.

Supercomput. Front. Innov., 2021

High-Performance Partial Spectrum Computation for Symmetric eigenvalue problems and the SVD.

CoRR, 2021

Meeting the real-time challenges of ground-based telescopes using low-rank matrix computations.

Proceedings of the International Conference for High Performance Computing, 2021

Leveraging PaRSEC Runtime Support to Tackle Challenging 3D Data-Sparse Matrix Problems.

Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Outsmarting the Atmospheric Turbulence for Ground-Based Telescopes Using the Stochastic Levenberg-Marquardt Method.

Proceedings of the Euro-Par 2021: Parallel Processing, 2021

2020

Abstraction Layer For Standardizing APIs of Task-Based Engines.

IEEE Trans. Parallel Distributed Syst., 2020

Asynchronous computations for solving the acoustic wave propagation equation.

Int. J. High Perform. Comput. Appl., 2020

High Performance Multivariate Spatial Modeling for Geostatistical Data on Manycore Systems.

CoRR, 2020

Performance / Complexity Trade-offs of the Sphere Decoder Algorithm for Massive MIMO Systems.

Adel Dabah

Zouheir Rezki

Mohamed Amine Arfaoui

Mohamed-Slim Alouini

CoRR, 2020

Solving Acoustic Boundary Integral Equations Using High Performance Tile Low-Rank LU Factorization.

Proceedings of the High Performance Computing - 35th International Conference, 2020

Extreme-Scale Task-Based Cholesky Factorization Toward Climate and Weather Prediction Applications.

Proceedings of the PASC '20: Platform for Advanced Scientific Computing Conference, Geneva, Switzerland, June 29, 2020

Maximizing I/O Bandwidth for Reverse Time Migration on Heterogeneous Large-Scale Systems.

Tariq Alturkestani

Proceedings of the Euro-Par 2020: Parallel Processing, 2020

2019

Massively Parallel Polar Decomposition on Distributed-memory Systems.

ACM Trans. Parallel Comput., 2019

A QDWH-based SVD Software Framework on Distributed-memory Manycore Systems.

ACM Trans. Math. Softw., 2019

Batched Triangular Dense Linear Algebra Kernels for Very Small Matrix Sizes on GPUs.

ACM Trans. Math. Softw., 2019

ExaGeoStatR: A Package for Large-Scale Geostatistics in R.

CoRR, 2019

Mixed-Precision Tomographic Reconstructor Computations on Hardware Accelerators.

Proceedings of the 9th IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms, 2019

Performance Analysis of Tile Low-Rank Cholesky Factorization Using PaRSEC Instrumentation Tools.

Proceedings of the IEEE/ACM International Workshop on Programming and Performance Visualization Tools, 2019

MLBS: Transparent Data Caching in Hierarchical Storage for Out-of-Core HPC Applications.

Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019

Geostatistical Modeling and Prediction Using Mixed Precision Tile Cholesky Factorization.

Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019

Leveraging Task-Based Polar Decomposition Using PARSEC on Massively Parallel Systems.

Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019

Asynchronous Task-Based Execution of the Reverse Time Migration for the Oil and Gas Industry.

Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019

2018

Asynchronous Task-Based Polar Decomposition on Single Node Manycore Architectures.

IEEE Trans. Parallel Distributed Syst., 2018

ExaGeoStat: A High Performance Unified Software for Geostatistics on Manycore Systems.

IEEE Trans. Parallel Distributed Syst., 2018

Multidimensional Intratile Parallelization for Memory-Starved Stencil Computations.

ACM Trans. Parallel Comput., 2018

Accelerated Cyclic Reduction: A distributed-memory fast solver for structured linear systems.

Parallel Comput., 2018

Batched QR and SVD algorithms on GPUs with applications in hierarchical matrix compression.

Parallel Comput., 2018

Tile Low-Rank Approximation of Large-Scale Maximum Likelihood Estimation on Manycore Architectures.

CoRR, 2018

Extreme Computing for Extreme Adaptive Optics: The Key to Finding Life Outside our Solar System.

Proceedings of the Platform for Advanced Scientific Computing Conference, 2018

Real-Time Massively Distributed Multi-object Adaptive Optics Simulations for the European Extremely Large Telescope.

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Tile Low-Rank GEMM Using Batched Operations on GPUs.

Proceedings of the Euro-Par 2018: Parallel Processing, 2018

Exploiting Data Sparsity for Large-Scale Matrix Computations.

Proceedings of the Euro-Par 2018: Parallel Processing, 2018

Parallel Approximation of the Maximum Likelihood Estimation for the Prediction of Large-Scale Geostatistics Simulations.

Proceedings of the IEEE International Conference on Cluster Computing, 2018

2017

Trends in Data Locality Abstractions for HPC Systems.

IEEE Trans. Parallel Distributed Syst., 2017

ExaGeoStat: A High Performance Unified Framework for Geostatistics on Manycore Systems.

CoRR, 2017

A framework for dense triangular matrix kernels on various manycore architectures.

Concurr. Comput. Pract. Exp., 2017

Tile Low Rank Cholesky Factorization for Climate/Weather Modeling Applications on Manycore Architectures.

Proceedings of the High Performance Computing - 32nd International Conference, 2017

2016

A High Performance QDWH-SVD Solver Using Hardware Accelerators.

Dalal Sukkari

ACM Trans. Math. Softw., 2016

KBLAS: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators.

Ahmad Abdelfattah

ACM Trans. Math. Softw., 2016

Accelerated Dimension-Independent Adaptive Metropolis.

SIAM J. Sci. Comput., 2016

Performance optimization of Sparse Matrix-Vector Multiplication for multi-component PDE-based applications using GPUs.

Concurr. Comput. Pract. Exp., 2016

Adaptive Optics Simulation for the World's Largest Telescope on Multicore Architectures with Multiple GPUs.

Proceedings of the Platform for Advanced Scientific Computing Conference, 2016

Optimization of an Electromagnetics Code with Multicore Wavefront Diamond Blocking and Multi-dimensional Intra-Tile Parallelization.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Efficient Sphere Detector Algorithm for Massive MIMO using GPU Hardware Accelerator.

Mohamed Amine Arfaoui

Proceedings of the International Conference on Computational Science 2016, 2016

High Performance Polar Decomposition on Distributed Memory Systems.

Dalal Sukkari

Proceedings of the Euro-Par 2016: Parallel Processing, 2016

Redesigning Triangular Dense Matrix Computations on GPUs.

Proceedings of the Euro-Par 2016: Parallel Processing, 2016

2015

Dense Matrix Computations on NUMA Architectures with Distance-Aware Work Stealing.

Supercomput. Front. Innov., 2015

Multicore-Optimized Wavefront Diamond Blocking for Optimizing Stencil Updates.

SIAM J. Sci. Comput., 2015

Multi-dimensional intra-tile parallelization for memory-starved stencil computations.

CoRR, 2015

High Performance Multi-GPU SpMV for Multi-component PDE-Based Applications.

Ahmad Abdelfattah

Proceedings of the Euro-Par 2015: Parallel Processing, 2015

2014

Power profiling of Cholesky and QR factorizations on distributed memory systems.

George Bosilca

Comput. Sci. Res. Dev., 2014

Towards energy efficiency and maximum computational intensity for stencil algorithms using wavefront diamond temporal blocking.

CoRR, 2014

Data-driven execution of fast multipole methods.

Rio Yokota

Concurr. Comput. Pract. Exp., 2014

Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting.

Concurr. Comput. Pract. Exp., 2014

Pipelining Computational Stages of the Tomographic Reconstructor for Multi-Object Adaptive Optics on a Multi-GPU System.

Proceedings of the International Conference for High Performance Computing, 2014

High Performance Pseudo-analytical Simulation of Multi-Object Adaptive Optics over Multi-GPU Systems.

Proceedings of the Euro-Par 2014 Parallel Processing, 2014

2013

High-performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures.

ACM Trans. Math. Softw., 2013

2012

Toward a High Performance Tile Divide and Conquer Algorithm for the Dense Symmetric Eigenvalue Problem.

Azzam Haidar

SIAM J. Sci. Comput., 2012

Profiling high performance dense linear algebra algorithms on multicore architectures for power and energy efficiency.

Comput. Sci. Res. Dev., 2012

Optimizing Memory-Bound SYMV Kernel on GPU Hardware Accelerators.

Proceedings of the High Performance Computing for Computational Science, 2012

Poster: Matrices over Runtime Systems at Exascale.

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: Matrices Over Runtime Systems at Exascale.

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

A Comprehensive Study of Task Coalescing for Selecting Parallelism Granularity in a Two-Stage Bidiagonal Reduction.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Systematic Approach in Optimizing Numerical Memory-Bound Kernels on GPU.

Ahmad Abdelfattah

Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

Energy Footprint of Advanced Dense Numerical Linear Algebra Using Tile Algorithms on Multicore Architectures.

Proceedings of the 2012 Second International Conference on Cloud and Green Computing, 2012

2011

Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures.

Concurr. Comput. Pract. Exp., 2011

Parallel reduction to condensed forms for symmetric eigenvalue problems using aggregated fine-grained and memory-aware kernels.

Azzam Haidar

Proceedings of the Conference on High Performance Computing Networking, 2011

High performance matrix inversion based on LU factorization for multicore architectures.

Proceedings of the 2011 ACM International Workshop on Many Task Computing on Grids and Supercomputers, 2011

Enhancing Parallelism of Tile Bidiagonal Transformation on Multicore Architectures Using Tree Reduction.

Proceedings of the Parallel Processing and Applied Mathematics, 2011

Solving the Generalized Symmetric Eigenvalue Problem using Tile Algorithms on Multicore Architectures.

Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011

Exploiting Fine-Grain Parallelism in Recursive LU Factorization.

Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011

Two-Stage Tridiagonal Reduction for Dense Symmetric Matrices Using Tile Algorithms on Multicore Architectures.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

LU factorization for accelerator-based systems.

Proceedings of the 9th IEEE/ACS International Conference on Computer Systems and Applications, 2011

2010

Parallel Two-Sided Matrix Reduction to Band Bidiagonal Form on Multicore Architectures.

Jakub Kurzak

IEEE Trans. Parallel Distributed Syst., 2010

Scheduling two-sided transformations using tile algorithms on multicore architectures.

Sci. Program., 2010

Scheduling dense linear algebra operations on multicore processors.

Concurr. Comput. Pract. Exp., 2010

A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators.

Proceedings of the High Performance Computing for Computational Science - VECPAR 2010, 2010

Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems.

Proceedings of the Conference on High Performance Computing Networking, 2010

Dense linear algebra solvers for multicore with GPU accelerators.

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Tile QR factorization with parallel panel processing for multicore architectures.

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

2009

A parallel Aitken-additive Schwarz waveform relaxation suitable for the grid.

Marc Garbey

Parallel Comput., 2009

Comparative study of one-sided factorizations with multiple software packages on multi-core hardware.

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

2008

Fault tolerant algorithms for heat transfer problems.

Edgar Gabriel

Marc Garbey

J. Parallel Distributed Comput., 2008

Scheduling for Numerical Linear Algebra Library at Scale.

Proceedings of the High Speed and Large Scale Scientific Computing - Selected Papers from the High Performance Computing Workshop, Cetraro, Italy, June 30, 2008

2006

Parallel Fault Tolerant Algorithms for Parabolic Problems.