Azzam Haidar

Orcid: 0000-0002-3177-2084

According to our database, Azzam Haidar authored at least 81 papers between 2008 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2025

Accelerating Supercomputing: AI-Hardware-Driven Innovation for Speed and Efficiency.

[DOI]

Jack J. Dongarra

,

John A. Gunnels

,

Harun Bayraktar

,

,

Proceedings of the IEEE High Performance Extreme Computing Conference, 2025

2024

Hardware Trends Impacting Floating-Point Computations In Scientific Applications.

[DOI]

Jack J. Dongarra

,

John A. Gunnels

,

Harun Bayraktar

,

,

CoRR, 2024

2023

cuQuantum SDK: A High-Performance Library for Accelerating Quantum Science.

[DOI]

Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2023

2022

Performance Analysis of Parallel FFT on Large Multi-GPU Systems.

[DOI]

,

,

Miroslav Stoyanov

,

,

Jack J. Dongarra

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

2021

A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines.

[DOI]

Ahmad Abdelfattah

,

Timothy B. Costa

,

Jack J. Dongarra

,

,

,

Sven Hammarling

,

Nicholas J. Higham

,

,

,

Stanimire Tomov

,

ACM Trans. Math. Softw., 2021

Accelerating Multi-Process Communication for Parallel 3-D FFT.

[DOI]

,

,

Miroslav Stoyanov

,

,

Jack J. Dongarra

Proceedings of the Workshop on Exascale MPI, 2021

2020

MAGMA templates for scalable linear algebra on emerging architectures.

[DOI]

Mohammed A. Al Farhan

,

Ahmad Abdelfattah

,

Stanimire Tomov

,

,

,

,

Robert Rosenberg

,

Jack J. Dongarra

Int. J. High Perform. Comput. Appl., 2020

heFFTe: Highly Efficient FFT for Exascale.

[DOI]

,

Stanimire Tomov

,

,

Jack J. Dongarra

Proceedings of the Computational Science - ICCS 2020, 2020

2019

PLASMA: Parallel Linear Algebra Software for Multicore Using OpenMP.

[DOI]

Jack J. Dongarra

,

,

,

,

,

,

Ichitaro Yamazaki

,

,

Maksims Abalenkovs

,

Negin Bagherpour

,

Sven Hammarling

,

,

,

,

Samuel D. Relton

ACM Trans. Math. Softw., 2019

Algorithms and optimization techniques for high-performance matrix-matrix multiplications of very small matrices.

[DOI]

,

Ahmad Abdelfattah

,

,

Stanimire Tomov

,

,

,

Jack J. Dongarra

Parallel Comput., 2019

Evaluation of directive-based performance portable programming models.

[DOI]

M. Graham Lopez

,

,

Verónica G. Vergara Larrea

,

Oscar R. Hernandez

,

,

Stanimire Tomov

,

Jack J. Dongarra

Int. J. High Perform. Comput. Netw., 2019

Investigating power capping toward energy-efficient scientific applications.

[DOI]

,

,

,

,

Stanimire Tomov

,

Jack J. Dongarra

Concurr. Comput. Pract. Exp., 2019

2018

A Guide for Achieving High Performance with Very Small Matrices on GPU: A Case Study of Batched LU and Cholesky Factorizations.

[DOI]

,

Ahmad Abdelfattah

,

,

Stanimire Tomov

,

Jack J. Dongarra

IEEE Trans. Parallel Distributed Syst., 2018

Analysis and Design Techniques towards High-Performance and Energy-Efficient Dense Linear Solvers on GPUs.

[DOI]

Ahmad Abdelfattah

,

,

Stanimire Tomov

,

Jack J. Dongarra

IEEE Trans. Parallel Distributed Syst., 2018

The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale.

[DOI]

Jack J. Dongarra

,

,

,

,

,

Stanimire Tomov

,

Ichitaro Yamazaki

SIAM Rev., 2018

Accelerating the SVD bi-diagonalization of a batch of small matrices using GPUs.

[DOI]

,

,

Stanimire Tomov

,

Jack J. Dongarra

J. Comput. Sci., 2018

Batched one-sided factorizations of tiny matrices using GPUs: Challenges and countermeasures.

[DOI]

Ahmad Abdelfattah

,

,

Stanimire Tomov

,

Jack J. Dongarra

J. Comput. Sci., 2018

Harnessing GPU tensor cores for fast FP16 arithmetic to speed up mixed-precision iterative refinement solvers.

[DOI]

,

Stanimire Tomov

,

Jack J. Dongarra

,

Nicholas J. Higham

Proceedings of the International Conference for High Performance Computing, 2018

The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half-Precision Arithmetic and Iterative Refinement Techniques.

[DOI]

,

Ahmad Abdelfattah

,

,

,

Srikara Pranesh

,

Stanimire Tomov

,

Jack J. Dongarra

Proceedings of the Computational Science - ICCS 2018, 2018

Optimizing GPU Kernels for Irregular Batch Workloads: A Case Study for Cholesky Factorization.

[DOI]

Ahmad Abdelfattah

,

,

Stanimire Tomov

,

Jack J. Dongarra

Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018

2017

Fast Cholesky factorization on GPUs for batch and native modes in MAGMA.

[DOI]

Ahmad Abdelfattah

,

,

Stanimire Tomov

,

Jack J. Dongarra

J. Comput. Sci., 2017

With Extreme Computing, the Rules Have Changed.

[DOI]

Jack J. Dongarra

,

Stanimire Tomov

,

,

,

,

Ichitaro Yamazaki

,

,

,

Ahmad Abdelfattah

Comput. Sci. Eng., 2017

A Framework for Out of Memory SVD Algorithms.

[DOI]

,

,

Stanimire Tomov

,

Aurélien Bouteiller

,

Jack J. Dongarra

Proceedings of the High Performance Computing - 32nd International Conference, 2017

Investigating half precision arithmetic to accelerate dense linear system solvers.

[DOI]

,

,

Stanimire Tomov

,

Jack J. Dongarra

Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2017

High-performance Cholesky factorization for GPU-only execution.

[DOI]

,

Ahmad Abdelfattah

,

Stanimire Tomov

,

Jack J. Dongarra

Proceedings of the General Purpose GPUs, 2017

Novel HPC techniques to batch execution of many variable size BLAS computations on GPUs.

[DOI]

Ahmad Abdelfattah

,

,

Stanimire Tomov

,

Jack J. Dongarra

Proceedings of the International Conference on Supercomputing, 2017

Optimizing the SVD Bidiagonalization Process for a Batch of Small Matrices.

[DOI]

,

,

Stanimire Tomov

,

Jack J. Dongarra

Proceedings of the International Conference on Computational Science, 2017

Factorization and Inversion of a Million Matrices using GPUs: Challenges and Countermeasures.

[DOI]

Ahmad Abdelfattah

,

,

Stanimire Tomov

,

Jack J. Dongarra

Proceedings of the International Conference on Computational Science, 2017

Out of memory SVD solver for big data.

[DOI]

,

,

,

Stanimire Tomov

,

Jack J. Dongarra

Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017

Power-aware computing: Measurement, control, and performance analysis for Intel Xeon Phi.

[DOI]

,

,

,

,

Stanimire Tomov

,

Jack J. Dongarra

Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017

2016

Linear algebra software for large-scale accelerated multicore computing.

[DOI]

Ahmad Abdelfattah

,

,

Jack J. Dongarra

,

,

,

,

,

Stanimire Tomov

,

Ichitaro Yamazaki

,

Acta Numer., 2016

Performance, Design, and Autotuning of Batched GEMM for GPUs.

[DOI]

Ahmad Abdelfattah

,

,

Stanimire Tomov

,

Jack J. Dongarra

Proceedings of the High Performance Computing - 31st International Conference, 2016

Towards Achieving Performance Portability Using Directives for Accelerators.

[DOI]

M. Graham Lopez

,

Verónica G. Vergara Larrea

,

,

Oscar R. Hernandez

,

,

Stanimire Tomov

,

Jack J. Dongarra

Proceedings of the Third Workshop on Accelerator Programming Using Directives, 2016

Heterogeneous Streaming.

[DOI]

Chris J. Newburn

,

,

,

,

,

Alejandro Duran

,

,

Leonardo Borges

,

,

Stanimire Tomov

,

Jack J. Dongarra

,

,

,

,

,

,

Ichitaro Yamazaki

,

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

On the Development of Variable Size Batched Computation for Heterogeneous Parallel Architectures.

[DOI]

Ahmad Abdelfattah

,

,

Stanimire Tomov

,

Jack J. Dongarra

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Performance Tuning and Optimization Techniques of Fixed and Variable Size Batched Cholesky Factorization on GPUs.

[DOI]

Ahmad Abdelfattah

,

,

Stanimire Tomov

,

Jack J. Dongarra

Proceedings of the International Conference on Computational Science 2016, 2016

High-Performance Tensor Contractions for GPUs.

[DOI]

Ahmad Abdelfattah

,

,

,

Jack J. Dongarra

,

Christopher W. Earl

,

,

,

,

Tzanio V. Kolev

,

,

Stanimire Tomov

Proceedings of the International Conference on Computational Science 2016, 2016

LU, QR, and Cholesky factorizations: Programming model, performance analysis and optimization techniques for the Intel Knights Landing Xeon Phi.

[DOI]

,

Stanimire Tomov

,

Konstantin Arturov

,

Murat Efe Guney

,

,

Jack J. Dongarra

Proceedings of the 2016 IEEE High Performance Extreme Computing Conference, 2016

Performance analysis and acceleration of explicit integration for large kinetic networks using batched GPU computations.

[DOI]

,

,

Stanimire Tomov

,

,

Jay Jay Billings

,

,

Jack J. Dongarra

Proceedings of the 2016 IEEE High Performance Extreme Computing Conference, 2016

High-Performance Matrix-Matrix Multiplications of Very Small Matrices.

[DOI]

,

Ahmad Abdelfattah

,

,

Stanimire Tomov

,

,

,

Jack J. Dongarra

Proceedings of the Euro-Par 2016: Parallel Processing, 2016

2015

Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems.

[DOI]

Jack J. Dongarra

,

Maksims Abalenkovs

,

Ahmad Abdelfattah

,

,

,

,

,

Stanimire Tomov

,

Ichitaro Yamazaki

,

Supercomput. Front. Innov., 2015

HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi.

[DOI]

Jack J. Dongarra

,

,

,

,

,

,

Stanimire Tomov

Sci. Program., 2015

On the Design, Development, and Analysis of Optimized Matrix-Vector Multiplication Routines for Coprocessors.

[DOI]

,

,

Stanimire Tomov

,

Jack J. Dongarra

Proceedings of the High Performance Computing - 30th International Conference, 2015

A Framework for Batched and GPU-Resident Factorization Algorithms Applied to Block Householder Transformations.

[DOI]

,

Tingxing Tim Dong

,

Stanimire Tomov

,

,

Jack J. Dongarra

Proceedings of the High Performance Computing - 30th International Conference, 2015

Performance analysis and design of a Hessenberg reduction using stabilized blocked elementary transformations for new architectures.

[DOI]

,

,

Stanimire Tomov

,

Jack J. Dongarra

Proceedings of the Symposium on High Performance Computing, 2015

Efficient implementation of quantum materials simulations on distributed CPU-GPU systems.

[DOI]

Raffaele Solcà

,

Anton Kozhevnikov

,

,

Stanimire Tomov

,

Jack J. Dongarra

,

Thomas C. Schulthess

Proceedings of the International Conference for High Performance Computing, 2015

Weighted dynamic scheduling with many parallelism grains for offloading of numerical workloads to multiple varied accelerators.

[DOI]

,

,

,

Stanimire Tomov

,

,

Jack J. Dongarra

Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2015

Optimization for performance and energy for batched matrix computations on GPUs.

[DOI]

,

,

,

Stanimire Tomov

,

Jack J. Dongarra

Proceedings of the 8th Workshop on General Purpose Processing using GPUs, 2015

Towards batched linear solvers on accelerated hardware platforms.

[DOI]

,

,

,

Stanimire Tomov

,

Jack J. Dongarra

Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

Divide and Conquer Symmetric Tridiagonal Eigensolver for Multicore Architectures.

[DOI]

Gregoire Pichon

,

,

Mathieu Faverge

,

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Performance Analysis and Optimisation of Two-sided Factorization Algorithms for Heterogeneous Platform.

[DOI]

,

,

Stanimire Tomov

,

Jack J. Dongarra

Proceedings of the International Conference on Computational Science, 2015

MAGMA embedded: Towards a dense linear algebra library for energy efficient extreme computing.

[DOI]

,

Stanimire Tomov

,

,

Jack J. Dongarra

Proceedings of the 2015 IEEE High Performance Extreme Computing Conference, 2015

Flexible Linear Algebra Development and Scheduling with Cholesky Factorization.

[DOI]

,

,

,

,

Stanimire Tomov

,

Jack J. Dongarra

Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015

2014

Model-Driven One-Sided Factorizations on Multicore Accelerated Systems.

[DOI]

Jack J. Dongarra

,

,

,

,

Stanimire Tomov

,

Supercomput. Front. Innov., 2014

A novel hybrid CPU-GPU generalized eigensolver for electronic structure calculations based on fine-grained memory aware tasks.

[DOI]

,

Stanimire Tomov

,

Jack J. Dongarra

,

Raffaele Solcà

,

Thomas C. Schulthess

Int. J. High Perform. Comput. Appl., 2014

Heterogenous Acceleration for Linear Algebra in Multi-coprocessor Environments.

[DOI]

,

,

Stanimire Tomov

,

Jack J. Dongarra

Proceedings of the High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Eugene, OR, USA, June 30, 2014

Accelerating Computation of Eigenvectors in the Dense Nonsymmetric Eigenvalue Problem.

[DOI]

,

,

Jack J. Dongarra

Proceedings of the High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Eugene, OR, USA, June 30, 2014

Performance and portability with OpenCL for throughput-oriented HPC workloads across accelerators, coprocessors, and multicore processors.

[DOI]

,

,

,

,

Stanimire Tomov

,

Ichitaro Yamazaki

,

Jack J. Dongarra

Proceedings of the 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2014

New Algorithm for Computing Eigenvectors of the Symmetric Eigenvalue Problem.

[DOI]

,

,

Jack J. Dongarra

Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Unified Development for Mixed Multi-GPU and Multi-coprocessor Environments Using a Lightweight Runtime Environment.

[DOI]

,

,

,

,

Stanimire Tomov

,

,

Jack J. Dongarra

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

A Fast Batched Cholesky Factorization on a GPU.

[DOI]

,

,

Stanimire Tomov

,

Jack J. Dongarra

Proceedings of the 43rd International Conference on Parallel Processing, 2014

LU Factorization of Small Matrices: Accelerating Batched DGETRF on the GPU.

[DOI]

,

,

,

James Austin Harris

,

Stanimire Tomov

,

Jack J. Dongarra

Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014

Accelerating Numerical Dense Linear Algebra Calculations with GPUs.

[DOI]

Jack J. Dongarra

,

,

,

,

,

Stanimire Tomov

,

Ichitaro Yamazaki

Proceedings of the Numerical Computations with GPUs, 2014

2013

Parallel algebraic domain decomposition solver for the solution of augmented systems.

[DOI]

Emmanuel Agullo

,

,

Abdou Guermouche

,

,

Adv. Eng. Softw., 2013

Leading Edge Hybrid Multi-GPU Algorithms for Generalized Eigenproblems in Electronic Structure Calculations.

[DOI]

,

Raffaele Solcà

,

,

Stanimire Tomov

,

Thomas C. Schulthess

,

Jack J. Dongarra

Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013

An improved parallel singular value algorithm and its implementation for multicore hardware.

[DOI]

,

,

Proceedings of the International Conference for High Performance Computing, 2013

Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi.

[DOI]

Jack J. Dongarra

,

,

,

,

,

,

Stanimire Tomov

Proceedings of the Parallel Processing and Applied Mathematics, 2013

Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication.

[DOI]

,

,

Stanimire Tomov

,

Jack J. Dongarra

Proceedings of the International Conference on Supercomputing, 2013

2012

Toward a High Performance Tile Divide and Conquer Algorithm for the Dense Symmetric Eigenvalue Problem.

[DOI]

,

,

Jack J. Dongarra

SIAM J. Sci. Comput., 2012

A hybrid Hermitian general eigenvalue solver.

[DOI]

Raffaele Solcà

,

Thomas C. Schulthess

,

,

Stanimire Tomov

,

Ichitaro Yamazaki

,

Jack J. Dongarra

CoRR, 2012

Poster: A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks.

[DOI]

Raffaele Solcà

,

,

Stanimire Tomov

,

Thomas C. Schulthess

,

Jack J. Dongarra

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks.

[DOI]

Raffaele Solcà

,

,

Stanimire Tomov

,

Thomas C. Schulthess

,

Jack J. Dongarra

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

A Comprehensive Study of Task Coalescing for Selecting Parallelism Granularity in a Two-Stage Bidiagonal Reduction.

[DOI]

,

,

,

Jack J. Dongarra

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

2011

Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures.

[DOI]

,

,

,

Jack J. Dongarra

Concurr. Comput. Pract. Exp., 2011

Parallel reduction to condensed forms for symmetric eigenvalue problems using aggregated fine-grained and memory-aware kernels.

[DOI]

,

,

Jack J. Dongarra

Proceedings of the Conference on High Performance Computing Networking, 2011

Solving the Generalized Symmetric Eigenvalue Problem using Tile Algorithms on Multicore Architectures.

[DOI]

,

,

,

Jack J. Dongarra

Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011

Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA.

[DOI]

,

Aurélien Bouteiller

,

Anthony Danalis

,

Mathieu Faverge

,

,

Thomas Hérault

,

,

,

Pierre Lemarinier

,

,

,

,

Jack J. Dongarra

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

2010

Using multiple levels of parallelism to enhance the performance of domain decomposition solvers.

[DOI]

,

,

Parallel Comput., 2010

2009

Parallel algebraic hybrid solvers for large 3D convection-diffusion problems.

[DOI]

,

Numer. Algorithms, 2009

2008

On the parallel scalability of hybrid linear solvers for large 3D problems. (Sur l'extensibilité parallèle de solveurs linéaires hybrides pour des problèmes tridimensionels de grandes tailles).

[DOI]

PhD thesis, 2008

Parallel scalability study of hybrid preconditioners in three dimensions.

[DOI]

,

,

Layne T. Watson

Parallel Comput., 2008
