Mark Gates

Ahmad Abdelfattah

Kadir Akbudak

Int. J. High Perform. Comput. Appl., 2025

SpikeRL: A Scalable and Energy-efficient Framework for Deep Spiking Reinforcement Learning.

Proceedings of the International Conference on Neuromorphic Systems, 2025

2024

Interface for Sparse Linear Algebra Operations.

CoRR, 2024

Towards Scalable and Efficient Spiking Reinforcement Learning for Continuous Control Tasks.

Proceedings of the International Conference on Neuromorphic Systems, 2024

2023

Task-Based Polar Decomposition Using SLATE on Massively Parallel Systems with Hardware Accelerators.

Dalal Sukkari

Hartwig Anzt

Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

PAQR: Pivoting Avoiding QR factorization.

Wissam M. Sid-Lakhdar

David B. Williams-Young

Timothy A. Davis

Hartwig Anzt

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

2022

Reproducibility Artifact for Running SLATE's GEMM and POTRF Operations on Summit and Crusher.

Dataset, August, 2022

Software for "Threshold Pivoting for dense LU Factorization".

Dataset, May, 2022

Threshold Pivoting for Dense LU Factorization.

Proceedings of the IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Heterogeneous Systems, 2022

Portable and Efficient Dense Linear Algebra in the Beginning of the Exascale Era.

Proceedings of the IEEE/ACM International Workshop on Performance, 2022

Proposed Consistent Exception Handling for the BLAS and LAPACK.

Proceedings of the Sixth IEEE/ACM International Workshop on Software Correctness for HPC Applications, 2022

2021

A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines.

ACM Trans. Math. Softw., 2021

Translational process: Mathematical software perspective.

J. Comput. Sci., 2021

A survey of numerical linear algebra methods utilizing mixed-precision arithmetic.

Int. J. High Perform. Comput. Appl., 2021

Task-graph scheduling extensions for efficient synchronization and communication.

Proceedings of ICS '21: 2021 International Conference on Supercomputing, 2021

2020

MAGMA templates for scalable linear algebra on emerging architectures.

Int. J. High Perform. Comput. Appl., 2020

A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic.

CoRR, 2020

2019

PLASMA: Parallel Linear Algebra Software for Multicore Using OpenMP.

ACM Trans. Math. Softw., 2019

SLATE: design of a modern distributed and accelerated linear algebra library.

Proceedings of the International Conference for High Performance Computing, 2019

Least squares solvers for distributed-memory machines with GPU accelerators.

Proceedings of the ACM International Conference on Supercomputing, 2019

Massively Parallel Automated Software Tuning.

Proceedings of the 48th International Conference on Parallel Processing, 2019

Linear Systems Solvers for Distributed-Memory Machines with GPU Accelerators.

Proceedings of Euro-Par 2019: Parallel Processing, 2019

2018

The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale.

SIAM Rev., 2018

Autotuning Numerical Dense Linear Algebra for Batched Computation With GPU Hardware Accelerators.

Proc. IEEE, 2018

Accelerating the SVD two stage bidiagonal reduction and divide and conquer using GPUs.

Stanimire Tomov

Parallel Comput., 2018

2017

Preconditioned Krylov solvers on GPUs.

Parallel Comput., 2017

With Extreme Computing, the Rules Have Changed.

Comput. Sci. Eng., 2017

Autotuning Batch Cholesky Factorization in CUDA with Interleaved Layout of Matrices.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Bringing High Performance Computing to Big Data Algorithms.

Handbook of Big Data Technologies, 2017

2016

Implementation and Tuning of Batched Cholesky Factorization and Solve for NVIDIA GPUs.

IEEE Trans. Parallel Distributed Syst., 2016

Linear algebra software for large-scale accelerated multicore computing.

Acta Numer., 2016

Heterogeneous Streaming.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Search Space Generation and Pruning System for Autotuners.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

2015

Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems.

Supercomput. Front. Innov., 2015

HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi.

Sci. Program., 2015

High-performance hybrid CPU and GPU parallel algorithm for digital volume correlation.

Michael T. Heath

John Lambros

Int. J. High Perform. Comput. Appl., 2015

A survey of recent developments in parallel implementations of Gaussian elimination.

Concurr. Comput. Pract. Exp., 2015

Accelerating collaborative filtering using concepts from high performance computing.

Proceedings of the 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA, October 29, 2015

2014

Accelerating Computation of Eigenvectors in the Dense Nonsymmetric Eigenvalue Problem.

Azzam Haidar

Proceedings of High Performance Computing for Computational Science: VECPAR 2014, 11th International Conference, Eugene, OR, USA, June 30, 2014

Performance and portability with OpenCL for throughput-oriented HPC workloads across accelerators, coprocessors, and multicore processors.

Proceedings of the 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2014

clMAGMA: high performance dense linear algebra with OpenCL.

Proceedings of the International Workshop on OpenCL, 2014

Accelerating Numerical Dense Linear Algebra Calculations with GPUs.

Numerical Computations with GPUs, 2014

2013

Leading Edge Hybrid Multi-GPU Algorithms for Generalized Eigenproblems in Electronic Structure Calculations.

Proceedings of Supercomputing: 28th International Supercomputing Conference, 2013

Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi.

Proceedings of Parallel Processing and Applied Mathematics, 2013

Virtual Systolic Array for QR Decomposition.

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication.

Proceedings of the International Conference on Supercomputing, 2013

2012

Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems.

Proceedings of the International Conference on Computational Science, 2012

2011

High performance digital volume correlation.
