Piotr Luszczek

According to our database, Piotr Luszczek
  • authored at least 109 papers between 1998 and 2017.
  • has a "Dijkstra number" of four.

Bibliography

2017
Porting the PLASMA Numerical Library to the OpenMP Standard.
International Journal of Parallel Programming, 2017

With Extreme Computing, the Rules Have Changed.
Computing in Science and Engineering, 2017

Interoperable Convergence of Storage, Networking and Computation.
CoRR, 2017

Improving Performance of GMRES by Reducing Communication and Pipelining Global Collectives.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Autotuning Batch Cholesky Factorization in CUDA with Interleaved Layout of Matrices.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Towards numerical benchmark for half-precision floating point arithmetic.
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017

Scaling point set registration in 3D across thread counts on multicore and hardware accelerator platforms through autotuning for large scale analysis of scientific point clouds.
Proceedings of the 2017 IEEE International Conference on Big Data, BigData 2017, 2017

Bringing High Performance Computing to Big Data Algorithms.
Proceedings of the Handbook of Big Data Technologies, 2017

2016
High-performance conjugate-gradient benchmark: A new metric for ranking high-performance computing systems.
IJHPCA, 2016

Task-Based Cholesky Decomposition on Knights Corner Using OpenMP.
Proceedings of the High Performance Computing, 2016

Performance-Portable Autotuning of OpenCL Kernels for Convolutional Layers of Deep Neural Networks.
Proceedings of the 2nd Workshop on Machine Learning in HPC Environments, 2016

Heterogeneous Streaming.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Search Space Generation and Pruning System for Autotuners.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Hessenberg Reduction with Transient Error Resilience on GPU-Based Hybrid Architectures.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

2015
HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi.
Scientific Programming, 2015

Batched matrix computations on hardware accelerators based on GPUs.
IJHPCA, 2015

Acceleration of GPU-based Krylov solvers via data transfer reduction.
IJHPCA, 2015

A survey of recent developments in parallel implementations of Gaussian elimination.
Concurrency and Computation: Practice and Experience, 2015

Experiences in autotuning matrix multiplication for energy minimization on GPUs.
Concurrency and Computation: Practice and Experience, 2015

A Framework for Batched and GPU-Resident Factorization Algorithms Applied to Block Householder Transformations.
Proceedings of the High Performance Computing - 30th International Conference, 2015

Randomized algorithms to update partial singular value decomposition on a hybrid CPU/GPU cluster.
Proceedings of the International Conference for High Performance Computing, 2015

Performance of random sampling for computing low-rank approximations of a dense matrix on GPUs.
Proceedings of the International Conference for High Performance Computing, 2015

Weighted dynamic scheduling with many parallelism grains for offloading of numerical workloads to multiple varied accelerators.
Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2015

Optimization for performance and energy for batched matrix computations on GPUs.
Proceedings of the 8th Workshop on General Purpose Processing using GPUs, 2015

Towards batched linear solvers on accelerated hardware platforms.
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

MAGMA embedded: Towards a dense linear algebra library for energy efficient extreme computing.
Proceedings of the 2015 IEEE High Performance Extreme Computing Conference, 2015

Flexible Linear Algebra Development and Scheduling with Cholesky Factorization.
Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015

2014
Design and Implementation of a Large Scale Tree-Based QR Decomposition Using a 3D Virtual Systolic Array and a Lightweight Runtime.
Parallel Processing Letters, 2014

Looking back at dense linear algebra software.
J. Parallel Distrib. Comput., 2014

Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting.
Concurrency and Computation: Practice and Experience, 2014

BlackjackBench: Portable Hardware Characterization with Automated Results' Analysis.
Comput. J., 2014

Heterogenous Acceleration for Linear Algebra in Multi-coprocessor Environments.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Eugene, OR, USA, June 30, 2014

Performance and portability with OpenCL for throughput-oriented HPC workloads across accelerators, coprocessors, and multicore processors.
Proceedings of the 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2014

clMAGMA: high performance dense linear algebra with OpenCL.
Proceedings of the International Workshop on OpenCL, 2014

Design and Implementation of a Large Scale Tree-Based QR Decomposition Using a 3D Virtual Systolic Array and a Lightweight Runtime.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

New Algorithm for Computing Eigenvectors of the Symmetric Eigenvalue Problem.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Unified Development for Mixed Multi-GPU and Multi-coprocessor Environments Using a Lightweight Runtime Environment.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Optimizing Krylov Subspace Solvers on Graphics Processing Units.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Parallel Simulation of Superscalar Scheduling.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

LU Factorization of Small Matrices: Accelerating Batched DGETRF on the GPU.
Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014

2013
LU Factorization with Partial Pivoting for a Multicore System with Accelerators.
IEEE Trans. Parallel Distrib. Syst., 2013

High-performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures.
ACM Trans. Math. Softw., 2013

Soft error resilient QR factorization for hybrid system with GPGPU.
J. Comput. Science, 2013

CPU-GPU hybrid bidiagonal reduction with soft error resilience.
Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2013

Parallel reduction to Hessenberg form with algorithm-based fault tolerance.
Proceedings of the International Conference for High Performance Computing, 2013

An improved parallel singular value algorithm and its implementation for multicore hardware.
Proceedings of the International Conference for High Performance Computing, 2013

Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi.
Proceedings of the Parallel Processing and Applied Mathematics, 2013

Virtual Systolic Array for QR Decomposition.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Implementing a Systolic Algorithm for QR Factorization on Multicore Clusters with PaRSEC.
Proceedings of the Euro-Par 2013: Parallel Processing Workshops, 2013

2012
BlackjackBench: portable hardware characterization.
SIGMETRICS Performance Evaluation Review, 2012

Multi-GPU Implementation of LU Factorization.
Proceedings of the International Conference on Computational Science, 2012

High Performance Dense Linear System Solver with Resilience to Multiple Soft Errors.
Proceedings of the International Conference on Computational Science, 2012

From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming.
Parallel Computing, 2012

Profiling high performance dense linear algebra algorithms on multicore architectures for power and energy efficiency.
Computer Science - R&D, 2012

Programming the LU Factorization for a Multicore System with Accelerators.
Proceedings of the High Performance Computing for Computational Science, 2012

A Comprehensive Study of Task Coalescing for Selecting Parallelism Granularity in a Two-Stage Bidiagonal Reduction.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Measuring Energy and Power with PAPI.
Proceedings of the 41st International Conference on Parallel Processing Workshops, 2012

Anatomy of a globally recursive embedded LINPACK benchmark.
Proceedings of the IEEE Conference on High Performance Extreme Computing, 2012

Scalable Dense Linear Algebra on Heterogeneous Hardware.
Proceedings of the Transition of HPC Towards Exascale Computing, 2012

GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

Energy Footprint of Advanced Dense Numerical Linear Algebra Using Tile Algorithms on Multicore Architectures.
Proceedings of the 2012 Second International Conference on Cloud and Green Computing, 2012

Dense Linear Algebra on Accelerated Multicore Hardware.
Proceedings of the High-Performance Scientific Computing - Algorithms and Applications, 2012

2011
TOP500.
Proceedings of the Encyclopedia of Parallel Computing, 2011

ScaLAPACK.
Proceedings of the Encyclopedia of Parallel Computing, 2011

PLASMA.
Proceedings of the Encyclopedia of Parallel Computing, 2011

Livermore Loops.
Proceedings of the Encyclopedia of Parallel Computing, 2011

LINPACK Benchmark.
Proceedings of the Encyclopedia of Parallel Computing, 2011

Linear Algebra Software.
Proceedings of the Encyclopedia of Parallel Computing, 2011

LAPACK.
Proceedings of the Encyclopedia of Parallel Computing, 2011

HPC Challenge Benchmark.
Proceedings of the Encyclopedia of Parallel Computing, 2011

Benchmarks.
Proceedings of the Encyclopedia of Parallel Computing, 2011

Linear algebra - software issues.
Scholarpedia, 2011

Soft error resilient QR factorization for hybrid system with GPGPU.
Proceedings of the second workshop on Scalable algorithms for large-scale systems, 2011

High performance matrix inversion based on LU factorization for multicore architectures.
Proceedings of the 2011 ACM International Workshop on Many Task Computing on Grids and Supercomputers, 2011

Reducing the Time to Tune Parallel Dense Linear Algebra Routines with Partial Execution and Performance Modeling.
Proceedings of the Parallel Processing and Applied Mathematics, 2011

Enhancing Parallelism of Tile Bidiagonal Transformation on Multicore Architectures Using Tree Reduction.
Proceedings of the Parallel Processing and Applied Mathematics, 2011

Solving the Generalized Symmetric Eigenvalue Problem using Tile Algorithms on Multicore Architectures.
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011

Exploiting Fine-Grain Parallelism in Recursive LU Factorization.
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011

Two-Stage Tridiagonal Reduction for Dense Symmetric Matrices Using Tile Algorithms on Multicore Architectures.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Evaluation of the HPC Challenge Benchmarks in Virtualized Environments.
Proceedings of the Euro-Par 2011: Parallel Processing Workshops - CCPI, CGWS, HeteroPar, HiBB, HPCVirt, HPPC, HPSS, MDGS, ProPer, Resilience, UCHPC, VHPC, Bordeaux, France, August 29, 2011

High Performance Dense Linear System Solver with Soft Error Resilience.
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

2010
Mixed-Tool Performance Analysis on Hybrid Multicore Architectures.
Proceedings of the 39th International Conference on Parallel Processing, 2010

2009
Parallel Programming in MATLAB.
IJHPCA, 2009

Accelerating scientific computations with mixed precision algorithms.
Computer Physics Communications, 2009

2008
Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy.
ACM Trans. Math. Softw., 2008

The PlayStation 3 for High-Performance Scientific Computing.
Computing in Science and Engineering, 2008

Accelerating Scientific Computations with Mixed Precision Algorithms.
CoRR, 2008

DARPA's HPCS Program: History, Models, Tools, Languages.
Advances in Computers, 2008

2007
Prospectus for a Dense Linear Algebra Software Library.
Proceedings of the Handbook of Parallel Computing - Models, Algorithms and Applications, 2007

High Performance Development for High End Computing With Python Language Wrapper (PLW).
IJHPCA, 2007

Mixed Precision Iterative Refinement Techniques for the Solution of Dense Linear Systems.
IJHPCA, 2007

2006
Self-adapting numerical software (SANS) effort.
IBM Journal of Research and Development, 2006

S12 - The HPC Challenge (HPCC) benchmark suite.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Tools and techniques for performance - Exploiting the performance of 32 bit floating point arithmetic in obtaining 64 bit accuracy (revisiting iterative refinement for linear systems).
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Prospectus for the Next LAPACK and ScaLAPACK Libraries.
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

The Impact of Multicore on Math Software.
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

Exploiting Mixed Precision Floating Point Hardware in Scientific Computations.
Proceedings of the High Performance Computing and Grids in Action, 2006

2004
Design of Interactive Environment for Numerically Intensive Parallel Linear Algebra Calculations.
Proceedings of the Computational Science, 2004

The LAPACK for Clusters Project: An Example of Self Adapting Numerical Software.
Proceedings of the 37th Hawaii International Conference on System Sciences (HICSS-37 2004), 2004

2003
Self-adapting software for numerical linear algebra and LAPACK for clusters.
Parallel Computing, 2003

The LINPACK Benchmark: past, present and future.
Concurrency and Computation: Practice and Experience, 2003

Self-Adapting Software for Numerical Linear Algebra Library Routines on Clusters.
Proceedings of the Computational Science - ICCS 2003, 2003

2001
Recursive approach in sparse matrix LU factorization.
Scientific Programming, 2001

Creating Java to Native Code Interfaces with Janet.
Scientific Programming, 2001

Convenient use of legacy software in Java with Janet package.
Future Generation Comp. Syst., 2001

2000
A Versatile Support for Binding Native Code to Java.
Proceedings of the High-Performance Computing and Networking, 8th International Conference, 2000

1999
Towards Portable Runtime Support for Irregular and Out-of-Core Computations.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 1999

1998
Porting CHAOS Library to MPI.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 1998

