Hatem Ltaief

According to our database, Hatem Ltaief
  • authored at least 56 papers between 2006 and 2017.
  • has a "Dijkstra number" of four.

Bibliography

2017
ExaGeoStat: A High Performance Unified Framework for Geostatistics on Manycore Systems.
CoRR, 2017

Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression.
CoRR, 2017

A framework for dense triangular matrix kernels on various manycore architectures.
Concurrency and Computation: Practice and Experience, 2017

Tile Low Rank Cholesky Factorization for Climate/Weather Modeling Applications on Manycore Architectures.
Proceedings of the High Performance Computing - 32nd International Conference, 2017

2016
A High Performance QDWH-SVD Solver Using Hardware Accelerators.
ACM Trans. Math. Softw., 2016

KBLAS: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators.
ACM Trans. Math. Softw., 2016

Accelerated Dimension-Independent Adaptive Metropolis.
SIAM J. Scientific Computing, 2016

Performance optimization of Sparse Matrix-Vector Multiplication for multi-component PDE-based applications using GPUs.
Concurrency and Computation: Practice and Experience, 2016

Adaptive Optics Simulation for the World's Largest Telescope on Multicore Architectures with Multiple GPUs.
Proceedings of the Platform for Advanced Scientific Computing Conference, 2016

Optimization of an Electromagnetics Code with Multicore Wavefront Diamond Blocking and Multi-dimensional Intra-Tile Parallelization.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Efficient Sphere Detector Algorithm for Massive MIMO using GPU Hardware Accelerator.
Proceedings of the International Conference on Computational Science 2016, 2016

High Performance Polar Decomposition on Distributed Memory Systems.
Proceedings of the Euro-Par 2016: Parallel Processing, 2016

Redesigning Triangular Dense Matrix Computations on GPUs.
Proceedings of the Euro-Par 2016: Parallel Processing, 2016

2015
Multicore-Optimized Wavefront Diamond Blocking for Optimizing Stencil Updates.
SIAM J. Scientific Computing, 2015

Multi-dimensional intra-tile parallelization for memory-starved stencil computations.
CoRR, 2015

Optimization of an electromagnetics code with multicore wavefront diamond blocking and multi-dimensional intra-tile parallelization.
CoRR, 2015

High Performance Multi-GPU SpMV for Multi-component PDE-Based Applications.
Proceedings of the Euro-Par 2015: Parallel Processing, 2015

2014
Power profiling of Cholesky and QR factorizations on distributed memory systems.
Computer Science - R&D, 2014

Multicore-optimized wavefront diamond blocking for optimizing stencil updates.
CoRR, 2014

Towards energy efficiency and maximum computational intensity for stencil algorithms using wavefront diamond temporal blocking.
CoRR, 2014

KBLAS: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators.
CoRR, 2014

Data-driven execution of fast multipole methods.
Concurrency and Computation: Practice and Experience, 2014

Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting.
Concurrency and Computation: Practice and Experience, 2014

Pipelining Computational Stages of the Tomographic Reconstructor for Multi-Object Adaptive Optics on a Multi-GPU System.
Proceedings of the International Conference for High Performance Computing, 2014

High Performance Pseudo-analytical Simulation of Multi-Object Adaptive Optics over Multi-GPU Systems.
Proceedings of the Euro-Par 2014 Parallel Processing, 2014

2013
High-performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures.
ACM Trans. Math. Softw., 2013

2012
Toward a High Performance Tile Divide and Conquer Algorithm for the Dense Symmetric Eigenvalue Problem.
SIAM J. Scientific Computing, 2012

Profiling high performance dense linear algebra algorithms on multicore architectures for power and energy efficiency.
Computer Science - R&D, 2012

Data-Driven Execution of Fast Multipole Methods.
CoRR, 2012

Optimizing Memory-Bound SYMV Kernel on GPU Hardware Accelerators.
Proceedings of the High Performance Computing for Computational Science, 2012

A Comprehensive Study of Task Coalescing for Selecting Parallelism Granularity in a Two-Stage Bidiagonal Reduction.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Systematic Approach in Optimizing Numerical Memory-Bound Kernels on GPU.
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

Energy Footprint of Advanced Dense Numerical Linear Algebra Using Tile Algorithms on Multicore Architectures.
Proceedings of the 2012 Second International Conference on Cloud and Green Computing, 2012

2011
Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures.
Concurrency and Computation: Practice and Experience, 2011

Parallel reduction to condensed forms for symmetric eigenvalue problems using aggregated fine-grained and memory-aware kernels.
Proceedings of the Conference on High Performance Computing Networking, 2011

Enhancing Parallelism of Tile Bidiagonal Transformation on Multicore Architectures Using Tree Reduction.
Proceedings of the Parallel Processing and Applied Mathematics, 2011

Solving the Generalized Symmetric Eigenvalue Problem using Tile Algorithms on Multicore Architectures.
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011

Exploiting Fine-Grain Parallelism in Recursive LU Factorization.
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011

Two-Stage Tridiagonal Reduction for Dense Symmetric Matrices Using Tile Algorithms on Multicore Architectures.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

LU factorization for accelerator-based systems.
Proceedings of the 9th IEEE/ACS International Conference on Computer Systems and Applications, 2011

2010
Parallel Two-Sided Matrix Reduction to Band Bidiagonal Form on Multicore Architectures.
IEEE Trans. Parallel Distrib. Syst., 2010

Scheduling two-sided transformations using tile algorithms on multicore architectures.
Scientific Programming, 2010

Scheduling dense linear algebra operations on multicore processors.
Concurrency and Computation: Practice and Experience, 2010

A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2010, 2010

Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems.
Proceedings of the Conference on High Performance Computing Networking, 2010

Dense linear algebra solvers for multicore with GPU accelerators.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Tile QR factorization with parallel panel processing for multicore architectures.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

2009
A parallel Aitken-additive Schwarz waveform relaxation suitable for the grid.
Parallel Computing, 2009

Comparative study of one-sided factorizations with multiple software packages on multi-core hardware.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

2008
Fault tolerant algorithms for heat transfer problems.
J. Parallel Distrib. Comput., 2008

Scheduling for Numerical Linear Algebra Library at Scale.
Proceedings of the High Speed and Large Scale Scientific Computing - Selected Papers from the High Performance Computing Workshop, Cetraro, Italy, June 30, 2008

2006
Parallel Fault Tolerant Algorithms for Parabolic Problems.
Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

