Paolo Bientinesi

Orcid: 0000-0002-4972-7097

Affiliations:
  • Umeå University, Sweden
  • RWTH Aachen University, Germany (former)


According to our database, Paolo Bientinesi authored at least 99 papers between 2004 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
The Essential Algorithms for the Matrix Chain.
CoRR, 2023

2022
Work-Stealing Prefix Scan: Addressing Load Imbalance in Large-Scale Image Registration.
IEEE Trans. Parallel Distributed Syst., 2022

Algorithm 1026: Concurrent Alternating Least Squares for Multiple Simultaneous Canonical Polyadic Decompositions.
ACM Trans. Math. Softw., 2022

The Linear Algebra Mapping Problem. Current State of Linear Algebra Languages and Libraries.
ACM Trans. Math. Softw., 2022

Accelerating Jackknife Resampling for the Canonical Polyadic Decomposition.
Frontiers Appl. Math. Stat., 2022

Editorial: High-performance tensor computations in scientific computing and data science.
Frontiers Appl. Math. Stat., 2022

Tensor Computations: Applications and Optimization (Dagstuhl Seminar 22101).
Dagstuhl Reports, 2022

MOM: Matrix Operations in MLIR.
CoRR, 2022

Automatic Detection of Cue Points for the Emulation of DJ Mixing.
Comput. Music. J., 2022

A Test for FLOPs as a Discriminant for Linear Algebra Algorithms.
Proceedings of the 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2022

Benchmarking the Linear Algebra Awareness of TensorFlow and PyTorch.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

FLOPs as a Discriminant for Dense Linear Algebra Algorithms.
Proceedings of the 51st International Conference on Parallel Processing, 2022

2021
Linnea: Automatic Generation of Efficient Linear Algebra Programs.
ACM Trans. Math. Softw., 2021

Rational Spectral Filters with Optimal Convergence Rate.
SIAM J. Sci. Comput., 2021

The landscape of software for tensor computations.
CoRR, 2021

ADTOF: A large dataset of non-synthetic music for automatic drum transcription.
Proceedings of the 22nd International Society for Music Information Retrieval Conference, 2021

Performance Comparison for Scientific Computations on the Edge via Relative Performance.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

2020
Tensor Computations: Applications and Optimization (Dagstuhl Seminar 20111).
Dagstuhl Reports, 2020

Robust Ranking of Linear Algebra Algorithms via Relative Performance.
CoRR, 2020

Concurrent Alternating Least Squares for multiple simultaneous Canonical Polyadic Decompositions.
CoRR, 2020

Automatic Detection of Cue Points for DJ Mixing.
CoRR, 2020

Accelerating Deep Learning Inference in Constrained Embedded Devices Using Hardware Loops and a Dot Product Unit.
IEEE Access, 2020

Automatic Generation of Efficient Linear Algebra Programs.
Proceedings of the PASC '20: Platform for Advanced Scientific Computing Conference, Geneva, Switzerland, June 29, 2020

2019
Spin Summations: A High-Performance Perspective.
ACM Trans. Math. Softw., 2019

Accelerating AIREBO: Navigating the Journey from Legacy to High-Performance Code.
J. Comput. Chem., 2019

The ELAPS framework: Experimental Linear Algebra Performance Studies.
Int. J. High Perform. Comput. Appl., 2019

The Linear Algebra Mapping Problem.
CoRR, 2019

Program Generation for Linear Algebra Using Multiple Layers of DSLs.
CoRR, 2019

2018
Design of a High-Performance GEMM-like Tensor-Tensor Multiplication.
ACM Trans. Math. Softw., 2018

Accelerating molecular dynamics codes by performance and accuracy modeling.
J. Comput. Sci., 2018

Optimizing AIREBO: Navigating the Journey from Complex Legacy Code to High Performance.
CoRR, 2018

A Timer-Augmented Cost Function for Load Balanced DSMC.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2018, 2018

Extended Pipeline for Content-Based Feature Engineering in Music Genre Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Program generation for small-scale linear algebra applications.
Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018

The generalized matrix chain algorithm.
Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018

2017
TTC: A High-Performance Compiler for Tensor Transpositions.
ACM Trans. Math. Softw., 2017

Algorithm 979: Recursive Algorithms for Dense Linear Algebra - The ReLAPACK Collection.
ACM Trans. Math. Softw., 2017

High-performance generation of the Hamiltonian and Overlap matrices in FLAPW methods.
Comput. Phys. Commun., 2017

Assessment of sound spatialisation algorithms for sonic rendering with headsets.
CoRR, 2017

The Tersoff many-body potential: Sustainable performance through vectorization.
CoRR, 2017

LAMMPS' PPPM Long-Range Solver for the Second Generation Xeon Phi.
Proceedings of the High Performance Computing - 32nd International Conference, 2017

MatchPy: A Pattern Matching Library.
Proceedings of the 16th Python in Science Conference 2017, 2017

Efficient Pattern Matching in Python.
Proceedings of the 7th Workshop on Python for High-Performance and Scientific Computing, 2017

HPTT: a high-performance tensor transposition C++ library.
Proceedings of the 4th ACM SIGPLAN International Workshop on Libraries, 2017

Linnea: Compiling Linear Algebra Expressions to High-Performance Code.
Proceedings of the International Workshop on Parallel Symbolic Computation, 2017

2016
A Note on Time Measurements in LAMMPS.
CoRR, 2016

Recursive Algorithms for Dense Linear Algebra: The ReLAPACK Collection.
CoRR, 2016

Accelerating scientific codes by performance and accuracy modeling.
CoRR, 2016

Large Scale Parallel Computations in R through Elemental.
CoRR, 2016

The Matrix Chain Algorithm to Compile Linear Algebra Expressions.
CoRR, 2016

Large-scale linear regression: Development of high-performance routines.
Appl. Math. Comput., 2016

The vectorization of the tersoff multi-body potential: an exercise in performance portability.
Proceedings of the International Conference for High Performance Computing, 2016

TTC: a tensor transposition compiler for multiple architectures.
Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, 2016

2015
High performance solutions for big-data GWAS.
Parallel Comput., 2015

Parallel computing on graphics processing units and heterogeneous platforms.
Concurr. Comput. Pract. Exp., 2015

A Scalable, Linear-Time Dynamic Cutoff Algorithm for Molecular Dynamics.
Proceedings of the High Performance Computing - 30th International Conference, 2015

Scalable and Efficient Linear Algebra Kernel Mapping for Low Energy Consumption on the Layers CGRA.
Proceedings of the Applied Reconfigurable Computing - 11th International Symposium, 2015

2014
Computing Petaflops over Terabytes of Data: The Case of Genome-Wide Association Studies.
ACM Trans. Math. Softw., 2014

Improved Accuracy and Parallelism for MRRR-Based Eigensolvers - A Mixed Precision Approach.
SIAM J. Sci. Comput., 2014

Cache-aware Performance Modeling and Prediction for Dense Linear Algebra.
CoRR, 2014

Towards an efficient use of the BLAS library for multilinear tensor contractions.
Appl. Math. Comput., 2014

Solving sequences of generalized least-squares problems on multi-threaded architectures.
Appl. Math. Comput., 2014

A Study on the Influence of Caching: Sequences of Dense Linear Algebra Kernels.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Eugene, OR, USA, June 30, 2014

On the Performance Prediction of BLAS-based Tensor Contractions.
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, 2014

2013
High-Performance Solvers for Dense Hermitian Eigenproblems.
SIAM J. Sci. Comput., 2013

Dissecting the FEAST algorithm for generalized eigenproblems.
J. Comput. Appl. Math., 2013

Application-tailored linear algebra algorithms: A search-based approach.
Int. J. High Perform. Comput. Appl., 2013

Deriving dense linear algebra libraries.
Formal Aspects Comput., 2013

Streaming Data from HDD to GPUs for Sustained Peak Performance.
CoRR, 2013

Algorithms for large-scale whole genome association analysis.
Proceedings of the 20th European MPI Users' Group Meeting, 2013

GWAS on GPUs: Streaming Data from HDD for Sustained Performance.
Proceedings of the Euro-Par 2013 Parallel Processing, 2013

2012
Modeling performance through memory-stalls.
SIGMETRICS Perform. Evaluation Rev., 2012

Correlations in sequences of generalized eigenproblems arising in Density Functional Theory.
Comput. Phys. Commun., 2012

High-throughput Genome-wide Association Analysis for Single and Multiple Phenotypes.
CoRR, 2012

Solving dense generalized eigenproblems on multi-threaded architectures.
Appl. Math. Comput., 2012

A Domain-Specific Compiler for Linear Algebra Operations.
Proceedings of the High Performance Computing for Computational Science, 2012

Performance Modeling for Dense Linear Algebra.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

2011
Goal-Oriented and Modular Stability Analysis.
SIAM J. Matrix Anal. Appl., 2011

MR³-SMP: A symmetric tridiagonal eigensolver for multi-core architectures.
Parallel Comput., 2011

Condensed forms for the symmetric eigenvalue problem on multi-threaded architectures.
Concurr. Comput. Pract. Exp., 2011

Improving high-performance computations on clouds through resource underutilization.
Proceedings of the 2011 ACM Symposium on Applied Computing (SAC), TaiChung, Taiwan, March 21, 2011

Automatic Generation of Loop-Invariants for Matrix Operations.
Proceedings of the International Conference on Computational Science and Its Applications, 2011

Knowledge-Based Automatic Generation of Partitioned Matrix Expressions.
Proceedings of the Computer Algebra in Scientific Computing - 13th International Workshop, 2011

2010
Towards mechanical derivation of Krylov solver libraries.
Proceedings of the International Conference on Computational Science, 2010

Matrix Structure Exploitation in Generalized Eigenproblems Arising in Density Functional Theory.
CoRR, 2010

The Algorithm of Multiple Relatively Robust Representations for Multi-core Processors.
Proceedings of the Applied Parallel and Scientific Computing, 2010

High-Performance Parallel Computations Using Python as High-Level Language.
Proceedings of the Euro-Par 2010 Parallel Processing Workshops, 2010

HPC on Competitive Cloud Resources.
Handbook of Cloud Computing, 2010

2009
An Example of Symmetry Exploitation for Energy-related Eigencomputations.
CoRR, 2009

On Parallelizing the MRRR Algorithm for Data-Parallel Coprocessors.
Proceedings of the Parallel Processing and Applied Mathematics, 2009

Reduction to Condensed Forms for Symmetric Eigenvalue Problems on Multi-core Architectures.
Proceedings of the Parallel Processing and Applied Mathematics, 2009

2008
Scalable parallelization of FLAME code via the workqueuing model.
ACM Trans. Math. Softw., 2008

Families of algorithms related to the inversion of a Symmetric Positive Definite matrix.
ACM Trans. Math. Softw., 2008

SuperMatrix: a multithreaded runtime scheduling system for algorithms-by-blocks.
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

2005
Representing linear algebra algorithms in code: the FLAME application program interfaces.
ACM Trans. Math. Softw., 2005

The science of deriving dense linear algebra algorithms.
ACM Trans. Math. Softw., 2005

A Parallel Eigensolver for Dense Symmetric Matrices Based on Multiple Relatively Robust Representations.
SIAM J. Sci. Comput., 2005

2004
Automatic Derivation of Linear Algebra Algorithms with Application to Control Theory.
Proceedings of the Applied Parallel Computing, 2004

Rapid Development of High-Performance Linear Algebra Libraries.
Proceedings of the Applied Parallel Computing, 2004

