Paolo Bientinesi

André Severo Pereira Gomes

CoRR, May, 2026

Report on the second Toulouse Tensor Workshop.

Jan Brandejs

Trond Saue

Lucas Visscher

CoRR, February, 2026

Tensor Algebra Processing Primitives (TAPP): Towards a Standard for Tensor Operations.

CoRR, January, 2026

Enabling mixed-precision in spectral element codes.

Yanxiang Chen

Pablo de Oliveira Castro

Niclas Jansson

Future Gener. Comput. Syst., 2026

The software landscape for the density matrix renormalization group.

Comput. Phys. Commun., 2026

Compilation of Generalized Matrix Chains with Symbolic Sizes.

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2026

2025

Ranking with ties based on noisy performance data.

Int. J. Data Sci. Anal., October, 2025

On the parenthesisations of matrix chains: All are useful, few are essential.

J. Comb. Optim., April, 2025

Enabling Mixed-Precision in Computational Fluids Dynamics Codes.

Yanxiang Chen

Pablo de Oliveira Castro

Niclas Jansson

CoRR, March, 2025

2024

Analyzing and reducing the synthetic-to-real transfer gap in Music Information Retrieval: the task of automatic drum transcription.

CoRR, 2024

Inspection of I/O Operations from System Call Traces using Directly-Follows-Graph.

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

Enabling Mixed-Precision with the Help of Tools: A Nekbone Case Study.

Yanxiang Chen

Pablo de Oliveira Castro

Proceedings of the Parallel Processing and Applied Mathematics, 2024

In-Depth Performance Analysis of the ADTOF-Based Algorithm for Automatic Drum Transcription.

Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024

2023

The Essential Algorithms for the Matrix Chain.

CoRR, 2023

2022

Work-Stealing Prefix Scan: Addressing Load Imbalance in Large-Scale Image Registration.

IEEE Trans. Parallel Distributed Syst., 2022

Algorithm 1026: Concurrent Alternating Least Squares for Multiple Simultaneous Canonical Polyadic Decompositions.

ACM Trans. Math. Softw., 2022

The Linear Algebra Mapping Problem. Current State of Linear Algebra Languages and Libraries.

ACM Trans. Math. Softw., 2022

Accelerating Jackknife Resampling for the Canonical Polyadic Decomposition.

Frontiers Appl. Math. Stat., 2022

Editorial: High-performance tensor computations in scientific computing and data science.

Frontiers Appl. Math. Stat., 2022

Tensor Computations: Applications and Optimization (Dagstuhl Seminar 22101).

Dagstuhl Reports, 2022

MOM: Matrix Operations in MLIR.

Daniele G. Spampinato

CoRR, 2022

Automatic Detection of Cue Points for the Emulation of DJ Mixing.

Comput. Music. J., 2022

A Test for FLOPs as a Discriminant for Linear Algebra Algorithms.

Proceedings of the 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2022

Benchmarking the Linear Algebra Awareness of TensorFlow and PyTorch.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

FLOPs as a Discriminant for Dense Linear Algebra Algorithms.

Proceedings of the 51st International Conference on Parallel Processing, 2022

2021

Linnea: Automatic Generation of Efficient Linear Algebra Programs.

ACM Trans. Math. Softw., 2021

Rational Spectral Filters with Optimal Convergence Rate.

Konrad Kollnig

SIAM J. Sci. Comput., 2021

The landscape of software for tensor computations.

CoRR, 2021

ADTOF: A large dataset of non-synthetic music for automatic drum transcription.

Proceedings of the 22nd International Society for Music Information Retrieval Conference, 2021

Performance Comparison for Scientific Computations on the Edge via Relative Performance.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

2020

Tensor Computations: Applications and Optimization (Dagstuhl Seminar 20111).

Dagstuhl Reports, 2020

Robust Ranking of Linear Algebra Algorithms via Relative Performance.

CoRR, 2020

Concurrent Alternating Least Squares for multiple simultaneous Canonical Polyadic Decompositions.

CoRR, 2020

Automatic Detection of Cue Points for DJ Mixing.

CoRR, 2020

Accelerating Deep Learning Inference in Constrained Embedded Devices Using Hardware Loops and a Dot Product Unit.

IEEE Access, 2020

Automatic Generation of Efficient Linear Algebra Programs.

Proceedings of the PASC '20: Platform for Advanced Scientific Computing Conference, Geneva, Switzerland, June 29, 2020

2019

Spin Summations: A High-Performance Perspective.

Devin Matthews

ACM Trans. Math. Softw., 2019

Accelerating AIREBO: Navigating the Journey from Legacy to High-Performance Code.

J. Comput. Chem., 2019

The ELAPS framework: Experimental Linear Algebra Performance Studies.

Int. J. High Perform. Comput. Appl., 2019

The Linear Algebra Mapping Problem.

CoRR, 2019

Program Generation for Linear Algebra Using Multiple Layers of DSLs.

Daniele G. Spampinato

Markus Püschel

CoRR, 2019

2018

Design of a High-Performance GEMM-like Tensor-Tensor Multiplication.

ACM Trans. Math. Softw., 2018

Accelerating molecular dynamics codes by performance and accuracy modeling.

J. Comput. Sci., 2018

Optimizing AIREBO: Navigating the Journey from Complex Legacy Code to High Performance.

CoRR, 2018

A Timer-Augmented Cost Function for Load Balanced DSMC.

William McDoniel

Proceedings of the High Performance Computing for Computational Science - VECPAR 2018, 2018

Extended Pipeline for Content-Based Feature Engineering in Music Genre Recognition.

Tina Raissi

Alessandro Tibo

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Program generation for small-scale linear algebra applications.

Daniele G. Spampinato

Markus Püschel

Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018

The generalized matrix chain algorithm.

Marcin Copik

Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018

2017

TTC: A High-Performance Compiler for Tensor Transpositions.

Jeff R. Hammond

ACM Trans. Math. Softw., 2017

Algorithm 979: Recursive Algorithms for Dense Linear Algebra - The ReLAPACK Collection.

ACM Trans. Math. Softw., 2017

High-performance generation of the Hamiltonian and Overlap matrices in FLAPW methods.

Comput. Phys. Commun., 2017

Assessment of sound spatialisation algorithms for sonic rendering with headsets.

Ali Tarzan

CoRR, 2017

The Tersoff many-body potential: Sustainable performance through vectorization.

CoRR, 2017

LAMMPS' PPPM Long-Range Solver for the Second Generation Xeon Phi.

Proceedings of the High Performance Computing - 32nd International Conference, 2017

MatchPy: A Pattern Matching Library.

Manuel Krebber

Proceedings of the 16th Python in Science Conference, 2017

Efficient Pattern Matching in Python.

Manuel Krebber

Proceedings of the 7th Workshop on Python for High-Performance and Scientific Computing, 2017

HPTT: a high-performance tensor transposition C++ library.

Tong Su

Proceedings of the 4th ACM SIGPLAN International Workshop on Libraries, 2017

Linnea: Compiling Linear Algebra Expressions to High-Performance Code.

Proceedings of the International Workshop on Parallel Symbolic Computation, 2017

2016

A Note on Time Measurements in LAMMPS.

Daniel Tameling

CoRR, 2016

Recursive Algorithms for Dense Linear Algebra: The ReLAPACK Collection.

CoRR, 2016

Accelerating scientific codes by performance and accuracy modeling.

CoRR, 2016

Large Scale Parallel Computations in R through Elemental.

Rodrigo Canales

CoRR, 2016

The Matrix Chain Algorithm to Compile Linear Algebra Expressions.

CoRR, 2016

Large-scale linear regression: Development of high-performance routines.

Alvaro Frank

Appl. Math. Comput., 2016

The vectorization of the Tersoff multi-body potential: an exercise in performance portability.

Proceedings of the International Conference for High Performance Computing, 2016

TTC: a tensor transposition compiler for multiple architectures.

Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, 2016

2015

High performance solutions for big-data GWAS.

Parallel Comput., 2015

Parallel computing on graphics processing units and heterogeneous platforms.

José R. Herrero

Robert Strzodka

Concurr. Comput. Pract. Exp., 2015

A Scalable, Linear-Time Dynamic Cutoff Algorithm for Molecular Dynamics.

Proceedings of the High Performance Computing - 30th International Conference, 2015

Scalable and Efficient Linear Algebra Kernel Mapping for Low Energy Consumption on the Layers CGRA.

Proceedings of the Applied Reconfigurable Computing - 11th International Symposium, 2015

2014

Computing Petaflops over Terabytes of Data: The Case of Genome-Wide Association Studies.

ACM Trans. Math. Softw., 2014

Improved Accuracy and Parallelism for MRRR-Based Eigensolvers - A Mixed Precision Approach.

SIAM J. Sci. Comput., 2014

Cache-aware Performance Modeling and Prediction for Dense Linear Algebra.

CoRR, 2014

Towards an efficient use of the BLAS library for multilinear tensor contractions.

Gregorio Quintana-Ortí

Appl. Math. Comput., 2014

Solving sequences of generalized least-squares problems on multi-threaded architectures.

Yurii S. Aulchenko

Appl. Math. Comput., 2014

A Study on the Influence of Caching: Sequences of Dense Linear Algebra Kernels.

Proceedings of the High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Eugene, OR, USA, June 30, 2014

On the Performance Prediction of BLAS-based Tensor Contractions.

Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, 2014

2013

High-Performance Solvers for Dense Hermitian Eigenproblems.

SIAM J. Sci. Comput., 2013

Dissecting the FEAST algorithm for generalized eigenproblems.

J. Comput. Appl. Math., 2013

Application-tailored linear algebra algorithms: A search-based approach.

Int. J. High Perform. Comput. Appl., 2013

Deriving dense linear algebra libraries.

John A. Gunnels

Margaret E. Myers

Tyler Rhodes

Field G. Van Zee

Formal Aspects Comput., 2013

Streaming Data from HDD to GPUs for Sustained Peak Performance.

Lucas Beyer

CoRR, 2013

Algorithms for large-scale whole genome association analysis.

Yurii S. Aulchenko

Proceedings of the 20th European MPI Users' Group Meeting, 2013

GWAS on GPUs: Streaming Data from HDD for Sustained Performance.

Lucas Beyer

Proceedings of the Euro-Par 2013 Parallel Processing, 2013

2012

Modeling performance through memory-stalls.

SIGMETRICS Perform. Evaluation Rev., 2012

Correlations in sequences of generalized eigenproblems arising in Density Functional Theory.

Stefan Blügel

Comput. Phys. Commun., 2012

High-throughput Genome-wide Association Analysis for Single and Multiple Phenotypes.

Yurii S. Aulchenko

CoRR, 2012

Solving dense generalized eigenproblems on multi-threaded architectures.

Appl. Math. Comput., 2012

A Domain-Specific Compiler for Linear Algebra Operations.

Proceedings of the High Performance Computing for Computational Science, 2012

Performance Modeling for Dense Linear Algebra.

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

2011

Goal-Oriented and Modular Stability Analysis.

SIAM J. Matrix Anal. Appl., 2011

MR³-SMP: A symmetric tridiagonal eigensolver for multi-core architectures.

Parallel Comput., 2011

Condensed forms for the symmetric eigenvalue problem on multi-threaded architectures.

Concurr. Comput. Pract. Exp., 2011

Improving high-performance computations on clouds through resource underutilization.

Jeff Napper

Proceedings of the 2011 ACM Symposium on Applied Computing (SAC), TaiChung, Taiwan, March 21, 2011

Automatic Generation of Loop-Invariants for Matrix Operations.

Proceedings of the International Conference on Computational Science and Its Applications, 2011

Knowledge-Based Automatic Generation of Partitioned Matrix Expressions.

Proceedings of the Computer Algebra in Scientific Computing - 13th International Workshop, 2011

2010

Towards mechanical derivation of Krylov solver libraries.

Victor Eijkhout

Proceedings of the International Conference on Computational Science, 2010

Matrix Structure Exploitation in Generalized Eigenproblems Arising in Density Functional Theory.

CoRR, 2010

The Algorithm of Multiple Relatively Robust Representations for Multi-core Processors.

Proceedings of the Applied Parallel and Scientific Computing, 2010

High-Performance Parallel Computations Using Python as High-Level Language.

Stefano Masini

Proceedings of the Euro-Par 2010 Parallel Processing Workshops, 2010

HPC on Competitive Cloud Resources.

Jeff Napper

Handbook of Cloud Computing, 2010

2009

An Example of Symmetry Exploitation for Energy-related Eigencomputations.

CoRR, 2009

On Parallelizing the MRRR Algorithm for Data-Parallel Coprocessors.

Christian Lessig

Proceedings of the Parallel Processing and Applied Mathematics, 2009

Reduction to Condensed Forms for Symmetric Eigenvalue Problems on Multi-core Architectures.

Francisco D. Igual

Daniel Kressner

Proceedings of the Parallel Processing and Applied Mathematics, 2009

2008

Scalable parallelization of FLAME code via the workqueuing model.

Field G. Van Zee

Tze Meng Low

ACM Trans. Math. Softw., 2008

Families of algorithms related to the inversion of a Symmetric Positive Definite matrix.

Brian C. Gunter

ACM Trans. Math. Softw., 2008

SuperMatrix: a multithreaded runtime scheduling system for algorithms-by-blocks.

Ernie Chan

Field G. Van Zee

Gregorio Quintana-Ortí

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

2005

Representing linear algebra algorithms in code: the FLAME application program interfaces.

ACM Trans. Math. Softw., 2005

The science of deriving dense linear algebra algorithms.

John A. Gunnels

Margaret E. Myers

ACM Trans. Math. Softw., 2005

A Parallel Eigensolver for Dense Symmetric Matrices Based on Multiple Relatively Robust Representations.

Inderjit S. Dhillon

SIAM J. Sci. Comput., 2005

2004

Automatic Derivation of Linear Algebra Algorithms with Application to Control Theory.

Sergey Kolos

Proceedings of the Applied Parallel Computing, 2004

Rapid Development of High-Performance Linear Algebra Libraries.
