Grey Ballard

Laura Grigori

Mariana Martinez Aguilar

Arvind K. Saibaba

Bhisham Dev Verma

CoRR, November, 2025

Improved Analysis of Khatri-Rao Random Projections and Applications.

Arvind K. Saibaba

Bhisham Dev Verma

CoRR, July, 2025

Communication Lower Bounds and Optimal Algorithms for Symmetric Matrix Computations.

ACM Trans. Parallel Comput., June, 2025

Minimizing Communication for Parallel Symmetric Tensor Times Same Vector Computation.

CoRR, June, 2025

Randomized Algorithms for Symmetric Nonnegative Matrix Factorization.

SIAM J. Matrix Anal. Appl., 2025

Brief Announcement: Minimizing Communication for Parallel Symmetric Tensor Times Same Vector Computation.

Proceedings of the 37th ACM Symposium on Parallelism in Algorithms and Architectures, 2025

Parallel Rank-Adaptive Higher Order Orthogonal Iteration.

João Pinheiro

Aditya Devarakonda

Proceedings of the International Conference for High Performance Computing, 2025

Visualizing MPI Collective Communication.

Christopher Atala

Meredith Morrison

Proceedings of the 2025 IEEE International Parallel and Distributed Processing Symposium, 2025

2024

Communication Lower Bounds and Optimal Algorithms for Multiple Tensor-Times-Matrix Computation.

SIAM J. Matrix Anal. Appl., March, 2024

Parallel Randomized Tucker Decomposition Algorithms.

Rachel Minster

Zitong Li

SIAM J. Sci. Comput., 2024

Sequential and Shared-Memory Parallel Algorithms for Partitioned Local Depths.

Aditya Devarakonda

Proceedings of the 2024 SIAM Conference on Parallel Processing for Scientific Computing, 2024

Visualizing PRAM Algorithm for Mergesort.

Cade Wiley

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

On Rank Selection for Nonnegative Matrix Factorization.

Proceedings of the IEEE International Conference on Big Data, 2024

2023

CP decomposition for tensors via alternating least squares with QR decomposition.

Numer. Linear Algebra Appl., December, 2023

AminerMag X Dataset.

Dataset, June, 2023

AminerMag S Dataset.

Dataset, June, 2023

Randomized Algorithms for Rounding in the Tensor-Train Format.

SIAM J. Sci. Comput., February, 2023

Parallel Memory-Independent Communication Bounds for SYRK.

Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, 2023

Distributed-Memory Parallel JointNMF.

Proceedings of the 37th International Conference on Supercomputing, 2023

2022

Parallel Algorithms for Tensor Train Arithmetic.

Hussam Al Daas

Peter Benner

SIAM J. Sci. Comput., 2022

Tight Memory-Independent Parallel Matrix Multiplication Communication Lower Bounds.

CoRR, 2022

Brief Announcement: Tight Memory-Independent Parallel Matrix Multiplication Communication Lower Bounds.

Proceedings of the SPAA '22: 34th ACM Symposium on Parallelism in Algorithms and Architectures, Philadelphia, PA, USA, July 11, 2022

Parallel Tensor Train Rounding using Gram SVD.

Hussam Al Daas

Lawton Manning

Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

2021

PLANC: Parallel Low-rank Approximation with Nonnegativity Constraints.

ACM Trans. Math. Softw., 2021

Accelerating Neural Network Training using Arbitrary Precision Approximating Matrix Multiplication Algorithms.

Jack Weissenberger

Luoping Zhang

Proceedings of the ICPP Workshops 2021: 50th International Conference on Parallel Processing, 2021

Parallel Tucker Decomposition with Numerically Accurate SVD.

Zitong Li

Qiming Fang

Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

Visualizing Parallel Dynamic Programming using the Thread Safe Graphics Library.

Sarah Parsons

Proceedings of the 9th IEEE/ACM Workshop on Education for High Performance Computing, 2021

2020

TuckerMPI: A Parallel C++/MPI Software Package for Large-scale Data Compression via the Tucker Tensor Decomposition.

Alicia M. Klinvex

ACM Trans. Math. Softw., 2020

Distributed-memory parallel symmetric nonnegative matrix factorization.

Proceedings of the International Conference for High Performance Computing, 2020

General Memory-Independent Lower Bound for MTTKRP.

Kathryn Rouse

Proceedings of the 2020 SIAM Conference on Parallel Processing for Scientific Computing, 2020

Parallel Hierarchical Clustering using Rank-Two Nonnegative Matrix Factorization.

Proceedings of the 27th IEEE International Conference on High Performance Computing, 2020

2019

A Generalized Randomized Rank-Revealing Factorization.

CoRR, 2019

PLANC: Parallel Low Rank Approximation with Non-negativity Constraints.

CoRR, 2019

Joint 3D Localization and Classification of Space Debris using a Multispectral Rotating Point Spread Function.

CoRR, 2019

Dynamic Functional Magnetic Resonance Imaging Connectivity Tensor Decomposition: A New Approach to Analyze and Interpret Dynamic Brain Connectivity.

Brain Connect., 2019

2018

MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization.

Haesun Park

IEEE Trans. Knowl. Data Eng., 2018

A Practical Randomized CP Tensor Decomposition.

Casey Battaglino

SIAM J. Matrix Anal. Appl., 2018

The geometry of rank decompositions of matrix multiplication II: 3×3 matrices.

CoRR, 2018

A 3D Parallel Algorithm for QR Decomposition.

Proceedings of the 30th ACM Symposium on Parallelism in Algorithms and Architectures, 2018

Shared-memory parallelization of MTTKRP for dense tensors.

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Communication Lower Bounds for Matricized Tensor Times Khatri-Rao Product.

Nicholas Knight

Kathryn Rouse

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Partitioning and Communication Strategies for Sparse Non-negative Matrix Factorization.

Oguz Kaya

Proceedings of the 47th International Conference on Parallel Processing, 2018

Parallel Nonnegative CP Decomposition of Dense Tensors.

Koby Hayashi

Proceedings of the 25th IEEE International Conference on High Performance Computing, 2018

2017

Guest Editor Introduction PPoPP 2016, Special Issue 2 of 2.

ACM Trans. Parallel Comput., 2017

Shared Memory Parallelization of MTTKRP for Dense Tensors.

CoRR, 2017

A Communication-Avoiding Parallel Algorithm for the Symmetric Eigenvalue Problem.

Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, 2017

2016

Hypergraph Partitioning for Sparse Matrix-Matrix Multiplication.

ACM Trans. Parallel Comput., 2016

Reducing Communication Costs for Sparse Matrix Multiplication within Algebraic Multigrid.

Christopher M. Siefert

Jonathan J. Hu

SIAM J. Sci. Comput., 2016

Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication.

SIAM J. Sci. Comput., 2016

Improving the Numerical Stability of Fast Matrix Multiplication.

SIAM J. Matrix Anal. Appl., 2016

Network Topologies and Inevitable Contention.

Proceedings of the First International Workshop on Communication Optimizations in HPC, 2016

A high-performance parallel algorithm for nonnegative matrix factorization.

Haesun Park

Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

Parallel Tensor Compression for Large-Scale Scientific Data.

Woody Austin

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

2015

Avoiding Communication in Successive Band Reduction.

Nicholas Knight

ACM Trans. Parallel Comput., 2015

Reconstructing Householder vectors from Tall-Skinny QR.

J. Parallel Distributed Comput., 2015

Improving the numerical stability of fast matrix multiplication algorithms.

CoRR, 2015

Brief Announcement: Hypergraph Partitioning for Parallel Sparse Matrix-Matrix Multiplication.

Proceedings of the 27th ACM Symposium on Parallelism in Algorithms and Architectures, 2015

A framework for practical parallel fast matrix multiplication.

Austin R. Benson

Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

Diamond Sampling for Approximate Maximum All-Pairs Dot-Product (MAD) Search.

Proceedings of the 2015 IEEE International Conference on Data Mining, 2015

2014

Communication-Avoiding Symmetric-Indefinite Factorization.

SIAM J. Matrix Anal. Appl., 2014

Communication costs of Strassen's matrix multiplication.

Commun. ACM, 2014

Communication lower bounds and optimal algorithms for numerical linear algebra.

Acta Numer., 2014

Reconstructing Householder Vectors from Tall-Skinny QR.

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

2013

Avoiding Communication in Dense Linear Algebra.

PhD thesis, 2013

Communication efficient Gaussian elimination with partial pivoting using a shape morphing data layout.

Proceedings of the 25th ACM Symposium on Parallelism in Algorithms and Architectures, 2013

Communication optimal parallel multiplication of sparse random matrices.

Proceedings of the 25th ACM Symposium on Parallelism in Algorithms and Architectures, 2013

Implementing a Blocked Aasen's Algorithm with a Dynamic Scheduler on Multicore Architectures.

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

2012

Graph expansion and communication costs of fast matrix multiplication.

J. ACM, 2012

Strong Scaling of Matrix Multiplication Algorithms and Memory-Independent Communication Lower Bounds.

CoRR, 2012

Communication-optimal parallel algorithm for Strassen's matrix multiplication.

Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, 2012

Brief announcement: strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds.

Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, 2012

Communication-avoiding parallel Strassen: implementation and performance.

Proceedings of the SC Conference on High Performance Computing Networking, 2012

Communication avoiding successive band reduction.

Nicholas Knight

Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

Graph Expansion Analysis for Communication Costs of Fast Rectangular Matrix Multiplication.

Proceedings of the Design and Analysis of Algorithms, 2012

2011

Minimizing Communication in Numerical Linear Algebra.

SIAM J. Matrix Anal. Appl., 2011

Graph expansion and communication costs of fast matrix multiplication: regular submission.

Proceedings of SPAA 2011: 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2011

Brief announcement: communication bounds for heterogeneous architectures.

Andrew Gearhart

Proceedings of SPAA 2011: 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2011

Efficiently Computing Tensor Eigenvalues on a GPU.

Todd D. Plantenga

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Communication-Avoiding QR Decomposition for GPUs.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

2010

Communication-optimal Parallel and Sequential Cholesky Decomposition.

SIAM J. Sci. Comput., 2010

Minimizing Communication for Eigenproblems and the Singular Value Decomposition.

Ioana Dumitriu

CoRR, 2010

2009

Minimizing Communication in Linear Algebra.

CoRR, 2009

Communication-optimal parallel and sequential Cholesky decomposition: extended abstract.

Proceedings of SPAA 2009: 21st Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2009

2006

Modeling protein dependency networks using CoCoA.
