Edgar Solomonik

SIAM J. Matrix Anal. Appl., 2022

Accelerating alternating least squares for tensor decomposition by pairwise perturbation.

Numer. Linear Algebra Appl., 2022

Distributed-memory tensor completion for generalized loss functions in python using new sparse tensor kernels.

J. Parallel Distributed Comput., 2022

High-Dimensional Performance Modeling via Tensor Completion.

Edward Hutter

CoRR, 2022

Alternating Mahalanobis Distance Minimization for Stable and Accurate CP Decomposition.

Navjot Singh

CoRR, 2022

Parallel Minimum Spanning Forest Computation using Sparse Matrix Kernels.

Tim Baer

Raghavendra Kanakagiri

Proceedings of the 2022 SIAM Conference on Parallel Processing for Scientific Computing, 2022

ATD: Augmenting CP Tensor Decomposition by Self Supervision.

Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Cost-efficient Gaussian tensor network embeddings for tensor-structured inputs.

Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021

Communication Lower Bounds of Bilinear Algorithms for Symmetric Tensor Contractions.

Torsten Hoefler

SIAM J. Sci. Comput., 2021

Comparison of Accuracy and Scalability of Gauss-Newton and Alternating Least Squares for CANDECOMC/PARAFAC Decomposition.

SIAM J. Sci. Comput., 2021

Communication Lower Bounds for Nested Bilinear Algorithms.

Caleb Ju

Yifan Zhang

CoRR, 2021

Augmented Tensor Decomposition with Stochastic Optimization.

CoRR, 2021

Efficient Preconditioners for Interior Point Methods via a new Schur Complementation Strategy.

Samah Karim

CoRR, 2021

Fast Bilinear Algorithms for Symmetric Tensor Contractions.

Comput. Methods Appl. Math., 2021

Fast and accurate randomized algorithms for low-rank tensor decompositions.

Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

MTC: Multiresolution Tensor Completion from Partial and Coarse Observations.

Proceedings of KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

Efficient parallel CP decomposition with pairwise perturbation and multi-sweep dimension tree.

Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Accelerating Distributed-Memory Autotuning via Statistical Analysis of Execution Paths.

Edward Hutter

Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

2020

Derivation and Analysis of Fast Bilinear Algorithms for Convolution.

Caleb Ju

SIAM Rev., 2020

On Stability of Tensor Networks and Canonical Forms.

Yifan Zhang

CoRR, 2020

Efficient 2D tensor network simulation of quantum systems.

Proceedings of the International Conference for High Performance Computing, 2020

Distributed-memory DMRG via sparse and dense parallel tensor contractions.

Ryan Levy

Bryan K. Clark

Proceedings of the International Conference for High Performance Computing, 2020

Communication-Efficient Jaccard similarity for High-Performance Distributed Genome Comparisons.

Maciej Besta

Raghavendra Kanakagiri

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

AutoHOOT: Automatic High-Order Optimization for Tensors.

Jiayu Ye

Proceedings of PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019

Comparison of Accuracy and Scalability of Gauss-Newton and Alternating Least Squares for CP Decomposition.

CoRR, 2019

Enabling Distributed-Memory Tensor Completion in Python using New Sparse Tensor Kernels.

CoRR, 2019

Histogram Sort with Sampling.

Vipul Harsh

Laxmikant V. Kalé

Proceedings of the 31st ACM Symposium on Parallelism in Algorithms and Architectures, 2019

ExTensor: An Accelerator for Sparse Tensor Algebra.

Kartik Hegde

Hadi Asghari Moghaddam

Christopher W. Fletcher

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Communication-Avoiding Cholesky-QR2 for Rectangular Matrices.

Edward Hutter

Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

2017

A Communication-Avoiding Parallel Algorithm for the Symmetric Eigenvalue Problem.

Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, 2017

Scaling betweenness centrality using communication-efficient sparse matrix multiplication.

Proceedings of the International Conference for High Performance Computing, 2017

Communication-Avoiding Parallel Algorithms for Solving Triangular Systems of Linear Equations.

Tobias Wicky

Torsten Hoefler

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

SlimSell: A Vectorizable Graph Representation for Breadth-First Search.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations.

Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing, 2017

2016

Betweenness Centrality is more Parallelizable than Dense Matrix Multiplication.

CoRR, 2016

2015

Sparse Tensor Algebra as a Parallel Programming Model.

Torsten Hoefler

CoRR, 2015

2014

Provably Efficient Algorithms for Numerical Tensor Algebra.

PhD thesis, 2014

A massively parallel tensor contraction framework for coupled-cluster computations.

J. Parallel Distributed Comput., 2014

Tradeoffs between synchronization, communication, and computation in parallel linear algebra computations.

Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architectures, 2014

Reconstructing Householder Vectors from Tall-Skinny QR.

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

2013

Cyclops Tensor Framework: Reducing Communication and Eliminating Load Imbalance in Massively Parallel Contractions.

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Minimizing Communication in All-Pairs Shortest Paths.

Aydin Buluç

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

A Communication-Optimal N-Body Algorithm for Direct Interactions.

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

2012

Matrix Multiplication on Multidimensional Torus Networks.

Proceedings of the High Performance Computing for Computational Science, 2012

Communication avoiding and overlapping for numerical linear algebra.

Evangelos Georganas

Jorge González-Domínguez

Proceedings of the SC Conference on High Performance Computing Networking, 2012

2011

Sorting.

Laxmikant V. Kalé

Encyclopedia of Parallel Computing, 2011

Improving communication performance in dense linear algebra via topology aware collectives.

Abhinav Bhatele

Proceedings of the Conference on High Performance Computing Networking, 2011

Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms.

Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

2010

Understanding Application Performance via Micro-benchmarks on Three Large Supercomputers: Intrepid, Ranger and Jaguar.

Int. J. High Perform. Comput. Appl., 2010

Highly scalable parallel sorting.
