Grey Ballard

Orcid: 0000-0003-1557-8027

According to our database, Grey Ballard authored at least 73 papers between 2006 and 2024.

Bibliography

2024
Communication Lower Bounds and Optimal Algorithms for Multiple Tensor-Times-Matrix Computation.
SIAM J. Matrix Anal. Appl., March, 2024

Randomized Algorithms for Symmetric Nonnegative Matrix Factorization.
CoRR, 2024

2023
CP decomposition for tensors via alternating least squares with QR decomposition.
Numer. Linear Algebra Appl., December, 2023

Randomized Algorithms for Rounding in the Tensor-Train Format.
SIAM J. Sci. Comput., February, 2023

Sequential and Shared-Memory Parallel Algorithms for Partitioned Local Depths.
CoRR, 2023

Parallel Memory-Independent Communication Bounds for SYRK.
Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, 2023

Distributed-Memory Parallel JointNMF.
Proceedings of the 37th International Conference on Supercomputing, 2023

2022
Parallel Algorithms for Tensor Train Arithmetic.
SIAM J. Sci. Comput., 2022

Parallel Randomized Tucker Decomposition Algorithms.
CoRR, 2022

Tight Memory-Independent Parallel Matrix Multiplication Communication Lower Bounds.
CoRR, 2022

Brief Announcement: Tight Memory-Independent Parallel Matrix Multiplication Communication Lower Bounds.
Proceedings of the SPAA '22: 34th ACM Symposium on Parallelism in Algorithms and Architectures, Philadelphia, PA, USA, July 11, 2022

Parallel Tensor Train Rounding using Gram SVD.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

2021
PLANC: Parallel Low-rank Approximation with Nonnegativity Constraints.
ACM Trans. Math. Softw., 2021

Accelerating Neural Network Training using Arbitrary Precision Approximating Matrix Multiplication Algorithms.
Proceedings of the ICPP Workshops 2021: 50th International Conference on Parallel Processing, 2021

Parallel Tucker Decomposition with Numerically Accurate SVD.
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

Visualizing Parallel Dynamic Programming using the Thread Safe Graphics Library.
Proceedings of the 9th IEEE/ACM Workshop on Education for High Performance Computing, 2021

2020
TuckerMPI: A Parallel C++/MPI Software Package for Large-scale Data Compression via the Tucker Tensor Decomposition.
ACM Trans. Math. Softw., 2020

Distributed-memory parallel symmetric nonnegative matrix factorization.
Proceedings of the International Conference for High Performance Computing, 2020

General Memory-Independent Lower Bound for MTTKRP.
Proceedings of the 2020 SIAM Conference on Parallel Processing for Scientific Computing, 2020

Parallel Hierarchical Clustering using Rank-Two Nonnegative Matrix Factorization.
Proceedings of the 27th IEEE International Conference on High Performance Computing, 2020

2019
A Generalized Randomized Rank-Revealing Factorization.
CoRR, 2019

PLANC: Parallel Low Rank Approximation with Non-negativity Constraints.
CoRR, 2019

Joint 3D Localization and Classification of Space Debris using a Multispectral Rotating Point Spread Function.
CoRR, 2019

Dynamic Functional Magnetic Resonance Imaging Connectivity Tensor Decomposition: A New Approach to Analyze and Interpret Dynamic Brain Connectivity.
Brain Connect., 2019

2018
MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization.
IEEE Trans. Knowl. Data Eng., 2018

A Practical Randomized CP Tensor Decomposition.
SIAM J. Matrix Anal. Appl., 2018

The geometry of rank decompositions of matrix multiplication II: 3×3 matrices.
CoRR, 2018

A 3D Parallel Algorithm for QR Decomposition.
Proceedings of the 30th ACM Symposium on Parallelism in Algorithms and Architectures, 2018

Shared-memory parallelization of MTTKRP for dense tensors.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Communication Lower Bounds for Matricized Tensor Times Khatri-Rao Product.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Partitioning and Communication Strategies for Sparse Non-negative Matrix Factorization.
Proceedings of the 47th International Conference on Parallel Processing, 2018

Parallel Nonnegative CP Decomposition of Dense Tensors.
Proceedings of the 25th IEEE International Conference on High Performance Computing, 2018

2017
Guest Editor Introduction PPoPP 2016, Special Issue 2 of 2.
ACM Trans. Parallel Comput., 2017

Shared Memory Parallelization of MTTKRP for Dense Tensors.
CoRR, 2017

A Communication-Avoiding Parallel Algorithm for the Symmetric Eigenvalue Problem.
Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, 2017

2016
Hypergraph Partitioning for Sparse Matrix-Matrix Multiplication.
ACM Trans. Parallel Comput., 2016

Reducing Communication Costs for Sparse Matrix Multiplication within Algebraic Multigrid.
SIAM J. Sci. Comput., 2016

Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication.
SIAM J. Sci. Comput., 2016

Improving the Numerical Stability of Fast Matrix Multiplication.
SIAM J. Matrix Anal. Appl., 2016

Network Topologies and Inevitable Contention.
Proceedings of the First International Workshop on Communication Optimizations in HPC, 2016

A high-performance parallel algorithm for nonnegative matrix factorization.
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

Parallel Tensor Compression for Large-Scale Scientific Data.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

2015
Avoiding Communication in Successive Band Reduction.
ACM Trans. Parallel Comput., 2015

Reconstructing Householder vectors from Tall-Skinny QR.
J. Parallel Distributed Comput., 2015

Improving the numerical stability of fast matrix multiplication algorithms.
CoRR, 2015

Brief Announcement: Hypergraph Partitioning for Parallel Sparse Matrix-Matrix Multiplication.
Proceedings of the 27th ACM Symposium on Parallelism in Algorithms and Architectures, 2015

A framework for practical parallel fast matrix multiplication.
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

Diamond Sampling for Approximate Maximum All-Pairs Dot-Product (MAD) Search.
Proceedings of the 2015 IEEE International Conference on Data Mining, 2015

2014
Communication-Avoiding Symmetric-Indefinite Factorization.
SIAM J. Matrix Anal. Appl., 2014

Communication costs of Strassen's matrix multiplication.
Commun. ACM, 2014

Communication lower bounds and optimal algorithms for numerical linear algebra.
Acta Numer., 2014

Reconstructing Householder Vectors from Tall-Skinny QR.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

2013
Avoiding Communication in Dense Linear Algebra.
PhD thesis, 2013

Communication efficient Gaussian elimination with partial pivoting using a shape morphing data layout.
Proceedings of the 25th ACM Symposium on Parallelism in Algorithms and Architectures, 2013

Communication optimal parallel multiplication of sparse random matrices.
Proceedings of the 25th ACM Symposium on Parallelism in Algorithms and Architectures, 2013

Implementing a Blocked Aasen's Algorithm with a Dynamic Scheduler on Multicore Architectures.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

2012
Graph expansion and communication costs of fast matrix multiplication.
J. ACM, 2012

Strong Scaling of Matrix Multiplication Algorithms and Memory-Independent Communication Lower Bounds.
CoRR, 2012

Communication-optimal parallel algorithm for Strassen's matrix multiplication.
Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, 2012

Brief announcement: strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds.
Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, 2012

Communication-avoiding parallel Strassen: implementation and performance.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Communication avoiding successive band reduction.
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

Graph Expansion Analysis for Communication Costs of Fast Rectangular Matrix Multiplication.
Proceedings of the Design and Analysis of Algorithms, 2012

2011
Minimizing Communication in Numerical Linear Algebra.
SIAM J. Matrix Anal. Appl., 2011

Graph expansion and communication costs of fast matrix multiplication: regular submission.
Proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2011), 2011

Brief announcement: communication bounds for heterogeneous architectures.
Proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2011), 2011

Efficiently Computing Tensor Eigenvalues on a GPU.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Communication-Avoiding QR Decomposition for GPUs.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

2010
Communication-optimal Parallel and Sequential Cholesky Decomposition.
SIAM J. Sci. Comput., 2010

Minimizing Communication for Eigenproblems and the Singular Value Decomposition.
CoRR, 2010

2009
Minimizing Communication in Linear Algebra.
CoRR, 2009

Communication-optimal parallel and sequential Cholesky decomposition: extended abstract.
Proceedings of the 21st Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2009), 2009

2006
Modeling protein dependency networks using CoCoA.
ACM Crossroads, 2006

