Tze Meng Low

CoRR, April, 2026

2025

Leveraging AI for Productive and Trustworthy HPC Software: Challenges and Research Directions.

Proceedings of the High Performance Computing, 2025

Gen-AI in a Bottle: Experiments with LLMs to Generate HPC Kernels.

Elliott Binder

Proceedings of the 2025 IEEE International Parallel and Distributed Processing Symposium, 2025

FATHOM: Fast Attention Through Optimizing Memory.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2025

Architecture-Aware Models of AI Engines for High-Performance Matrix Matrix Multiplication.

Elliott D. Binder

Jeffrey Low

Proceedings of the 54th International Conference on Parallel Processing, 2025

2024

SMaLL: Software for Rapidly Instantiating Machine Learning Libraries.

ACM Trans. Embed. Comput. Syst., May, 2024

Linear Algebra Approach for Directed Triad Counting and Enumeration.

Yuttapichai Kerdcharoen

Orathai Sangpetch

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

2023

Reformulating the direct convolution for high-performance deep learning inference on ARM processors.

Enrique S. Quintana-Ortí

Andrés E. Tomás

J. Syst. Archit., February, 2023

SMaLL: A Software Framework for portable Machine Learning Libraries.

CoRR, 2023

Exploiting Fusion Opportunities in Linear Algebraic Graph Query Engines.

Yuttapichai Kerdcharoen

Rajalakshmi Srinivasaraghavan

Proceedings of the IEEE High Performance Extreme Computing Conference, 2023

2022

Modeling Matrix Engines for Portability and Performance.

Nicholai Tukanov

José E. Moreira

Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Families of Butterfly Counting Algorithms for Bipartite Graphs.

Jay A. Acosta

Devangi N. Parikh

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

2021

Fusing Non Element-wise Layers in DNNs.

Martin D. Schatz

Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021

Delayed Asynchronous Iterative Graph Algorithms.

Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021

2020

A Flexible Framework for Multidimensional DFTs.

SIAM J. Sci. Comput., 2020

Addressing Unreliability in Emerging Devices and Non-von Neumann Architectures Using Coded Computing.

Proc. IEEE, 2020

Linear Algebraic Louvain Method in Python.

Michel Pelletier

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Evaluation of Graph Analytics Frameworks Using the GAP Benchmark Suite.

Proceedings of the IEEE International Symposium on Workload Characterization, 2020

Towards an Objective Metric for the Performance of Exact Triangle Count.

Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

3D Coded SUMMA: Communication-Efficient and Robust Parallel Matrix Multiplication.

Proceedings of the Euro-Par 2020: Parallel Processing, 2020

qLD: High-performance Computation of Linkage Disequilibrium on CPU and GPU.

Charalampos Theodoris

Nikolaos Alachiotis

Pavlos Pavlidis

Proceedings of the 20th IEEE International Conference on Bioinformatics and Bioengineering, 2020

2019

A Flexible Framework for Parallel Multi-Dimensional DFTs.

CoRR, 2019

CodeNet: Training Large Scale Neural Networks in Presence of Soft-Errors.

CoRR, 2019

Analytical cache modeling and tilesize optimization for tensor contractions.

Rui Li

Aravind Sukumaran-Rajam

Proceedings of the International Conference for High Performance Computing, 2019

Exploiting Symmetries of Small Prime-Sized DFTs.

Devangi N. Parikh

Proceedings of the Parallel Processing and Applied Mathematics, 2019

Linear algebraic depth-first search.

Proceedings of the 6th ACM SIGPLAN International Workshop on Libraries, 2019

Efficient SpMV Operation for Large and Highly Sparse Matrices using Scalable Multi-way Merge Parallelization.

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Delta-Stepping SSSP: From Vertices and Edges to GraphBLAS Implementations.

Rahul Mayuranath

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

A Portable GPU Framework for SNP Comparisons.

Elliott Binder

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

Exploration of Fine-Grained Parallelism for Load Balancing Eager K-truss on GPU and CPU.

Kyungjoo Kim

Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019

2018

SPIRAL: Extreme Performance Portability.

Richard Michael Veras

Proc. IEEE, 2018

A Unified Coded Deep Neural Network Training Strategy Based on Generalized PolyDot Codes for Matrix Multiplication.

CoRR, 2018

Coded FFT and Its Communication Overhead.

Haewon Jeong

Pulkit Grover

CoRR, 2018

A Unified Coded Deep Neural Network Training Strategy based on Generalized PolyDot codes.

Proceedings of the 2018 IEEE International Symposium on Information Theory, 2018

Large Bandwidth-Efficient FFTs on Multicore and Multi-socket Systems.

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

High Performance Zero-Memory Overhead Direct Convolutions.

Jiyuan Zhang

Proceedings of the 35th International Conference on Machine Learning, 2018

PageRank Acceleration for Large Graphs with Scalable Hardware and Two-Step SpMV.

Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018

Linear Algebraic Formulation of Edge-centric K-truss Algorithms with Adjacency Matrices.

Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018

FFTX and SpectralPack: A First Look.

Proceedings of the 25th IEEE International Conference on High Performance Computing Workshops, 2018

Masterless Coded Computing: A Fully-Distributed Coded FFT Algorithm.

Haewon Jeong

Pulkit Grover

Proceedings of the 56th Annual Allerton Conference on Communication, 2018

2017

A Family of Provably Correct Algorithms for Exact Triangle Counting.

Matthew Lee

Proceedings of the First International Workshop on Software Correctness for HPC Applications, 2017

Mixed data layout kernels for vectorized complex arithmetic.

Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017

First look: Linear algebra-based triangle counting without matrix multiplication.

Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017

High Assurance Code Generation for Cyber-Physical Systems.

Proceedings of the 18th IEEE International Symposium on High Assurance Systems Engineering, 2017

2016

The BLIS Framework: Experiments in Portability.

ACM Trans. Math. Softw., 2016

Analytical Modeling Is Enough for High-Performance BLIS.

Francisco D. Igual

Tyler M. Smith

Enrique S. Quintana-Ortí

ACM Trans. Math. Softw., 2016

Automating the Last-Mile for High Performance Dense Linear Algebra.

Richard Michael Veras

Tyler Michael Smith

CoRR, 2016

Compilers, hands-off my hands-on optimizations.

Proceedings of the 3rd Workshop on Programming Models for SIMD/Vector Processing, 2016

Efficient Computation of Linkage Disequilibria as Dense Linear Algebra Operations.

Nikolaos Alachiotis

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

A scale-free structure for power-law graphs.

Richard Veras

Proceedings of the 2016 IEEE High Performance Extreme Computing Conference, 2016

2015

Enabling portable energy efficiency with memory accelerated library.

Proceedings of the 48th International Symposium on Microarchitecture, 2015

Optimizing Space Time Adaptive Processing through accelerating memory-bounded operations.

Qi Guo

Proceedings of the 2015 IEEE High Performance Extreme Computing Conference, 2015

2014

Exploiting Symmetry in Tensors for High Performance: Multiplication with Symmetric Tensors.

Martin D. Schatz

Tamara G. Kolda

SIAM J. Sci. Comput., 2014

2013

Exploiting Symmetry in Tensors for High Performance

Martin D. Schatz

Tamara G. Kolda

CoRR, 2013

2008

Scalable parallelization of FLAME code via the workqueuing model.

Field G. Van Zee

Paolo Bientinesi

ACM Trans. Math. Softw., 2008

2006

Accumulating Householder transformations, revisited.

Thierry Joffrain

Enrique S. Quintana-Ortí

Field G. Van Zee

ACM Trans. Math. Softw., 2006

2005

Extracting SMP parallelism for dense linear algebra algorithms from high-level specifications.
