Tze Meng Low

Affiliations:
  • Carnegie Mellon University, Pittsburgh, PA, USA


According to our database, Tze Meng Low authored at least 50 papers between 2005 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
Reformulating the direct convolution for high-performance deep learning inference on ARM processors.
J. Syst. Archit., February, 2023

SMaLL: A Software Framework for portable Machine Learning Libraries.
CoRR, 2023

Exploiting Fusion Opportunities in Linear Algebraic Graph Query Engines.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2023

2022
Modeling Matrix Engines for Portability and Performance.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Families of Butterfly Counting Algorithms for Bipartite Graphs.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

2021
Fusing Non Element-wise Layers in DNNs.
Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021

Delayed Asynchronous Iterative Graph Algorithms.
Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021

2020
A Flexible Framework for Multidimensional DFTs.
SIAM J. Sci. Comput., 2020

Addressing Unreliability in Emerging Devices and Non-von Neumann Architectures Using Coded Computing.
Proc. IEEE, 2020

Linear Algebraic Louvain Method in Python.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020


Towards an Objective Metric for the Performance of Exact Triangle Count.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

3D Coded SUMMA: Communication-Efficient and Robust Parallel Matrix Multiplication.
Proceedings of the Euro-Par 2020: Parallel Processing, 2020

qLD: High-performance Computation of Linkage Disequilibrium on CPU and GPU.
Proceedings of the 20th IEEE International Conference on Bioinformatics and Bioengineering, 2020

2019
A Flexible Framework for Parallel Multi-Dimensional DFTs.
CoRR, 2019

CodeNet: Training Large Scale Neural Networks in Presence of Soft-Errors.
CoRR, 2019

Analytical cache modeling and tilesize optimization for tensor contractions.
Proceedings of the International Conference for High Performance Computing, 2019

Exploiting Symmetries of Small Prime-Sized DFTs.
Proceedings of the Parallel Processing and Applied Mathematics, 2019

Linear algebraic depth-first search.
Proceedings of the 6th ACM SIGPLAN International Workshop on Libraries, 2019

Efficient SpMV Operation for Large and Highly Sparse Matrices using Scalable Multi-way Merge Parallelization.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Delta-Stepping SSSP: From Vertices and Edges to GraphBLAS Implementations.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

A Portable GPU Framework for SNP Comparisons.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

Exploration of Fine-Grained Parallelism for Load Balancing Eager K-truss on GPU and CPU.
Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019

2018
SPIRAL: Extreme Performance Portability.
Proc. IEEE, 2018

A Unified Coded Deep Neural Network Training Strategy Based on Generalized PolyDot Codes for Matrix Multiplication.
CoRR, 2018

Coded FFT and Its Communication Overhead.
CoRR, 2018

A Unified Coded Deep Neural Network Training Strategy based on Generalized PolyDot codes.
Proceedings of the 2018 IEEE International Symposium on Information Theory, 2018

Large Bandwidth-Efficient FFTs on Multicore and Multi-socket Systems.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

High Performance Zero-Memory Overhead Direct Convolutions.
Proceedings of the 35th International Conference on Machine Learning, 2018

PageRank Acceleration for Large Graphs with Scalable Hardware and Two-Step SpMV.
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018

Linear Algebraic Formulation of Edge-centric K-truss Algorithms with Adjacency Matrices.
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018

FFTX and SpectralPack: A First Look.
Proceedings of the 25th IEEE International Conference on High Performance Computing Workshops, 2018

Masterless Coded Computing: A Fully-Distributed Coded FFT Algorithm.
Proceedings of the 56th Annual Allerton Conference on Communication, 2018

2017
A Family of Provably Correct Algorithms for Exact Triangle Counting.
Proceedings of the First International Workshop on Software Correctness for HPC Applications, 2017

Mixed data layout kernels for vectorized complex arithmetic.
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017

First look: Linear algebra-based triangle counting without matrix multiplication.
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017

High Assurance Code Generation for Cyber-Physical Systems.
Proceedings of the 18th IEEE International Symposium on High Assurance Systems Engineering, 2017

2016
The BLIS Framework: Experiments in Portability.
ACM Trans. Math. Softw., 2016

Analytical Modeling Is Enough for High-Performance BLIS.
ACM Trans. Math. Softw., 2016

Automating the Last-Mile for High Performance Dense Linear Algebra.
CoRR, 2016

Compilers, hands-off my hands-on optimizations.
Proceedings of the 3rd Workshop on Programming Models for SIMD/Vector Processing, 2016

Efficient Computation of Linkage Disequilibria as Dense Linear Algebra Operations.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

A scale-free structure for power-law graphs.
Proceedings of the 2016 IEEE High Performance Extreme Computing Conference, 2016

2015
Enabling portable energy efficiency with memory accelerated library.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

Optimizing Space Time Adaptive Processing through accelerating memory-bounded operations.
Proceedings of the 2015 IEEE High Performance Extreme Computing Conference, 2015

2014
Exploiting Symmetry in Tensors for High Performance: Multiplication with Symmetric Tensors.
SIAM J. Sci. Comput., 2014

2013
Exploiting Symmetry in Tensors for High Performance.
CoRR, 2013

2008
Scalable parallelization of FLAME code via the workqueuing model.
ACM Trans. Math. Softw., 2008

2006
Accumulating Householder transformations, revisited.
ACM Trans. Math. Softw., 2006

2005
Extracting SMP parallelism for dense linear algebra algorithms from high-level specifications.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005

