Jiajia Li

ORCID: 0000-0003-1270-4147

Affiliations:
  • North Carolina State University, Raleigh, NC, USA
  • Pacific Northwest National Laboratory, Richland, WA, USA (former)
  • Georgia Institute of Technology, Atlanta, GA, USA (former)


According to our database, Jiajia Li authored at least 46 papers between 2010 and 2024.

Bibliography

2024
POSTER: Optimizing Sparse Tensor Contraction with Revisiting Hash Table Design.
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

2023
Sparse Symmetric Format for Tucker Decomposition.
IEEE Trans. Parallel Distributed Syst., June, 2023

Performance Implication of Tensor Irregularity and Optimization for Distributed Tensor Decomposition.
ACM Trans. Parallel Comput., June, 2023

Merchandiser: Data Placement on Heterogeneous Memory for Task-Parallel HPC Applications with Load-Balance Awareness.
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

Fast Parallel Tensor Times Same Vector for Hypergraphs.
Proceedings of the 30th IEEE International Conference on High Performance Computing, 2023

2022
MSREP: A Fast yet Light Sparse Matrix Framework for Multi-GPU Systems.
CoRR, 2022

AlphaSparse: Generating High Performance SpMV Codes Directly from Sparse Matrices.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

LB-HM: load balance-aware data placement on heterogeneous memory for task-parallel HPC applications.
Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022

DRIPS: Dynamic Rebalancing of Pipelined Streaming Applications on CGRAs.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

BALA-CPD: BALanced and Asynchronous Distributed Tensor Decomposition.
Proceedings of the IEEE International Conference on Cluster Computing, 2022

2021
A High-Performance Sparse Tensor Algebra Compiler in Multi-Level IR.
CoRR, 2021

Sparta: high-performance, element-wise sparse tensor contraction on heterogeneous memory.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

A High Performance Sparse Tensor Algebra Compiler in MLIR.
Proceedings of the 7th IEEE/ACM Workshop on the LLVM Compiler Infrastructure in HPC, 2021

Athena: high-performance sparse tensor contraction sequence on heterogeneous memory.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures.
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

DynPaC: Coarse-Grained, Dynamic, and Partially Reconfigurable Array for Streaming Applications.
Proceedings of the 39th IEEE International Conference on Computer Design, 2021

A Survey: Handling Irregularities in Neural Network Acceleration with FPGAs.
Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021

Efficient Parallel Sparse Symmetric Tucker Decomposition for High-Order Tensors.
Proceedings of the 2021 SIAM Conference on Applied and Computational Discrete Algorithms, 2021

2020
Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect.
IEEE Trans. Parallel Distributed Syst., 2020

Programming Strategies for Irregular Algorithms on the Emu Chick.
ACM Trans. Parallel Comput., 2020

A parallel sparse tensor benchmark suite on CPUs and GPUs.
Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

A Sparse Tensor Benchmark Suite for CPUs and GPUs.
Proceedings of the IEEE International Symposium on Workload Characterization, 2020

On the Feasibility of Using Reduced-Precision Tensor Core Operations for Graph Analytics.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

2019
A microbenchmark characterization of the Emu chick.
Parallel Comput., 2019

Optimizing sparse tensor times matrix on GPUs.
J. Parallel Distributed Comput., 2019

PASTA: a parallel sparse tensor algorithm benchmark suite.
CCF Trans. High Perform. Comput., 2019

An efficient mixed-mode representation of sparse tensors.
Proceedings of the International Conference for High Performance Computing, 2019

A pattern based algorithmic autotuner for graph processing on GPUs.
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

Load-Balanced Sparse MTTKRP on GPUs.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Efficient and effective sparse tensor reordering.
Proceedings of the ACM International Conference on Supercomputing, 2019

2018
Scalable tensor decompositions in high performance computing environments.
PhD thesis, 2018

An Autotuning Protocol to Rapidly Build Autotuners.
ACM Trans. Parallel Comput., 2018

Design and Implementation of Adaptive SpMV Library for Multicore and Many-Core Architecture.
ACM Trans. Math. Softw., 2018

HiCOO: hierarchical storage of sparse tensors.
Proceedings of the International Conference for High Performance Computing, 2018

Bridging the gap between deep learning and sparse matrix format selection.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

An Initial Characterization of the Emu Chick.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

2017
Understanding the GPU Microarchitecture to Achieve Bare-Metal Performance Tuning.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Model-Driven Sparse CP Decomposition for Higher-Order Tensors.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

POSTER: Bridging the Gap Between Deep Learning and Sparse Matrix Format Selection.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016
Optimizing Sparse Tensor Times Matrix on Multi-core and Many-Core Architectures.
Proceedings of the 6th Workshop on Irregular Applications: Architecture and Algorithms, 2016

2015
Introducing high performance computing concepts into engineering undergraduate curriculum: a success story.
Proceedings of the Workshop on Education for High-Performance Computing, 2015

An input-adaptive and in-place approach to dense tensor-times-matrix multiply.
Proceedings of the International Conference for High Performance Computing, 2015

2013
SMAT: an input adaptive auto-tuner for sparse matrix-vector multiplication.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2013

2012
SMAT: An Input Adaptive Sparse Matrix-Vector Multiplication Auto-Tuner.
CoRR, 2012

An optimized large-scale hybrid DGEMM design for CPUs and ATI GPUs.
Proceedings of the International Conference on Supercomputing, 2012

2010
Automatically Tuned Dynamic Programming with an Algorithm-by-Blocks.
Proceedings of the 16th IEEE International Conference on Parallel and Distributed Systems, 2010

