Jiajia Li

ORCID: 0000-0003-1270-4147

Affiliations:

North Carolina State University, Raleigh, NC, USA
Pacific Northwest National Laboratory, Richland, WA, USA (former)
Georgia Institute of Technology, Atlanta, GA, USA (former)

According to our database¹, Jiajia Li authored at least 55 papers between 2010 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

Leveraging AI Ecosystem for Portable and Sustainable GPU Kernels in HPC.

Yanbo Zhao

Zhaonan Meng

Sai Krishna Teja Varma Manthena

Xu Liu

Ajay Panyala

Jiajia Li

Proceedings of the 12th ACM SIGPLAN International Workshop on Libraries, 2026

2025

SRSparse: Generating Codes for High-Performance Sparse Matrix-Vector Semiring Computations.

ACM Trans. Archit. Code Optim., June, 2025

gHyPart: GPU-friendly End-to-End Hypergraph Partitioner.

ACM Trans. Archit. Code Optim., March, 2025

RedSan: A Redundant Memory Instruction Sanitizer for GPU Programs.

Proceedings of the International Conference for High Performance Computing, 2025

SymProp: Scaling Sparse Symmetric Tucker Decomposition via Symmetry Propagation.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2025

Scalable and Efficient Tensor Message-Passing Hypergraph Neural Networks.

Proceedings of the IEEE International Conference on Big Data, 2025

DeepContext: A Context-aware, Cross-platform, and Cross-framework Tool for Performance Profiling and Analysis of Deep Learning Workloads.

Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2025

2024

DeepContext: A Context-aware, Cross-platform, and Cross-framework Tool for Performance Profiling and Analysis of Deep Learning Workloads.

CoRR, 2024

POSTER: Optimizing Sparse Tensor Contraction with Revisiting Hash Table Design.

Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

FASTEN: Fast GPU-accelerated Segmented Matrix Multiplication for Heterogenous Graph Neural Networks.

Keren Zhou

Karthik Ganapathi Subramanian

Proceedings of the 38th ACM International Conference on Supercomputing, 2024

2023

Sparse Symmetric Format for Tucker Decomposition.

IEEE Trans. Parallel Distributed Syst., June, 2023

Performance Implication of Tensor Irregularity and Optimization for Distributed Tensor Decomposition.

ACM Trans. Parallel Comput., June, 2023

Merchandiser: Data Placement on Heterogeneous Memory for Task-Parallel HPC Applications with Load-Balance Awareness.

Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

Fast Parallel Tensor Times Same Vector for Hypergraphs.

Proceedings of the 30th IEEE International Conference on High Performance Computing, 2023

2022

MSREP: A Fast yet Light Sparse Matrix Framework for Multi-GPU Systems.

CoRR, 2022

AlphaSparse: Generating High Performance SpMV Codes Directly from Sparse Matrices.

Proceedings of the SC22: International Conference for High Performance Computing, 2022

LB-HM: load balance-aware data placement on heterogeneous memory for task-parallel HPC applications.

Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022

DRIPS: Dynamic Rebalancing of Pipelined Streaming Applications on CGRAs.

Cheng Tan

Nicolas Bohm Agostini

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

BALA-CPD: BALanced and Asynchronous Distributed Tensor Decomposition.

Proceedings of the IEEE International Conference on Cluster Computing, 2022

2021

A High-Performance Sparse Tensor Algebra Compiler in Multi-Level IR.

CoRR, 2021

Sparta: high-performance, element-wise sparse tensor contraction on heterogeneous memory.

Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

A High Performance Sparse Tensor Algebra Compiler in MLIR.

Proceedings of the 7th IEEE/ACM Workshop on the LLVM Compiler Infrastructure in HPC, 2021

Athena: high-performance sparse tensor contraction sequence on heterogeneous memory.

Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures.

Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

DynPaC: Coarse-Grained, Dynamic, and Partially Reconfigurable Array for Streaming Applications.

Cheng Tan

Tong Geng

Chenhao Xie

Nicolas Bohm Agostini

Proceedings of the 39th IEEE International Conference on Computer Design, 2021

A Survey: Handling Irregularities in Neural Network Acceleration with FPGAs.

Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021

Efficient Parallel Sparse Symmetric Tucker Decomposition for High-Order Tensors.

Proceedings of the 2021 SIAM Conference on Applied and Computational Discrete Algorithms, 2021

2020

Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect.

IEEE Trans. Parallel Distributed Syst., 2020

Programming Strategies for Irregular Algorithms on the Emu Chick.

ACM Trans. Parallel Comput., 2020

A parallel sparse tensor benchmark suite on CPUs and GPUs.

Jiajia Li

Mahesh Lakshminarasimhan

Xiaolong Wu

Ang Li

Catherine Olschanowsky

Kevin J. Barker

Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

A Sparse Tensor Benchmark Suite for CPUs and GPUs.

Jiajia Li

Mahesh Lakshminarasimhan

Xiaolong Wu

Ang Li

Catherine Olschanowsky

Kevin J. Barker

Proceedings of the IEEE International Symposium on Workload Characterization, 2020

On the Feasibility of Using Reduced-Precision Tensor Core Operations for Graph Analytics.

Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

2019

A microbenchmark characterization of the Emu chick.

Parallel Comput., 2019

Optimizing sparse tensor times matrix on GPUs.

J. Parallel Distributed Comput., 2019

PASTA: a parallel sparse tensor algorithm benchmark suite.

CCF Trans. High Perform. Comput., 2019

An efficient mixed-mode representation of sparse tensors.

Israt Nisa

Jiajia Li

Aravind Sukumaran-Rajam

Prashant Singh Rawat

Sriram Krishnamoorthy

P. Sadayappan

Proceedings of the International Conference for High Performance Computing, 2019

A pattern based algorithmic autotuner for graph processing on GPUs.

Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

Load-Balanced Sparse MTTKRP on GPUs.

Israt Nisa

Jiajia Li

Aravind Sukumaran-Rajam

Richard W. Vuduc

P. Sadayappan

Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Efficient and effective sparse tensor reordering.

Proceedings of the ACM International Conference on Supercomputing, 2019

2018

Scalable tensor decompositions in high performance computing environments.

Jiajia Li

PhD thesis, 2018

An Autotuning Protocol to Rapidly Build Autotuners.

ACM Trans. Parallel Comput., 2018

Design and Implementation of Adaptive SpMV Library for Multicore and Many-Core Architecture.

Guangming Tan

Junhong Liu

Jiajia Li

ACM Trans. Math. Softw., 2018

HiCOO: hierarchical storage of sparse tensors.

Jiajia Li

Jimeng Sun

Richard W. Vuduc

Proceedings of the International Conference for High Performance Computing, 2018

Bridging the gap between deep learning and sparse matrix format selection.

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

An Initial Characterization of the Emu Chick.

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

2017

Understanding the GPU Microarchitecture to Achieve Bare-Metal Performance Tuning.

Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Model-Driven Sparse CP Decomposition for Higher-Order Tensors.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

POSTER: Bridging the Gap Between Deep Learning and Sparse Matrix Format Selection.

Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016

Optimizing Sparse Tensor Times Matrix on Multi-core and Many-Core Architectures.

Proceedings of the 6th Workshop on Irregular Applications: Architecture and Algorithms, 2016

2015

Introducing high performance computing concepts into engineering undergraduate curriculum: a success story.

B. Neelima

Jiajia Li

Proceedings of the Workshop on Education for High-Performance Computing, 2015

An input-adaptive and in-place approach to dense tensor-times-matrix multiply.

Proceedings of the International Conference for High Performance Computing, 2015

2013

SMAT: an input adaptive auto-tuner for sparse matrix-vector multiplication.

Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2013

2012

SMAT: An Input Adaptive Sparse Matrix-Vector Multiplication Auto-Tuner.

CoRR, 2012

An optimized large-scale hybrid DGEMM design for CPUs and ATI GPUs.

Proceedings of the International Conference on Supercomputing, 2012

2010

Automatically Tuned Dynamic Programming with an Algorithm-by-Blocks.

Jiajia Li

Guangming Tan

Mingyu Chen

Proceedings of the 16th IEEE International Conference on Parallel and Distributed Systems, 2010
