Peng Jiang

Proceedings of the 40th ACM International Conference on Supercomputing, 2026

2025

Lossy Compression of Scientific Data: Applications Constrains and Requirements.

CoRR, March, 2025

What to Support When You're Compressing: The State of Practice Gaps and Opportunities for Scientific Data Compression.

Proceedings of the International Conference for High Performance Computing, 2025

Matcha: A Language and Compiler for Backtracking-Based Subgraph Matching.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2025

A Memory-Efficient and Computation-Balanced Lossy Compressor on Wafer-Scale Engine.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2025

Improving Accuracy and Efficiency of Graph Embedding Training with Fine-Grained Parameter Management.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2025

2024

GCSM: GPU-Accelerated Continuous Subgraph Matching for Large Graphs.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

cuKE: An Efficient Code Generator for Score Function Computation in Knowledge Graph Embedding.

Jing Li

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

CereSZ: Enabling and Scaling Error-bounded Lossy Compression on Cerebras CS-2.

Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 2024

2023

PIMMiner: A High-performance PIM Architecture-aware Graph Mining Framework.

Jiya Su

Rujia Wang

CoRR, 2023

End-to-End LU Factorization of Large Matrices on GPUs.

Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

2022

STMatch: Accelerating Graph Pattern Matching on GPU with Stack-Based Loop Optimizations.

Proceedings of the SC22: International Conference for High Performance Computing, 2022

Exposing and Exploiting Fine-Grained Block Structures for Fast and Accurate Sparse Training.

Shihui Song

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Scaling and Selecting GPU Methods for All Pairs Shortest Paths (APSP) Computations.

Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Rethinking graph data placement for graph neural network training on multiple GPUs.

Shihui Song

Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

SampleMine: A Framework for Applying Random Sampling to Subgraph Pattern Mining through Loop Perforation.

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021

Communication-Efficient Sampling for Distributed Training of Graph Convolutional Networks.

Masuma Akter Rumi

CoRR, 2021

An Efficient Graph Mining System for Large Patterns.

Rujia Wang

Bo Wu

CoRR, 2021

Exploring PIM Architecture for High-Performance Graph Pattern Mining.

IEEE Comput. Archit. Lett., 2021

Scaling Sparse Matrix Multiplication on CPU-GPU Nodes.

Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

2020

Combining SIMD and Many/Multi-core Parallelism for Finite-state Machines with Enumerative Speculation.

Yang Xia

ACM Trans. Parallel Comput., 2020

Adaptive Periodic Averaging: A Practical Approach to Reducing Communication in Distributed Learning.

CoRR, 2020

Scaling out speculative execution of finite-state machines with parallel merge.

Yang Xia

Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

A novel data transformation and execution strategy for accelerating sparse matrix multiplication on GPUs.

Changwan Hong

Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

Accelerating Sparse CNN Inference on GPUs with Performance-Aware Weight Pruning.

Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019

Accelerating distributed stochastic gradient descent with adaptive periodic parameter averaging: poster.

Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

Enabling prefix sum parallelism pattern for recurrences with principled function reconstruction.

Yang Xia

Proceedings of the 28th International Conference on Compiler Construction, 2019

A Methodology for Characterizing Sparse Datasets and Its Application to SIMD Performance Prediction.

Gangyi Zhu

Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018

Revealing parallel scans and reductions in sequential loops through function reconstruction.

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Conflict-free vectorization of associative irregular applications with recent SIMD architectural advances.

Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018

Revealing parallel scans and reductions in recurrences through function reconstruction.

Linchuan Chen

Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017

Combining SIMD and Many/Multi-core Parallelism for Finite State Machines with Enumerative Speculation.

Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Efficient SIMD and MIMD parallelization of hash-based aggregation by conflict mitigation.

Proceedings of the International Conference on Supercomputing, 2017

2016

Reusing Data Reorganization for Efficient SIMD Parallelization of Adaptive Irregular Applications.

Linchuan Chen

Proceedings of the 2016 International Conference on Supercomputing, 2016

Exploiting recent SIMD architectural advances for irregular applications.

Linchuan Chen