Weifeng Liu

ORCID: 0000-0003-1729-7448

Affiliations:

China University of Petroleum-Beijing, Department of CST, SSSLab, Beijing, China
Norwegian University of Science and Technology, Department of Computer Science, Trondheim, Norway (former)
STFC Rutherford Appleton Laboratory, Didcot, UK (2016)
University of Copenhagen, Niels Bohr Institute (NBI), Copenhagen, Denmark (PhD 2016)

According to our database¹, Weifeng Liu authored at least 64 papers between 2014 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

DiggerBees: Depth First Search Leveraging Hierarchical Block-Level Stealing on GPUs.

Proceedings of the 31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2026

Characterizing Matrix Multiplication Units across General Parallel Patterns in Scientific Computing.

Proceedings of the 31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2026

Trojan Horse: Aggregate-and-Batch for Scaling Up Sparse Direct Solvers on GPU Clusters.

Proceedings of the 31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2026

Non-Delayed Cholesky Factorization.

Proceedings of the 40th ACM International Conference on Supercomputing, 2026

Uni-STC: Unified Sparse Tensor Core.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2026

2025

νGNN: Non-Uniformly partitioned full-graph GNN training on mixed GPUs.

CCF Trans. High Perform. Comput., August, 2025

Flexible Operator Fusion for Fast Sparse Transformer with Diverse Masking on GPU.

CoRR, June, 2025

KAMI: Communication-Avoiding General Matrix Multiplication within a Single GPU.

Proceedings of the International Conference for High Performance Computing, 2025

ReRAM-Based Process-In-Memory Accelerator for Iterative Solvers: A Systematic Survey.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2025

MemSens: Significantly Reducing Memory Overhead in Adjoint Sensitivity Analysis Using Novel Error-Bounded Lossy Compression.

Proceedings of the 62nd ACM/IEEE Design Automation Conference, 2025

2024

thSORT: an efficient parallel sorting algorithm on multi-core DSPs.

CCF Trans. High Perform. Comput., October, 2024

Mille-feuille: A Tile-Grained Mixed Precision Single-Kernel Conjugate Gradient Solver on GPUs.

Proceedings of the International Conference for High Performance Computing, 2024

AmgT: Algebraic Multigrid Solver on Tensor Cores.

Proceedings of the International Conference for High Performance Computing, 2024

CSP: Comprehensively-Sparsified Preconditioner for Efficient Nonlinear Circuit Simulation.

Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design, 2024

Leda: Leveraging Tiling Dataflow to Accelerate SpMM on HBM-Equipped FPGAs for GNNs.

Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design, 2024

Cuper: Customized Dataflow and Perceptual Decoding for Sparse Matrix-Vector Multiplication on HBM-Equipped FPGAs.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

Efficient Spectral-Aware Power Supply Noise Analysis for Low-Power Design Verification.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

MASC: A Memory-Efficient Adjoint Sensitivity Analysis through Compression Using Novel Spatiotemporal Prediction.

Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

ReCG: ReRAM-Accelerated Sparse Conjugate Gradient.

Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

Machine Learning and GPU Accelerated Sparse Linear Solvers for Transistor-Level Circuit Simulation: A Perspective Survey (Invited Paper).

Proceedings of the 29th Asia and South Pacific Design Automation Conference, 2024

2023

TileSpTRSV: a tiled algorithm for parallel sparse triangular solve on GPUs.

Zhengyang Lu, Weifeng Liu

CCF Trans. High Perform. Comput., June, 2023

Editorial for the special issue on architecture, algorithms and applications of high performance sparse matrix computations.

Weifeng Liu, Guangming Tan, Xiaowen Xu

CCF Trans. High Perform. Comput., June, 2023

DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector Multiplication.

Yuechen Lu, Weifeng Liu

Proceedings of the International Conference for High Performance Computing, 2023

PanguLU: A Scalable Regular Two-Dimensional Block-Cyclic Sparse Direct Solver on Distributed Heterogeneous Systems.

Proceedings of the International Conference for High Performance Computing, 2023

HASpGEMM: Heterogeneity-Aware Sparse General Matrix-Matrix Multiplication on Modern Asymmetric Multicore Processors.

Proceedings of the 52nd International Conference on Parallel Processing, 2023

Accelerating Sparse LU Factorization with Density-Aware Adaptive Matrix Multiplication for Circuit Simulation.

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

AmgR: Algebraic Multigrid Accelerated on ReRAM.

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

HASpMV: Heterogeneity-Aware Sparse Matrix-Vector Multiplication on Modern Asymmetric Multicore Processors.

Proceedings of the IEEE International Conference on Cluster Computing, 2023

Balancing Computation and Communication in Distributed Sparse Matrix-Vector Multiplication.

Proceedings of the 23rd IEEE/ACM International Symposium on Cluster, 2023

2022

A Pattern-Based SpGEMM Library for Multi-Core and Many-Core Architectures.

IEEE Trans. Parallel Distributed Syst., 2022

TileSpGEMM: a tiled algorithm for parallel sparse general matrix-matrix multiplication on GPUs.

Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022

TileSpMSpV: A Tiled Algorithm for Sparse Matrix-Sparse Vector Multiplication on GPUs.

Proceedings of the 51st International Conference on Parallel Processing, 2022

2021

YuenyeungSpTRSV: A Thread-Level and Warp-Level Fusion Synchronization-Free Sparse Triangular Solve.

IEEE Trans. Parallel Distributed Syst., 2021

BALS: Blocked Alternating Least Squares for Parallel Sparse Matrix Factorization on GPUs.

IEEE Trans. Parallel Distributed Syst., 2021

Segmented Merge: A New Primitive for Parallel Sparse Matrix Computations.

Int. J. Parallel Program., 2021

Implementing LU and Cholesky factorizations on artificial intelligence accelerators.

CCF Trans. High Perform. Comput., 2021

TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs.

Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

PALBBD: A Parallel ArcLength Method Using Bordered Block Diagonal Form for DC Analysis.

Proceedings of the GLSVLSI '21: Great Lakes Symposium on VLSI 2021, 2021

SFLU: Synchronization-Free Sparse LU Factorization for Fast Circuit Simulation on GPUs.

Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

2020

clMF: A fine-grained and portable alternating least squares algorithm for parallel matrix factorization.

Future Gener. Comput. Syst., 2020

NUMA-Aware Optimization of Sparse Matrix-Vector Multiplication on ARMv8-Based Many-Core Architectures.

Proceedings of the Network and Parallel Computing, 2020

Segmented Merge: A New Primitive for Parallel Sparse Matrix Computations.

Proceedings of the Network and Parallel Computing, 2020

CapelliniSpTRSV: A Thread-Level Synchronization-Free Sparse Triangular Solve on GPUs.

Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

Efficient Block Algorithms for Parallel Sparse Triangular Solve.

Zhengyang Lu, Yuyao Niu, Weifeng Liu

Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

2019

Int. J. Parallel Program., 2019

Performance evaluation and analysis of sparse matrix and graph kernels on heterogeneous processors.

CCF Trans. High Perform. Comput., 2019

IA-SpGEMM: an input-aware auto-tuning framework for parallel sparse matrix-matrix multiplication.

Proceedings of the ACM International Conference on Supercomputing, 2019

2018

Back-dropout transfer learning for action recognition.

IET Comput. Vis., 2018

swSpTRSV: a fast sparse triangular solve with sparse level tile layout on sunway architectures.

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Warp-Consolidation: A Novel Execution Model for GPUs.

Proceedings of the 32nd International Conference on Supercomputing, 2018

2017

Fast synchronization-free algorithms for parallel sparse triangular solves with multiple right-hand sides.

Concurr. Comput. Pract. Exp., 2017

Exploring and analyzing the real impact of modern on-package memory on HPC scientific kernels.

Ang Li, Weifeng Liu, Mads Ruben Burgdorff Kristensen

Proceedings of the International Conference for High Performance Computing, 2017

Efficient and Portable ALS Matrix Factorization for Recommender Systems.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Fast segmented sort on GPUs.

Proceedings of the International Conference on Supercomputing, 2017

Locality-Aware CTA Clustering for Modern GPUs.

Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

2016

Parallel Transposition of Sparse Data Structures.

Proceedings of the 2016 International Conference on Supercomputing, 2016

A Synchronization-Free Algorithm for Parallel Sparse Triangular Solves.

Proceedings of the Euro-Par 2016: Parallel Processing, 2016

2015

Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors.

Weifeng Liu, Brian Vinter

Parallel Comput., 2015

A framework for general sparse matrix-matrix multiplication on GPUs and heterogeneous processors.

Weifeng Liu, Brian Vinter

J. Parallel Distributed Comput., 2015

CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication.

Weifeng Liu, Brian Vinter

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Unsupervised Behavior-Specific Dictionary Learning for Abnormal Event Detection.

Proceedings of the British Machine Vision Conference 2015, 2015

2014

An Efficient GPU General Sparse Matrix-Matrix Multiplication for Irregular Data.

Weifeng Liu, Brian Vinter

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

ad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors.

Weifeng Liu, Brian Vinter

Proceedings of the Seventh Workshop on General Purpose Processing Using GPUs, 2014