Peng Zhang

Orcid: 0000-0001-8364-9793

Affiliations:
  • National University of Defense Technology, Software Institute, College of Computer, Compiler Laboratory, Changsha, China


According to our database, Peng Zhang authored at least 24 papers between 2016 and 2026.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2026
mtGEMM: An Efficient GEMM Library for Modern Multi-Core DSPs.
IEEE Trans. Parallel Distributed Syst., April, 2026

Optimizing small matrix multiplications via batch grouping on multi-core DSPs.
CCF Trans. High Perform. Comput., February, 2026

2025
nDirect2: A High-Performance Library for Direct Convolutions on Multicore CPUs.
IEEE Trans. Computers, June, 2025

An empirical performance evaluation of SYCL on ARM multi-core processors.
CCF Trans. High Perform. Comput., February, 2025

Constraint-Driven Auto-Tuning of GEMM-like Operators for MT-3000 Many-core Processor.
Proceedings of the International Conference for High Performance Computing, 2025

Optimizing Direct Convolutions on High-Performance Multi-Core DSPs.
Proceedings of the 54th International Conference on Parallel Processing, 2025

Selection of Supervised Learning-Based Sparse Matrix Reordering Algorithms.
Proceedings of the 32nd IEEE International Conference on High Performance Computing, 2025

2024
thSORT: an efficient parallel sorting algorithm on multi-core DSPs.
CCF Trans. High Perform. Comput., October, 2024

Optimizing General Matrix Multiplications on Modern Multi-core DSPs.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Optimizing Stencil Computation on Multi-core DSPs.
Proceedings of the 53rd International Conference on Parallel Processing, 2024

Optimizing SpMV on Heterogeneous Multi-Core DSPs through Improved Locality and Vectorization.
Proceedings of the 53rd International Conference on Parallel Processing, 2024

2023
Programming bare-metal accelerators with heterogeneous threading models: a case study of Matrix-3000.
Frontiers Inf. Technol. Electron. Eng., 2023

Optimizing Direct Convolutions on ARM Multi-Cores.
Proceedings of the International Conference for High Performance Computing, 2023

The Optimization of Multi-physics Application Simulated by Lattice Boltzmann Method Based on Domestic Processors.
Proceedings of the 2nd International Conference on Networks, 2023

MTMap: A Long-Read Alignment Tool based on Multi-Core DSPs.
Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2023

2021
Large-Scale Parallel Alignment Algorithm for SMRT Reads.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2021

2020
Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures.
IEEE Trans. Parallel Distributed Syst., 2020

Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures: A Machine Learning Based Approach.
CoRR, 2020

2019
The Communication-Overlapped Hybrid Decomposition Parallel Algorithm for Multi-Scale Fluid Simulations.
Proceedings of the 48th International Conference on Parallel Processing, 2019

2018
Tuning Streamed Applications on Intel Xeon Phi: A Machine Learning Based Approach.
CoRR, 2018

Auto-tuning Streamed Applications on Intel Xeon Phi.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

MOCL: an efficient OpenCL implementation for the Matrix-2000 architecture.
Proceedings of the 15th ACM International Conference on Computing Frontiers, 2018

2017
Implementing and Evaluating OpenCL on an ARMv8 Multi-Core CPU.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

2016
Evaluating Multiple Streams on Heterogeneous Platforms.
Parallel Process. Lett., 2016

