Peng Zhang

ORCID: 0000-0001-8364-9793

Affiliations:

National University of Defense Technology, Software Institute, College of Computer, Compiler Laboratory, Changsha, China

According to our database, Peng Zhang authored at least 24 papers between 2016 and 2026.

Collaborative distances:

Dijkstra number of five.
Erdős number of four.

Bibliography

2026

mtGEMM: An Efficient GEMM Library for Modern Multi-Core DSPs.

IEEE Trans. Parallel Distributed Syst., April, 2026

Optimizing small matrix multiplications via batch grouping on multi-core DSPs.

CCF Trans. High Perform. Comput., February, 2026

2025

nDirect2: A High-Performance Library for Direct Convolutions on Multicore CPUs.

IEEE Trans. Computers, June, 2025

An empirical performance evaluation of SYCL on ARM multi-core processors.

CCF Trans. High Perform. Comput., February, 2025

Constraint-Driven Auto-Tuning of GEMM-like Operators for MT-3000 Many-core Processor.

Proceedings of the International Conference for High Performance Computing, 2025

Optimizing Direct Convolutions on High-Performance Multi-Core DSPs.

Proceedings of the 54th International Conference on Parallel Processing, 2025

Selection of Supervised Learning-Based Sparse Matrix Reordering Algorithms.

Proceedings of the 32nd IEEE International Conference on High Performance Computing, 2025

2024

thSORT: an efficient parallel sorting algorithm on multi-core DSPs.

CCF Trans. High Perform. Comput., October, 2024

Optimizing General Matrix Multiplications on Modern Multi-core DSPs.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Optimizing Stencil Computation on Multi-core DSPs.

Proceedings of the 53rd International Conference on Parallel Processing, 2024

Optimizing SpMV on Heterogeneous Multi-Core DSPs through Improved Locality and Vectorization.

Proceedings of the 53rd International Conference on Parallel Processing, 2024

2023

Programming bare-metal accelerators with heterogeneous threading models: a case study of Matrix-3000.

Frontiers Inf. Technol. Electron. Eng., 2023

Optimizing Direct Convolutions on ARM Multi-Cores.

Proceedings of the International Conference for High Performance Computing, 2023

The Optimization of Multi-physics Application Simulated by Lattice Boltzmann Method Based on Domestic Processors.

Proceedings of the 2nd International Conference on Networks, 2023

MTMap: A Long-Read Alignment Tool based on Multi-Core DSPs.

Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2023

2021

Large-Scale Parallel Alignment Algorithm for SMRT Reads.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2021

2020

Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures.

IEEE Trans. Parallel Distributed Syst., 2020

Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures: A Machine Learning Based Approach.

CoRR, 2020

2019

The Communication-Overlapped Hybrid Decomposition Parallel Algorithm for Multi-Scale Fluid Simulations.

Proceedings of the 48th International Conference on Parallel Processing, 2019

2018

Tuning Streamed Applications on Intel Xeon Phi: A Machine Learning Based Approach.

CoRR, 2018

Auto-tuning Streamed Applications on Intel Xeon Phi.

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

MOCL: an efficient OpenCL implementation for the Matrix-2000 architecture.

Proceedings of the 15th ACM International Conference on Computing Frontiers, 2018

2017

Implementing and Evaluating OpenCL on an ARMv8 Multi-Core CPU.

Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

2016

Evaluating Multiple Streams on Heterogeneous Platforms.

Parallel Process. Lett., 2016
