Kazuya Matsumoto

ORCID: 0000-0001-5858-1598

According to our database, Kazuya Matsumoto authored at least 18 papers between 2008 and 2022.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2022
High Performance Software Systolic Array Computing of Multi-channel Convolution on a GPU.
Proceedings of the Computational Science and Its Applications - ICCSA 2022, 2022

2019
Implementation and performance evaluation of a communication-avoiding GMRES method for stencil-based code on GPU cluster.
J. Supercomput., 2019

Brain-inspired Co-design of Algorithm/Architecture for CNN Accelerators.
Proceedings of the 8th International Congress on Advanced Applied Informatics, 2019

Effectiveness of performance tuning techniques for general matrix multiplication on the PEZY-SC2.
Proceedings of the 10th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, 2019

2017
Application of a communication-avoiding generalized minimal residual method to a gyrokinetic five dimensional eulerian code on many core platforms.
Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2017

2016
Implementation and Evaluation of NAS Parallel CG Benchmark on GPU Cluster with Proprietary Interconnect TCA.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2016, 2016

2015
Implementation of CG Method on GPU Cluster with Proprietary Interconnect TCA for GPU Direct Communication.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Improving Strong-Scaling on GPU Cluster Based on Tightly Coupled Accelerators Architecture.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Evaluation of FFT for GPU Cluster Using Tightly Coupled Accelerators Architecture.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2012
Blocked United Algorithm for the All-Pairs Shortest Paths Problem on Hybrid CPU-GPU Systems.
IEICE Trans. Inf. Syst., 2012

Performance Tuning of Matrix Multiplication in OpenCL on Different GPUs and CPUs.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Implementing a Code Generator for Fast Matrix Multiplication in OpenCL on the GPU.
Proceedings of the IEEE 6th International Symposium on Embedded Multicore/Manycore SoCs, 2012

2011
Multi-level Optimization of Matrix Multiplication for GPU-equipped Systems.
Proceedings of the International Conference on Computational Science, 2011

Blocked All-Pairs Shortest Paths Algorithm for Hybrid CPU-GPU System.
Proceedings of the 13th IEEE International Conference on High Performance Computing & Communication, 2011

2010
Matrix Multiply-Add in Min-plus Algebra on a Short-Vector SIMD Processor of Cell/B.E..
Proceedings of the First International Conference on Networking and Computing, 2010

2009
A Solution of the All-Pairs Shortest Paths Problem on the Cell Broadband Engine Processor.
IEICE Trans. Inf. Syst., 2009

Matrix Inversion on the Cell/B.E. Processor.
Proceedings of the 11th IEEE International Conference on High Performance Computing and Communications, 2009

2008
Incremental Principal Component Analysis Based on Adaptive Accumulation Ratio.
Proceedings of the Advances in Neuro-Information Processing, 15th International Conference, 2008
