Akihiko Kasagi

Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

mpiQulacs: A Scalable Distributed Quantum Computer Simulator for ARM-based Clusters.

Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2023

Offline Quantum Circuit Pruning for Quantum Chemical Calculations.

Satoshi Imamura

Eiji Yoshida

Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2023

Efficient GPU-Accelerated Bulk Evaluation of the Boys Function for Quantum Chemistry.

Proceedings of the Eleventh International Symposium on Computing and Networking, CANDAR 2023, Matsue, Japan, November 28, 2023

2022

mpiQulacs: A Distributed Quantum Computer Simulator for A64FX-based Cluster Systems.

CoRR, 2022

The Bonsai Hypothesis: An Efficient Network Pruning Technique.

Proceedings of Artificial Intelligence Applications and Innovations (AIAI), 2022

BERT-Based Scientific Paper Quality Prediction.

Proceedings of Artificial Neural Networks and Machine Learning (ICANN 2022), 2022

Regularizing Data for Improving Execution Time of NLP Model.

Proceedings of the Thirty-Fifth International Florida Artificial Intelligence Research Society Conference, 2022

2021

MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems.

CoRR, 2021

MLPerf™ HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems.

Proceedings of the IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, 2021

On the Computational Power of Convolution Pooling: A Theoretical Approach for Deep Learning.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

Acceleration of Deflate Encoding and Decoding with GPU implementations.

Proceedings of the Ninth International Symposium on Computing and Networking, 2021

Efficient and Large Scale Pre-training Techniques for Japanese Natural Language Processing.

Proceedings of the Ninth International Symposium on Computing and Networking, 2021

The 16,384-node Parallelism of 3D-CNN Training on An Arm CPU based Supercomputer.

Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021

2020

Efficient convolution pooling on the GPU.

J. Parallel Distributed Comput., 2020

An Efficient Multicore CPU Implementation for Convolution-Pooling Computation in CNNs.

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Huffman Coding with Gap Arrays for GPU Acceleration.

Proceedings of the 49th International Conference on Parallel Processing (ICPP 2020), 2020

An Efficient Technique for Large Mini-batch Challenge of DNNs Training on Large Scale Cluster.

Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing (HPDC '20), 2020

2019

Yet Another Accelerated SGD: ResNet-50 Training on ImageNet in 74.7 seconds.

CoRR, 2019

Efficient cuDNN-Compatible Convolution-Pooling on the GPU.

Proceedings of Parallel Processing and Applied Mathematics (PPAM), 2019

Structured Sparse Fully-Connected Layers in the CNNs and Its GPU Acceleration.

Proceedings of the Seventh International Symposium on Computing and Networking Workshops, 2019

2017

Fast algorithm using summed area tables with unified layer performing convolution and average pooling.

Tsuguchika Tabaru

Hirotaka Tamura

Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing, 2017

2015

Parallelization Techniques for Error Diffusion with GPU Implementations.

Proceedings of the Third International Symposium on Computing and Networking, 2015

2014

Offline Permutation on the CUDA-enabled GPU.

IEICE Trans. Inf. Syst., 2014

Parallel Algorithms for the Summed Area Table on the Asynchronous Hierarchical Memory Machine, with GPU implementations.

Proceedings of the 43rd International Conference on Parallel Processing, 2014

2013

Offline Permutation Algorithms on the Discrete Memory Machine with Performance Evaluation on the GPU.

IEICE Trans. Inf. Syst., 2013

An Optimal Offline Permutation Algorithm on the Hierarchical Memory Machine, with the GPU Implementation.

Proceedings of the 42nd International Conference on Parallel Processing, 2013

2012

An Implementation of Conflict-Free Offline Permutation on the GPU.
