Akihiko Kasagi

Orcid: 0000-0002-5793-335X

According to our database, Akihiko Kasagi authored at least 31 papers between 2012 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
GPU implementations of deflate encoding and decoding.
Concurr. Comput. Pract. Exp., 2023

NEEBS: Nonexpert large-scale environment building system for deep neural network.
Concurr. Comput. Pract. Exp., 2023

A novel structured sparse fully connected layer in convolutional neural networks.
Concurr. Comput. Pract. Exp., 2023

An Analysis of Graph Neural Network Memory Access Patterns.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

mpiQulacs: A Scalable Distributed Quantum Computer Simulator for ARM-based Clusters.
Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2023

Offline Quantum Circuit Pruning for Quantum Chemical Calculations.
Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2023

Efficient GPU-Accelerated Bulk Evaluation of the Boys Function for Quantum Chemistry.
Proceedings of the Eleventh International Symposium on Computing and Networking, CANDAR 2023, Matsue, Japan, November 28, 2023

2022
mpiQulacs: A Distributed Quantum Computer Simulator for A64FX-based Cluster Systems.
CoRR, 2022

The Bonsai Hypothesis: An Efficient Network Pruning Technique.
Proceedings of the Artificial Intelligence Applications and Innovations, 2022

BERT-Based Scientific Paper Quality Prediction.
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2022, 2022

Regularizing Data for Improving Execution Time of NLP Model.
Proceedings of the Thirty-Fifth International Florida Artificial Intelligence Research Society Conference, 2022

2021
MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems.
CoRR, 2021

On the Computational Power of Convolution Pooling: A Theoretical Approach for Deep Learning.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

Acceleration of Deflate Encoding and Decoding with GPU implementations.
Proceedings of the Ninth International Symposium on Computing and Networking, 2021

Efficient and Large Scale Pre-training Techniques for Japanese Natural Language Processing.
Proceedings of the Ninth International Symposium on Computing and Networking, 2021

The 16,384-node Parallelism of 3D-CNN Training on An Arm CPU based Supercomputer.
Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021

2020
Efficient convolution pooling on the GPU.
J. Parallel Distributed Comput., 2020

An Efficient Multicore CPU Implementation for Convolution-Pooling Computation in CNNs.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Huffman Coding with Gap Arrays for GPU Acceleration.
Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

An Efficient Technique for Large Mini-batch Challenge of DNNs Training on Large Scale Cluster.
Proceedings of the HPDC '20: The 29th International Symposium on High-Performance Parallel and Distributed Computing, 2020

2019
Yet Another Accelerated SGD: ResNet-50 Training on ImageNet in 74.7 seconds.
CoRR, 2019

Efficient cuDNN-Compatible Convolution-Pooling on the GPU.
Proceedings of the Parallel Processing and Applied Mathematics, 2019

Structured Sparse Fully-Connected Layers in the CNNs and Its GPU Acceleration.
Proceedings of the Seventh International Symposium on Computing and Networking Workshops, 2019

2017
Fast algorithm using summed area tables with unified layer performing convolution and average pooling.
Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing, 2017

2015
Parallelization Techniques for Error Diffusion with GPU Implementations.
Proceedings of the Third International Symposium on Computing and Networking, 2015

2014
Offline Permutation on the CUDA-enabled GPU.
IEICE Trans. Inf. Syst., 2014

Parallel Algorithms for the Summed Area Table on the Asynchronous Hierarchical Memory Machine, with GPU implementations.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

2013
Offline Permutation Algorithms on the Discrete Memory Machine with Performance Evaluation on the GPU.
IEICE Trans. Inf. Syst., 2013

An Optimal Offline Permutation Algorithm on the Hierarchical Memory Machine, with the GPU Implementation.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

2012
An Implementation of Conflict-Free Offline Permutation on the GPU.
Proceedings of the Third International Conference on Networking and Computing, 2012

