Naoto Fukumoto

ORCID: 0000-0003-2103-881X

According to our database, Naoto Fukumoto authored at least 18 papers between 2008 and 2024.

Bibliography

2024
Introducing software pipelining for the A64FX processor into LLVM.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Workshops, 2024

2023
mpiQulacs: A Scalable Distributed Quantum Computer Simulator for ARM-based Clusters.
Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2023

2022
A traffic-aware memory-cube network using bypassing.
Microprocess. Microsystems, April, 2022

A Binary Translator to Accelerate Development of Deep Learning Processing Library for AArch64 CPU.
IEICE Trans. Electron., 2022

mpiQulacs: A Distributed Quantum Computer Simulator for A64FX-based Cluster Systems.
CoRR, 2022

Performance Analysis of Multi-Containerized MD Simulations for Low-Level Resource Allocation.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

Efficient Collision-Free MTTKRP Algorithm for Multi-core CPUs with Less Memory Usage.
Proceedings of the 22nd IEEE International Symposium on Cluster, 2022

2021
Preliminary Performance Analysis of Distributed DNN Training with Relaxed Synchronization.
IEICE Trans. Electron., 2021

MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems.
CoRR, 2021

Low-Latency Low-Energy Memory-Cube Networks using Dual-Voltage Datapaths.
Proceedings of the 29th Euromicro International Conference on Parallel, 2021


The 16,384-node Parallelism of 3D-CNN Training on An Arm CPU based Supercomputer.
Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021

Towards Straggler-Tolerant and Accuracy-Aware Distributed DNN Training in Clouds.
Proceedings of the 21st IEEE/ACM International Symposium on Cluster, 2021

2020
An Efficient Technique for Large Mini-batch Challenge of DNNs Training on Large Scale Cluster.
Proceedings of the HPDC '20: The 29th International Symposium on High-Performance Parallel and Distributed Computing, 2020

2019
Yet Another Accelerated SGD: ResNet-50 Training on ImageNet in 74.7 seconds.
CoRR, 2019

2017
Understanding storage traffic characteristics on enterprise virtual desktop infrastructure.
Proceedings of the 10th ACM International Systems and Storage Conference, 2017

2009
Performance balancing: software-based on-chip memory management for effective CMP executions.
Proceedings of the 10th workshop on MEmory performance, 2009

2008
Analyzing the impact of data prefetching on Chip MultiProcessors.
Proceedings of the 13th Asia-Pacific Computer Systems Architecture Conference, 2008

