Akira Nukada

ORCID: 0000-0001-7959-6975

According to our database, Akira Nukada authored at least 41 papers between 2005 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
Efficient checkpoint/Restart of CUDA applications.
Parallel Comput., 2023

2022
Efficient high-precision integer multiplication on the GPU.
Int. J. High Perform. Comput. Appl., 2022

Accelerating data transfer between host and device using idle GPU.
Proceedings of the 14th Workshop on General Purpose Processing Using GPU (GPGPU@PPoPP 2022), 2022

2021
Performance Optimization of Allreduce Operation for Multi-GPU Systems.
Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), 2021

2019
Batched Sparse Matrix Multiplication for Accelerating Graph Convolutional Networks.
Proceedings of the 19th IEEE/ACM International Symposium on Cluster, 2019

2018
Evaluating the SW26010 many-core processor with a micro-benchmark suite for performance optimizations.
Parallel Comput., 2018

MRG8: Random Number Generation for the Exascale Era.
Proceedings of the Platform for Advanced Scientific Computing Conference, 2018

Efficient Solving of Scan Primitive on Multi-GPU Systems.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Optimizing Preconditioned Conjugate Gradient on TaihuLight for OpenFOAM.
Proceedings of the 18th IEEE/ACM International Symposium on Cluster, 2018

2017
High-Performance and Memory-Saving Sparse General Matrix-Matrix Multiplication for NVIDIA Pascal GPU.
Proceedings of the 46th International Conference on Parallel Processing, 2017

Optimizations of Two Compute-Bound Scientific Kernels on the SW26010 Many-Core Processor.
Proceedings of the 46th International Conference on Parallel Processing, 2017

2016
Adaptive Multi-level Blocking Optimization for Sparse Matrix Vector Multiplication on GPU.
Proceedings of the International Conference on Computational Science 2016, 2016

2015
Efficient Execution of Multiple CUDA Applications Using Transparent Suspend, Resume and Migration.
Proceedings of the Euro-Par 2015: Parallel Processing, 2015

Modeling Gather and Scatter with Hardware Performance Counters for Xeon Phi.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

2014
Mixed-Precision AMG method for Many Core Accelerators.
Proceedings of the 21st European MPI Users' Group Meeting, 2014

Cache-aware sparse matrix formats for Kepler GPU.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

TSUBAME-KFC: A modern liquid submersion cooling prototype towards exascale becoming the greenest supercomputer in the world.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

2012
Scalable multi-GPU 3-D FFT for TSUBAME 2.0 supercomputer.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

High performance 3-D FFT using multiple CUDA GPUs.
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units, 2012

2011
Peta-scale phase-field simulation for dendritic solidification on the TSUBAME 2.0 supercomputer.
Proceedings of the Conference on High Performance Computing Networking, 2011

NVCR: A Transparent Checkpoint-Restart Library for NVIDIA CUDA.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Hamming Color Code for Dense and Robust One-shot 3D Scanning.
Proceedings of the British Machine Vision Conference, 2011

2010
High performance conjugate gradient solver on multi-GPU clusters using hypergraph partitioning.
Comput. Sci. Res. Dev., 2010

An 80-Fold Speedup, 15.0 TFlops Full GPU Acceleration of Non-Hydrostatic Weather Model ASUCA Production Code.
Proceedings of the Conference on High Performance Computing Networking, 2010

A high-performance fault-tolerant software framework for memory on commodity GPUs.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Linpack evaluation on a supercomputer with heterogeneous accelerators.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Low-overhead diskless checkpoint for hybrid computing systems.
Proceedings of the 2010 International Conference on High Performance Computing, 2010

Statistical power modeling of GPU kernels using performance counters.
Proceedings of the International Green Computing Conference 2010, 2010

Toward Automatic Performance Tuning for Numerical Simulations in the SILC Matrix Computation Framework.
Proceedings of the Software Automatic Tuning, From Concepts to State-of-the-Art Results, 2010

2009
Auto-tuning 3-D FFT library for CUDA GPUs.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Fast Conjugate Gradients with Multiple GPUs.
Proceedings of the Computational Science, 2009

Aspects of GPU for general purpose high performance computing.
Proceedings of the 14th Asia South Pacific Design Automation Conference, 2009

2008
Bandwidth intensive 3-D FFT kernel for GPUs using CUDA.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

2007
Cloth Simulation in the SILC Matrix Computation Framework: A Case Study.
Proceedings of the Parallel Processing and Applied Mathematics, 2007

High Performance 3D Convolution for Protein Docking on IBM Blue Gene.
Proceedings of the Parallel and Distributed Processing and Applications, 2007

High Performance FFT on SGI Altix 3700.
Proceedings of the High Performance Computing and Communications, 2007

2006
Poster reception - Scalable software infrastructure project.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Distributed SILC: An Easy-to-Use Interface for MPI-Based Parallel Matrix Computation Libraries.
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

FFTSS: A High Performance Fast Fourier Transform Library.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005
SILC: A Flexible and Environment-Independent Interface for Matrix Computation Libraries.
Proceedings of the Parallel Processing and Applied Mathematics, 2005

Performance Evaluation of Parallel Sparse Matrix-Vector Products on SGI Altix3700.
Proceedings of the OpenMP Shared Memory Parallel Programming - International Workshops, 2005
