Satoshi Ohshima

Takeshi Nanri

Proceedings of the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region, 2026

2025

3Dify: a Framework for Procedural 3D-CG Generation Assisted by LLMs Using MCP and RAG.

CoRR, October, 2025

Large-Scale FMO-MP2 Calculations of the Spike Protein Droplet Model.

J. Comput. Chem., 2025

A Study on the Performance and Usability of Managed Memory and Unified Memory for Accelerating Numerical Calculation Program.

Proceedings of the 18th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2025

Accelerating Heterogeneous Coupling Computing with WaitIO Using RDMA.

Proceedings of the 2025 International Conference on High Performance Computing in Asia-Pacific Region Workshops, 2025

2024

Xabclib: A Fully Auto-tuned Sparse Iterative Solver.

CoRR, 2024

WaitIO-Hybrid: Communication for Coupling MPI Programs Among Heterogeneous Systems.

Proceedings of the Parallel and Distributed Computing, Applications and Technologies, 2024

Adaptation of XAI to Auto-tuning for Numerical Libraries.

Proceedings of the 17th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2024

2023

Autotuning by Changing Directives and Number of Threads in OpenMP using ppOpen-AT.

CoRR, 2023

Implementation of Radio Wave Propagation using RT Cores and Consideration of Programming Models.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Parallelization of Automatic Tuning for Hyperparameter Optimization of Pedestrian Route Prediction Applications using Machine Learning.

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2023

2022

mdx: A Cloud Platform for Supporting Data Science and Cross-Disciplinary Research Collaborations.

CoRR, 2022

QR Factorization of Block Low-Rank Matrices on Multi-instance GPU.

Proceedings of the Parallel and Distributed Computing, Applications and Technologies, 2022

Autotuning Power Consumption and Computation Accuracy using ppOpen-AT.

Proceedings of the 15th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2022

17th IEEE International Workshop on Automatic Performance Tuning (iWAPT2022).

Che-Rung Lee

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

mdx: A Cloud Platform for Supporting Data Science and Cross-Disciplinary Research Collaborations.

Proceedings of the IEEE Intl. Conf. on Dependable, 2022

2021

Parallelization of GKV benchmark using OpenACC.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

An Auto-tuning with Adaptation of A64 Scalable Vector Extension for SPIRAL.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

2020

Performance Evaluation of Accurate Matrix-Matrix Multiplication on GPU Using Sparse Matrix Multiplications.

Proceedings of the Eighth International Symposium on Computing and Networking Workshops, 2020

Scalable Direct-Iterative Hybrid Solver for Sparse Matrices on Multi-Core and Vector Architectures.

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2020

2019

Optimization of Numerous Small Dense-Matrix-Vector Multiplications in H-Matrix Arithmetic on GPU.

Proceedings of the 13th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2019

Performance Evaluation of the MODYLAS Application on Modern Multi-core and Many-Core Environments.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

2018

A thread-level parallelization of pairwise additive potential and force calculations suitable for current many-core architectures.

J. Supercomput., 2018

Optimization of Hierarchical Matrix Computation on GPU.

Proceedings of the Supercomputing Frontiers - 4th Asian Conference, 2018

Performance of Hierarchical-matrix BiCGStab Solver on GPU Clusters.

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

2017

Auto-Tuning on NUMA and Many-Core Environments with an FDM Code.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

2016

Utilization and Expansion of ppOpen-AT for OpenACC.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Auto-Tuning of Hybrid MPI/OpenMP Execution with Code Selection by ppOpen-AT.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

2015

Directive-Based Auto-Tuning for the Finite Difference Method on the Xeon Phi.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

2014

Performance Optimization of SpMV Using CRS Format by Considering OpenMP Scheduling on CPUs and MIC.

Proceedings of the IEEE 8th International Symposium on Embedded Multicore/Manycore SoCs, 2014

Auto-tuning of Computation Kernels from an FDM Code with ppOpen-AT.

Proceedings of the IEEE 8th International Symposium on Embedded Multicore/Manycore SoCs, 2014

Implementation and Evaluation of an AMR Framework for FDM Applications.

Proceedings of the International Conference on Computational Science, 2014

2013

A Sparse Matrix Library with Automatic Selection of Iterative Solvers and Preconditioners.

Proceedings of the International Conference on Computational Science, 2013

2012

Implementation and Evaluation of 3D Finite Element Method Application for CUDA.

Proceedings of the High Performance Computing for Computational Science, 2012

Control Formats for Unsymmetric and Symmetric Sparse Matrix-Vector Multiplications on OpenMP Implementations.

Proceedings of the High Performance Computing for Computational Science, 2012

SSG-AT: An Auto-tuning Method of Sparse Matrix-vector Multiplicataion for Semi-structured Grids - An Adaptation to OpenFOAM.

Satoshi Ito

Proceedings of the IEEE 6th International Symposium on Embedded Multicore/Manycore SoCs, 2012

2010

OMPCUDA : OpenMP Execution Framework for CUDA Based on Omni OpenMP Compiler.
