Daichi Mukunoki
Orcid: 0000-0002-0051-6811
According to our database,
Daichi Mukunoki authored at least 38 papers
between 2010 and 2026.
Bibliography
2026
Layer-wise MoE Routing Locality under Shared-Prefix Code Generation: Token-Identity Decomposition and Compile-Equivalent Fork Redundancy.
CoRR, April, 2026
Improving HPC Code Generation Capability of LLMs via Online Reinforcement Learning with Real-Machine Benchmark Rewards.
CoRR, February, 2026
Learning-Augmented Performance Model for Tensor Product Factorization in High-Order FEM.
IEEE Access, 2026
Proceedings of the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops, 2026
Evaluating Claude Code's Coding and Test Automation for GPU Acceleration of a Legacy Fortran Application: A GeoFEM Case Study.
Proceedings of the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops, 2026
2025
3Dify: a Framework for Procedural 3D-CG Generation Assisted by LLMs Using MCP and RAG.
CoRR, October, 2025
VibeCodeHPC: An Agent-Based Iterative Prompting Auto-Tuner for HPC Code Generation Using LLMs.
CoRR, October, 2025
DGEMM without FP64 Arithmetic - Using FP64 Emulation and FP8 Tensor Cores with Ozaki Scheme.
CoRR, August, 2025
Towards Generalized Parameter Tuning in Coherent Ising Machines: A Portfolio-Based Approach.
CoRR, July, 2025
Performance Evaluation of General Purpose Large Language Models for Basic Linear Algebra Subprograms Code Generation.
CoRR, July, 2025
Sparse Iterative Solvers Using High-Precision Arithmetic with Quasi Multi-Word Algorithms.
Proceedings of the 18th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2025
Performance Evaluation of Loop Body Splitting for Fast Modal Filtering in SCALE-DG on A64FX.
Proceedings of the 2025 International Conference on High Performance Computing in Asia-Pacific Region Workshops, 2025
Proceedings of the Thirteenth International Symposium on Computing and Networking, CANDAR 2025, 2025
2024
Performance evaluation and modelling of single-precision matrix multiplication on Cerebras CS-2.
Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024
Reduced-Precision and Reduced-Exponent Formats for Accelerating Adaptive Precision Sparse Matrix-Vector Product.
Proceedings of the Euro-Par 2024: Parallel Processing, 2024
2023
Proceedings of the 16th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2023
2022
Infinite-Precision Inner Product and Sparse Matrix-Vector Multiplication Using Ozaki Scheme with Dot2 on Manycore Processors.
Proceedings of the Parallel Processing and Applied Mathematics, 2022
2021
Task Scheduling Strategies for Batched Basic Linear Algebra Subprograms on Many-core CPUs.
Proceedings of the 14th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2021
Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021
Proceedings of the Computational Science and Its Applications - ICCSA 2021, 2021
Conjugate Gradient Solvers with High Accuracy and Bit-wise Reproducibility between CPU and GPU using Ozaki scheme.
Proceedings of the HPC Asia 2021: The International Conference on High Performance Computing in Asia-Pacific Region, 2021
2020
Performance and energy consumption of accurate and mixed-precision linear algebra kernels on GPUs.
J. Comput. Appl. Math., 2020
White Paper from Workshop on Large-scale Parallel Numerical Computing Technology (LSPANC 2020): HPC and Computer Arithmetic toward Minimal-Precision Computing.
CoRR, 2020
Can We Avoid Rounding-Error Estimation in HPC Codes and Still Get Trustworthy Results?
Proceedings of the Software Verification - 12th International Conference, 2020
Proceedings of the High Performance Computing - 35th International Conference, 2020
2019
Reproducible BLAS Routines with Tunable Accuracy Using Ozaki Scheme for Many-Core Architectures.
Proceedings of the Parallel Processing and Applied Mathematics, 2019
Proceedings of the Parallel Computing: Technology Trends, 2019
2018
Proceedings of the Computational Science - ICCS 2018, 2018
2017
Proceedings of the Parallel Processing and Applied Mathematics, 2017
Design Towards Modern High Performance Numerical LA Library Enabling Heterogeneity and Flexible Data Formats.
Proceedings of the Parallel Computing is Everywhere, 2017
2016
Proceedings of the 10th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2016
Reduced-Precision Floating-Point Formats on GPUs for High Performance and Energy Efficient Computation.
Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016
2015
Proceedings of the 23rd Euromicro International Conference on Parallel, 2015
2013
Proceedings of the Parallel Processing and Applied Mathematics, 2013
Optimization of Sparse Matrix-Vector Multiplication for CRS Format on NVIDIA Kepler Architecture GPUs.
Proceedings of the Computational Science and Its Applications - ICCSA 2013, 2013
2012
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012
2010
Proceedings of the Applied Parallel and Scientific Computing, 2010