Ichitaro Yamazaki

ORCID: 0000-0002-6196-2508

According to our database, Ichitaro Yamazaki authored at least 80 papers between 2006 and 2024.

Bibliography

2024
Two-Stage Block Orthogonalization to Improve Performance of s-step GMRES.
CoRR, 2024

2023
Analysis of Randomized Householder-Cholesky QR Factorization with Multisketching.
CoRR, 2023

An Experimental Study of Two-level Schwarz Domain-Decomposition Preconditioners on GPUs.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

2022
Low-synch Gram-Schmidt with delayed reorthogonalization for Krylov solvers.
Parallel Comput., 2022

Mixed precision s-step Lanczos and conjugate gradient algorithms.
Numer. Linear Algebra Appl., 2022

High-Performance GMRES Multi-Precision Benchmark: Design, Performance, and Challenges.
Proceedings of the IEEE/ACM International Workshop on Performance Modeling, 2022

QR Factorization of Block Low-Rank Matrices on Multi-instance GPU.
Proceedings of the Parallel and Distributed Computing, Applications and Technologies, 2022

Mixed Precision s-step Conjugate Gradient with Residual Replacement on GPUs.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

2021
A Study of Mixed Precision Strategies for GMRES on GPUs.
CoRR, 2021

Two-Stage Gauss-Seidel Preconditioners and Smoothers for Krylov Solvers on a GPU cluster.
CoRR, 2021

Kokkos Kernels: Performance Portable Sparse/Dense Linear Algebra and Graph Kernels.
CoRR, 2021

Exploiting Block Structures of KKT Matrices for Efficient Solution of Convex Optimization Problems.
IEEE Access, 2021

Experimental Evaluation of Multiprecision Strategies for GMRES on GPUs.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

2020
A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic.
CoRR, 2020

Reducing the amount of out-of-core data access for GPU-accelerated randomized SVD.
Concurr. Comput. Pract. Exp., 2020

Low-synchronization orthogonalization schemes for s-step and pipelined Krylov solvers in Trilinos.
Proceedings of the 2020 SIAM Conference on Parallel Processing for Scientific Computing, 2020

Performance Portable Supernode-based Sparse Triangular Solver for Manycore Architectures.
Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

2019
PLASMA: Parallel Linear Algebra Software for Multicore Using OpenMP.
ACM Trans. Math. Softw., 2019

Performance of asynchronous optimized Schwarz with one-sided communication.
Parallel Comput., 2019

QR Factorization of Block Low-rank Matrices with Weak Admissibility Condition.
J. Inf. Process., 2019

Distributed-memory lattice H-matrix factorization.
Int. J. High Perform. Comput. Appl., 2019

Evaluation of Programming Models to Address Load Imbalance on Distributed Multi-Core CPUs: A Case Study with Block Low-Rank Factorization.
Proceedings of the 2019 IEEE/ACM Parallel Applications Workshop, Alternatives To MPI, 2019

Optimization of Numerous Small Dense-Matrix-Vector Multiplications in H-Matrix Arithmetic on GPU.
Proceedings of the 13th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2019

Matrix Powers Kernels for Thick-Restart Lanczos with Explicit External Deflation.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Increasing Accuracy of Iterative Refinement in Limited Floating-Point Arithmetic on Half-Precision Accelerators.
Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019

Linear Systems Solvers for Distributed-Memory Machines with GPU Accelerators.
Proceedings of the Euro-Par 2019: Parallel Processing, 2019

2018
Symmetric Indefinite Linear Solver Using OpenMP Task on Multicore Architectures.
IEEE Trans. Parallel Distributed Syst., 2018

Autotuning Techniques for Performance-Portable Point Set Registration in 3D.
Supercomput. Front. Innov., 2018

The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale.
SIAM Rev., 2018

Optimization of Hierarchical Matrix Computation on GPU.
Proceedings of the Supercomputing Frontiers - 4th Asian Conference, 2018

Performance of Hierarchical-matrix BiCGStab Solver on GPU Clusters.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

2017
Design and Implementation of the PULSAR Programming System for Large Scale Computing.
Supercomput. Front. Innov., 2017

Structure-Aware Linear Solver for Realtime Convex Optimization for Embedded Systems.
IEEE Embed. Syst. Lett., 2017

With Extreme Computing, the Rules Have Changed.
Comput. Sci. Eng., 2017

Non-GPU-resident symmetric indefinite factorization.
Concurr. Comput. Pract. Exp., 2017

Solving dense symmetric indefinite systems using GPUs.
Concurr. Comput. Pract. Exp., 2017

Improving Performance of GMRES by Reducing Communication and Pipelining Global Collectives.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Towards numerical benchmark for half-precision floating point arithmetic.
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017

Sampling algorithms to update truncated SVD.
Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017

Scaling point set registration in 3D across thread counts on multicore and hardware accelerator platforms through autotuning for large scale analysis of scientific point clouds.
Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017

Bringing High Performance Computing to Big Data Algorithms.
Proceedings of the Handbook of Big Data Technologies, 2017

2016
Stability and Performance of Various Singular Value QR Implementations on Multicore CPU with a GPU.
ACM Trans. Math. Softw., 2016

Linear algebra software for large-scale accelerated multicore computing.
Acta Numer., 2016

Heterogeneous Streaming.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

2015
Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems.
Supercomput. Front. Innov., 2015

Computing Low-Rank Approximation of a Dense Matrix on Multicore CPUs with a GPU and Its Application to Solving a Hierarchically Semiseparable Linear System of Equations.
Sci. Program., 2015

Mixed-Precision Cholesky QR Factorization and Its Case Studies on Multicore CPU with Multiple GPUs.
SIAM J. Sci. Comput., 2015

A survey of recent developments in parallel implementations of Gaussian elimination.
Concurr. Comput. Pract. Exp., 2015

Mixed-precision block Gram Schmidt orthogonalization.
Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2015

Randomized algorithms to update partial singular value decomposition on a hybrid CPU/GPU cluster.
Proceedings of the International Conference for High Performance Computing, 2015

Performance of random sampling for computing low-rank approximations of a dense matrix on GPUs.
Proceedings of the International Conference for High Performance Computing, 2015

Dense Symmetric Indefinite Factorization on GPU Accelerated Architectures.
Proceedings of the Parallel Processing and Applied Mathematics, 2015

2014
Communication-Avoiding Symmetric-Indefinite Factorization.
SIAM J. Matrix Anal. Appl., 2014

Design and Implementation of a Large Scale Tree-Based QR Decomposition Using a 3D Virtual Systolic Array and a Lightweight Runtime.
Parallel Process. Lett., 2014

Tridiagonalization of a dense symmetric matrix on multiple GPUs and its application to symmetric eigenvalue problems.
Concurr. Comput. Pract. Exp., 2014

Mixed-Precision Orthogonalization Scheme and Adaptive Step Size for Improving the Stability and Performance of CA-GMRES on GPUs.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Eugene, OR, USA, June 30, 2014

Deflation strategies to improve the convergence of communication-avoiding GMRES.
Proceedings of the 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2014

Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster.
Proceedings of the International Conference for High Performance Computing, 2014

Performance and portability with OpenCL for throughput-oriented HPC workloads across accelerators, coprocessors, and multicore processors.
Proceedings of the 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2014

Improving the Performance of CA-GMRES on Multicores with Multiple GPUs.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Optimizing Krylov Subspace Solvers on Graphics Processing Units.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Access-averse framework for computing low-rank matrix approximations.
Proceedings of the 2014 IEEE International Conference on Big Data (IEEE BigData 2014), 2014

Accelerating Numerical Dense Linear Algebra Calculations with GPUs.
Proceedings of the Numerical Computations with GPUs, 2014

2013
Performance comparison of parallel eigensolvers based on a contour integral method and a Lanczos method.
Parallel Comput., 2013

On Partitioning and Reordering Problems in a Hierarchically Parallel Hybrid Linear Solver.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Tridiagonalization of a Symmetric Dense Matrix on a GPU Cluster.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Virtual Systolic Array for QR Decomposition.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Implementing a Blocked Aasen's Algorithm with a Dynamic Scheduler on Multicore Architectures.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

2012
dqds with Aggressive Early Deflation.
SIAM J. Matrix Anal. Appl., 2012

One-sided Dense Matrix Factorizations on a Multicore with Multiple GPU Accelerators.
Proceedings of the International Conference on Computational Science, 2012

A hybrid Hermitian general eigenvalue solver.
CoRR, 2012

New Scheduling Strategies and Hybrid Programming for a Parallel Right-looking Sparse LU Factorization Algorithm on Multicore Cluster Systems.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

2011
A Communication-Avoiding Thick-Restart Lanczos Method on a Distributed-Memory System.
Proceedings of the Euro-Par 2011: Parallel Processing Workshops - CCPI, CGWS, HeteroPar, HiBB, HPCVirt, HPPC, HPSS, MDGS, ProPer, Resilience, UCHPC, VHPC, Bordeaux, France, August 29, 2011

2010
Segmenting point-sampled surfaces.
Vis. Comput., 2010

Adaptive Projection Subspace Dimension for the Thick-Restart Lanczos Method.
ACM Trans. Math. Softw., 2010

On Techniques to Improve Robustness and Scalability of a Parallel Hybrid Linear Solver.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2010, 2010

2008
CompostBin: A DNA Composition-Based Algorithm for Binning Environmental Shotgun Reads.
Proceedings of the Research in Computational Molecular Biology, 2008

2006
Segmenting Point Sets.
Proceedings of the 2006 International Conference on Shape Modeling and Applications (SMI 2006), 2006

