Fumihiko Ino

Orcid: 0000-0002-5757-7631

According to our database, Fumihiko Ino authored at least 90 papers between 2001 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
A compression-based memory-efficient optimization for out-of-core GPU stencil computation.
J. Supercomput., July, 2023

A Synergy between On- and Off-Chip Data Reuse for GPU-based Out-of-Core Stencil Computation.
CoRR, 2023

Parallel Heuristic Methods to Accelerate Best Equivocation Code Generation.
IEEE Access, 2023

PRF: A Fast Parallel Relaxed Flooding Algorithm for Voronoi Diagram Generation on GPU.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

2022
A Reorder Trick for Decision Diagram Based Quantum Circuit Simulation.
CoRR, 2022

Compression-Based Optimizations for Out-of-Core GPU Stencil Computation.
CoRR, 2022

Accelerating Imbalanced Many-to-Many Communication with Systematic Delay Insertion.
Proceedings of the Parallel and Distributed Computing, Applications and Technologies, 2022

A One-Shot Reparameterization Method for Reducing the Loss of Tile Pruning on DNNs.
Proceedings of the International Joint Conference on Neural Networks, 2022

2021
Cache-aware volume rendering methods with dynamic data reorganization.
J. Vis., 2021

Accelerating In-Transit Co-Processing for Scientific Simulations Using Region-Based Data-Driven Analysis.
Algorithms, 2021

Accelerating GPU-Based Out-of-Core Stencil Computation with On-the-Fly Compression.
Proceedings of the Parallel and Distributed Computing, Applications and Technologies, 2021

Accelerating a Lossy Compression Method with Fine-Grained Parallelism on a GPU.
Proceedings of the 12th International Symposium on Parallel Architectures, 2021

2020
A Data-Centric Directive-Based Framework to Accelerate Out-of-Core Stencil Computation on a GPU.
IEICE Trans. Inf. Syst., 2020

Block Randomized Singular Value Decomposition on GPUs.
IEICE Trans. Inf. Syst., 2020

Reducing the amount of out-of-core data access for GPU-accelerated randomized SVD.
Concurr. Comput. Pract. Exp., 2020

Accelerating Human Genome Phenotypic Analysis with Bitwise Search and Batched Computation.
Proceedings of the 28th Euromicro International Conference on Parallel, 2020

Training Strategies for CNN-based Models to Parse Complex Floor Plans.
Proceedings of the 9th International Conference on Software and Computer Applications, 2020

2019
PACC: a directive-based programming framework for out-of-core stencil computation on accelerators.
Int. J. High Perform. Comput. Netw., 2019

Memory Efficient Load Balancing for Distributed Large-Scale Volume Rendering Using a Two-Layered Group Structure.
IEICE Trans. Inf. Syst., 2019

Accelerating the Held-Karp Algorithm for the Symmetric Traveling Salesman Problem.
IEICE Trans. Inf. Syst., 2019

GPU-based branch-and-bound method to solve large 0-1 knapsack problems with data-centric strategies.
Concurr. Comput. Pract. Exp., 2019

Transparent In-memory Cache Management in Apache Spark based on Post-Mortem Analysis.
Proceedings of the 2019 IEEE International Conference on Big Data (IEEE BigData), 2019

2018
Transparent Avoidance of Redundant Data Transfer on GPU-enabled Apache Spark.
Proceedings of the 11th Workshop on General Purpose Processing using GPUs, 2018

A Method for Estimating Task Granularity for Automating GPU Cycle Sharing.
Proceedings of the VII International Conference on Network, Communication and Computing, 2018

An Automated Method for Generating Training Sets for Deep Learning based Image Registration.
Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2018), 2018

2017
Parallelizing Exact and Approximate String Matching via Inclusive Scan on a GPU.
IEEE Trans. Parallel Distributed Syst., 2017

Cache-Aware, In-Place Rotation Method for Texture-Based Volume Rendering.
IEICE Trans. Inf. Syst., 2017

High-Performance Out-of-core Block Randomized Singular Value Decomposition on GPU.
CoRR, 2017

An Out-of-Core Branch and Bound Method for Solving the 0-1 Knapsack Problem on a GPU.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2017

Accelerating scoring computation of Smith-Waterman algorithm with mixed word length.
Proceedings of the 2017 IEEE International Conference on Bioinformatics and Biomedicine, 2017

2016
Reducing memory usage by the lifting-based discrete wavelet transform with a unified buffer on a GPU.
J. Parallel Distributed Comput., 2016

Cache-Aware GPU Optimization for Out-of-Core Cone Beam CT Reconstruction of High-Resolution Volumes.
IEICE Trans. Inf. Syst., 2016

An Extension of OpenACC Directives for Out-of-Core Stencil Computation with Temporal Blocking.
Proceedings of the Third Workshop on Accelerator Programming Using Directives, 2016

An OpenACC Optimizer for Accelerating Histogram Computation on a GPU.
Proceedings of the 24th Euromicro International Conference on Parallel, 2016

Towards Automating Multi-dimensional Data Decomposition for Executing a Single-GPU Code on a Multi-GPU System.
Proceedings of the Fourth International Symposium on Computing and Networking, 2016

2015
A bit-parallel algorithm for searching multiple patterns with various lengths.
J. Parallel Distributed Comput., 2015

Enumerating Joint Weight of a Binary Linear Code Using Parallel Architectures: multi-core CPUs and GPUs.
Int. J. Netw. Comput., 2015

Accelerating the Smith-Waterman algorithm with interpair pruning and band optimization for the all-pairs comparison of base sequences.
BMC Bioinform., 2015

2014
Efficient Acceleration of Mutual Information Computation for Nonrigid Registration Using CUDA.
IEEE J. Biomed. Health Informatics, 2014

Improving cache locality for GPU-based volume rendering.
Parallel Comput., 2014

A Fine Grained Cycle Sharing System with Cooperative Multitasking on GPUs.
Int. J. Netw. Comput., 2014

A parallel scheme for accelerating parameter sweep applications on a GPU.
Concurr. Comput. Pract. Exp., 2014

A Parallel Algorithm for Enumerating Joint Weight of a Binary Linear Code in Network Coding.
Proceedings of the Second International Symposium on Computing and Networking, 2014

2013
GPU-Chariot: A Programming Framework for Stream Applications Running on Multi-GPU Systems.
IEICE Trans. Inf. Syst., 2013

The Past, Present, and Future of GPU-Accelerated Grid Computing.
Proceedings of the First International Symposium on Computing and Networking, 2013

2012
Sequence Homology Search Using Fine Grained Cycle Sharing of Idle GPUs.
IEEE Trans. Parallel Distributed Syst., 2012

A task parallel algorithm for finding all-pairs shortest paths using the GPU.
Int. J. High Perform. Comput. Netw., 2012

Cooperative multitasking for GPU-accelerated grid systems.
Concurr. Comput. Pract. Exp., 2012

Acceleration of variance of color differences-based demosaicing using CUDA.
Proceedings of the 2012 International Conference on High Performance Computing & Simulation, 2012

Improving Cache Locality for Ray Casting with CUDA.
Proceedings of the ARCS 2012 Workshops, 28. Februar - 2. März 2012, München, Germany, 2012

2011
Accelerating Parameter Sweep Applications Using CUDA.
Proceedings of the 19th International Euromicro Conference on Parallel, 2011

2010
High-performance cone beam reconstruction using CUDA compatible GPUs.
Parallel Comput., 2010

A middleware for efficient stream processing in CUDA.
Comput. Sci. Res. Dev., 2010

Accelerating Smith-Waterman Algorithm for Biological Database Search on CUDA-Compatible GPUs.
IEICE Trans. Inf. Syst., 2010

Out-of-core cone beam reconstruction using multiple GPUs.
Proceedings of the 2010 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2010

2009
Harnessing the Power of Idle GPUs for Acceleration of Biological Sequence Alignment.
Parallel Process. Lett., 2009

Harnessing the power of idle GPUs for acceleration of biological sequence alignment.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

2008
A Resource Selection System for Cycle Stealing in GPU Grids.
J. Grid Comput., 2008

A decompression pipeline for accelerating out-of-core volume rendering of time-varying data.
Comput. Graph., 2008

A Task Parallel Algorithm for Computing the Costs of All-Pairs Shortest Paths on the CUDA-Compatible GPU.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2008

Accelerating Cone Beam Reconstruction Using the CUDA-Enabled GPU.
Proceedings of the High Performance Computing, 2008

Design and implementation of the Smith-Waterman algorithm on the CUDA-compatible GPU.
Proceedings of the 8th IEEE International Conference on Bioinformatics and Bioengineering, 2008

2007
Parallel Adaptive Estimation of Hip Range of Motion for Total Hip Replacement Surgery.
IEICE Trans. Inf. Syst., 2007

Real-time rendering of time-varying volume data using a single COTS computer.
Proceedings of the GRAPP 2007, 2007

2006
Trace reduction for performance improvement assessment of message passing parallel programs.
Syst. Comput. Jpn., 2006

A parallel implementation of 2-D/3-D image registration for computer-assisted surgery.
Int. J. Bioinform. Res. Appl., 2006

Grid Resource Monitoring and Selection for Rapid Turnaround Applications.
IEICE Trans. Inf. Syst., 2006

A Resource Selection Method for Cycle Stealing in the GPU Grid.
Proceedings of the Frontiers of High Performance Computing and Networking, 2006

A GPGPU Approach for Accelerating 2-D/3-D Rigid Registration of Medical Images.
Proceedings of the Parallel and Distributed Processing and Applications, 2006

Minimizing Data Size for Efficient Data Reuse in Grid-Enabled Medical Applications.
Proceedings of the Biological and Medical Data Analysis, 7th International Symposium, 2006

A code motion technique for accelerating general-purpose computation on the GPU.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Two-stage compression for fast volume rendering of time-varying scalar data.
Proceedings of the 4th International Conference on Computer Graphics and Interactive Techniques in Australasia and Southeast Asia 2006, Kuala Lumpur, Malaysia, November 29, 2006

2005
A data distributed parallel algorithm for nonrigid image registration.
Parallel Comput., 2005

Performance Study of Nonrigid Registration Algorithm for Investigating Lung Disease on Clusters.
Proceedings of the Sixth International Conference on Parallel and Distributed Computing, 2005

Performance Study of LU Decomposition on the Programmable GPU.
Proceedings of the High Performance Computing, 2005

2004
High-performance computing service over the Internet for intraoperative image processing.
IEEE Trans. Inf. Technol. Biomed., 2004

Evaluation of Performance Prediction Method for Master/Slave Parallel Programs.
IEICE Trans. Inf. Syst., 2004

PerWiz: A What-If Prediction Tool for Tuning Message Passing Programs.
Proceedings of the High Performance Computing for Computational Science, 2004

Real-Time Estimation of Hip Range of Motion for Total Hip Replacement Surgery.
Proceedings of the Medical Image Computing and Computer-Assisted Intervention -- MICCAI 2004, 2004

Parallel Volume Rendering with Early Ray Termination for Visualizing Large-Scale Datasets.
Proceedings of the Parallel and Distributed Processing and Applications, 2004

A Performance Analysis Tool for Performance Debugging of Message Passing Parallel Programs.
Proceedings of the 33rd International Conference on Parallel Processing Workshops (ICPP 2004 Workshops), 2004

2003
An improved binary-swap compositing for sort-last parallel rendering on distributed memory multiprocessors.
Parallel Comput., 2003

Debugging Tool for Localizing Faulty Processes in Message Passing Programs.
CoRR, 2003

An Improvement on Binary-Swap Compositing for Sort-Last Parallel Rendering.
Proceedings of the 2003 ACM Symposium on Applied Computing (SAC), 2003

Design and Implementation of Parallel Nonrigid Image Registration Using Off-the-Shelf Supercomputers.
Proceedings of the Medical Image Computing and Computer-Assisted Intervention, 2003

A Divided-Screenwise Hierarchical Compositing for Sort-Last Parallel Volume Rendering.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

A High Performance Computing System for Medical Imaging in the Remote Operating Room.
Proceedings of the High Performance Computing - HiPC 2003, 10th International Conference, 2003

An Emulation System for Predicting Master/Slave Program Performance.
Proceedings of the Euro-Par 2003. Parallel Processing, 2003

A high-performance computing service over the Internet for nonrigid image registration.
Proceedings of the CARS 2003. Computer Assisted Radiology and Surgery. Proceedings of the 17th International Congress and Exhibition, 2003

2001
LogGPS: a parallel computational model for synchronization analysis.
Proceedings of the 2001 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP'01), 2001

