Yasuaki Ito

ORCID: 0000-0003-0593-231X

According to our database, Yasuaki Ito authored at least 158 papers between 2002 and 2023.

Bibliography

2023
Dual-Matrix Domain-Wall: A Novel Technique for Generating Permutations by QUBO and Ising Models with Quadratic Sizes.
CoRR, 2023

GPU implementations of deflate encoding and decoding.
Concurr. Comput. Pract. Exp., 2023

Efficient parallel implementations to compute the diameter of a graph.
Concurr. Comput. Pract. Exp., 2023

A novel structured sparse fully connected layer in convolutional neural networks.
Concurr. Comput. Pract. Exp., 2023

High-throughput FPGA implementation for quadratic unconstrained binary optimization.
Concurr. Comput. Pract. Exp., 2023

Simple iterative trial search for the maximum independent set problem optimized for the GPUs.
Concurr. Comput. Pract. Exp., 2023

Graphics processing unit-accelerated high-quality watercolor painting image generation.
Concurr. Comput. Pract. Exp., 2023

International Symposium on Computing and Networking (CANDAR 2019) special issue.
Concurr. Comput. Pract. Exp., 2023

Diverse Adaptive Bulk Search: a Framework for Solving QUBO Problems on Multiple GPUs.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Solving the N-Queens Puzzle by a QUBO Model with Quadratic Size.
Proceedings of the Eleventh International Symposium on Computing and Networking, CANDAR 2023, Matsue, Japan, November 28, 2023

Efficient GPU-Accelerated Bulk Evaluation of the Boys Function for Quantum Chemistry.
Proceedings of the Eleventh International Symposium on Computing and Networking, CANDAR 2023, Matsue, Japan, November 28, 2023

2022
GPU-accelerated scalable solver with bit permutated cyclic-min algorithm for quadratic unconstrained binary optimization.
J. Parallel Distributed Comput., 2022

Graph-theoretic Formulation of QUBO for Scalable Local Search on GPUs.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

The Bonsai Hypothesis: An Efficient Network Pruning Technique.
Proceedings of the Artificial Intelligence Applications and Innovations, 2022

BERT-Based Scientific Paper Quality Prediction.
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2022, 2022

ConvUNeXt: A Lightweight Convolutional Neural Network for Watercolor Image Translation.
Proceedings of the 2022 Tenth International Symposium on Computing and Networking, CANDAR 2022, 2022

A benchmark QUBO problem inspired by digital halftoning based on the human visual system.
Proceedings of the Tenth International Symposium on Computing and Networking, 2022

Bit duplication technique to generate hard QUBO problems.
Proceedings of the 2022 Tenth International Symposium on Computing and Networking, CANDAR 2022, 2022

A Bokeh Image Generation Technique using Machine Learning.
Proceedings of the Tenth International Symposium on Computing and Networking, 2022

2021
Efficient implementations of Bloom filter using block RAMs and DSP slices on the FPGA.
Concurr. Comput. Pract. Exp., 2021

Tile art image generation using parallel greedy algorithm on the GPU and its approximation with machine learning.
Concurr. Comput. Pract. Exp., 2021

On the Computational Power of Convolution Pooling: A Theoretical Approach for Deep Learning.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

Acceleration of Deflate Encoding and Decoding with GPU implementations.
Proceedings of the Ninth International Symposium on Computing and Networking, 2021

Solving the sparse QUBO on multiple GPUs for Simulating a Quantum Annealer.
Proceedings of the Ninth International Symposium on Computing and Networking, 2021

A GPU Implementation of Watercolor Painting Image Generation.
Proceedings of the Ninth International Symposium on Computing and Networking, 2021

2020
Efficient convolution pooling on the GPU.
J. Parallel Distributed Comput., 2020

A Rabin-Karp Implementation for Handling Multiple Pattern-Matching on the GPU.
IEICE Trans. Inf. Syst., 2020

A Work-Time Optimal Parallel Exhaustive Search Algorithm for the QUBO and the Ising model, with GPU implementation.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

An Efficient Multicore CPU Implementation for Convolution-Pooling Computation in CNNs.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Adaptive Bulk Search: Solving Quadratic Unconstrained Binary Optimization Problems on Multiple GPUs.
Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

Huffman Coding with Gap Arrays for GPU Acceleration.
Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

Art Font Image Generation with Conditional Generative Adversarial Networks.
Proceedings of the Eighth International Symposium on Computing and Networking Workshops, 2020

Fully-Pipelined Architecture for Simulated Annealing-based QUBO Solver on the FPGA.
Proceedings of the Eighth International Symposium on Computing and Networking, 2020

Efficient GPU Implementation for Solving the Maximum Independent Set Problem.
Proceedings of the Eighth International Symposium on Computing and Networking, 2020

2019
Accelerating the Smith-Waterman Algorithm Using the Bitwise Parallel Bulk Computation Technique on the GPU.
IEICE Trans. Inf. Syst., 2019

Bulk execution of the dynamic programming for the optimal polygon triangulation problem on the GPU.
Concurr. Comput. Pract. Exp., 2019

Efficient cuDNN-Compatible Convolution-Pooling on the GPU.
Proceedings of the Parallel Processing and Applied Mathematics, 2019

Stained Glass Image Generation Using Voronoi Diagram and Its GPU Acceleration.
Proceedings of the Parallel Processing and Applied Mathematics, 2019

Efficient Triangular Matrix Vector Multiplication on the GPU.
Proceedings of the Parallel Processing and Applied Mathematics, 2019

FIFO-Based Hardware Sorters for High Bandwidth Memory.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

A Watercolor Painting Image Generation Using Stroke-Based Rendering.
Proceedings of the Seventh International Symposium on Computing and Networking Workshops, 2019

Efficient GPU Implementations to Compute the Diameter of a Graph.
Proceedings of the 2019 Seventh International Symposium on Computing and Networking, 2019

Structured Sparse Fully-Connected Layers in the CNNs and Its GPU Acceleration.
Proceedings of the Seventh International Symposium on Computing and Networking Workshops, 2019

Throughput-Optimal Hardware Implementation of LZW Decompression on the FPGA.
Proceedings of the Seventh International Symposium on Computing and Networking Workshops, 2019

Folded Bloom Filter for High Bandwidth Memory, with GPU Implementations.
Proceedings of the 2019 Seventh International Symposium on Computing and Networking, 2019

Electromagnetic Levitation System for Thin Steel Plate Using Electromagnets and Permanent Magnets (Fundamental Consideration on Optimal Placement to Suppress the Deflection of Steel Plate).
Proceedings of the 12th Asian Control Conference, 2019

2018
Almost optimal column-wise prefix-sum computation on the GPU.
J. Supercomput., 2018

An Optimal Parallel Algorithm for Computing the Summed Area Table on the GPU.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Efficient Byte Stream Pattern Test using Bloom Filter with Rolling Hash Functions on the FPGA.
Proceedings of the Sixth International Symposium on Computing and Networking, 2018

A Prefix-Sum-Based Rabin-Karp Implementation for Multiple Pattern Matching on GPGPU.
Proceedings of the Sixth International Symposium on Computing and Networking, 2018

Tile Art Image Generation Using Conditional Generative Adversarial Networks.
Proceedings of the Sixth International Symposium on Computing and Networking, 2018

2017
An Efficient GPU Implementation of Bulk Computation of the Eigenvalue Problem for Many Small Real Non-symmetric Matrices.
Int. J. Netw. Comput., 2017

GPU-accelerated Exhaustive Verification of the Collatz Conjecture.
Int. J. Netw. Comput., 2017

An Efficient GPU Implementation of CKY Parsing Using the Bitwise Parallel Bulk Computation Technique.
IEICE Trans. Inf. Syst., 2017

C2CU: a CUDA C program generator for bulk execution of a sequential algorithm.
Concurr. Comput. Pract. Exp., 2017

Accelerating digital halftoning using the local exhaustive search on the GPU.
Concurr. Comput. Pract. Exp., 2017

Adaptive loss-less data compression method optimized for GPU decompression.
Concurr. Comput. Pract. Exp., 2017

A GPU Implementation of Bulk Execution of the Dynamic Programming for the Optimal Polygon Triangulation.
Proceedings of the Parallel Processing and Applied Mathematics, 2017

Almost Optimal Column-wise Prefix-sum Computation on the GPU.
Proceedings of the Parallel Processing and Applied Mathematics, 2017

Photomosaic Generation by Rearranging Subimages, with GPU Acceleration.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Simple and Fast Parallel Algorithms for the Voronoi Map and the Euclidean Distance Map, with GPU Implementations.
Proceedings of the 46th International Conference on Parallel Processing, 2017

A Hybrid Architecture for the Approximate String Matching on an FPGA.
Proceedings of the Fifth International Symposium on Computing and Networking, 2017

A Square Pointillism Image Generation, and Its GPU Acceleration.
Proceedings of the Fifth International Symposium on Computing and Networking, 2017

Single Kernel Soft Synchronization Technique for Task Arrays on CUDA-enabled GPUs, with Applications.
Proceedings of the Fifth International Symposium on Computing and Networking, 2017

2016
A character art generator using the local exhaustive search, with GPU acceleration.
Int. J. Parallel Emergent Distributed Syst., 2016

Efficient Implementation of FDFM Approach for Euclidean Algorithms on the FPGA.
Int. J. Netw. Comput., 2016

Bulk execution of Euclidean algorithms on the CUDA-enabled GPU.
Int. J. Netw. Comput., 2016

Fast Simulation of Conway's Game of Life Using Bitwise Parallel Bulk Computation on a GPU.
Int. J. Found. Comput. Sci., 2016

A Memory-Access-Efficient Implementation for Computing the Approximate String Matching Algorithm on GPUs.
IEICE Trans. Inf. Syst., 2016

An FPGA Implementation for a Flexible-Length-Arithmetic Processor Employing the FDFM Processor Core Approach.
IEICE Trans. Inf. Syst., 2016

GPU-Accelerated Bulk Execution of Multiple-Length Multiplication with Warp-Synchronous Programming Technique.
IEICE Trans. Inf. Syst., 2016

Fully Parallelized LZW Decompression for CUDA-Enabled GPUs.
IEICE Trans. Inf. Syst., 2016

An Efficient Implementation of LZW Decompression in the FPGA.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Bitwise Parallel Bulk Computation on the GPU, with Application to the CKY Parsing for Context-Free Grammars.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

An Efficient Implementation of LZW Compression in the FPGA.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2016

Light Loss-Less Data Compression, with GPU Implementation.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2016

GPU-Accelerated Bulk Computation of the Eigenvalue Problem for Many Small Real Non-symmetric Matrices.
Proceedings of the Fourth International Symposium on Computing and Networking, 2016

A Memory-Access-Efficient Implementation of the Approximate String Matching Algorithm on GPU.
Proceedings of the Fourth International Symposium on Computing and Networking, 2016

An Evaluation of the Parallella Architecture for the Convex Hull Computation.
Proceedings of the Fourth International Symposium on Computing and Networking, 2016

Accelerating Ant Colony Optimization for the Vertex Coloring Problem on the GPU.
Proceedings of the Fourth International Symposium on Computing and Networking, 2016

A Hardware Sorter for Almost Sorted Sequences, with FPGA Implementations.
Proceedings of the Fourth International Symposium on Computing and Networking, 2016

2015
Parallel FDFM Approach for Computing GCDs Using the FPGA.
Proceedings of the Parallel Processing and Applied Mathematics, 2015

A Parallel Algorithm for LZW Decompression, with GPU Implementation.
Proceedings of the Parallel Processing and Applied Mathematics, 2015

Optimality of Fundamental Parallel Algorithms on the Hierarchical Memory Machine, with GPU Implementation.
Proceedings of the 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, 2015

Optimal Parallel Hardware K-Sorter and Top K-Sorter, with FPGA Implementations.
Proceedings of the 14th International Symposium on Parallel and Distributed Computing, 2015

GPU-Accelerated Digital Halftoning by the Local Exhaustive Search.
Proceedings of the 14th International Symposium on Parallel and Distributed Computing, 2015

Bulk GCD Computation Using a GPU to Break Weak RSA Keys.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

A Fast Approximate String Matching Algorithm on GPU.
Proceedings of the Third International Symposium on Computing and Networking, 2015

A Flexible-Length-Arithmetic Processor Based on FDFM Approach in FPGAs.
Proceedings of the Third International Symposium on Computing and Networking, 2015

Parallelization Techniques for Error Diffusion with GPU Implementations.
Proceedings of the Third International Symposium on Computing and Networking, 2015

A Warp-Synchronous Implementation for Multiple-Length Multiplication on the GPU.
Proceedings of the Third International Symposium on Computing and Networking, 2015

Fast LZW Compression Using a GPU.
Proceedings of the Third International Symposium on Computing and Networking, 2015

Efficient GPU Implementations for the Conway's Game of Life.
Proceedings of the Third International Symposium on Computing and Networking, 2015

2014
Accelerating ant colony optimisation for the travelling salesman problem on the GPU.
Int. J. Parallel Emergent Distributed Syst., 2014

Implementations of the Hough Transform on the Embedded Multicore Processors.
Int. J. Netw. Comput., 2014

An Optimal Implementation of the Approximate String Matching on the Hierarchical Memory Machine, with Performance Evaluation on the GPU.
IEICE Trans. Inf. Syst., 2014

Offline Permutation on the CUDA-enabled GPU.
IEICE Trans. Inf. Syst., 2014

An Efficient Implementation of the Gradient-Based Hough Transform Using DSP Slices and Block RAMs on the FPGA.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Bulk Execution of Oblivious Algorithms on the Unified Memory Machine, with GPU Implementation.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Random Address Permute-Shift Technique for the Shared Memory on GPUs.
Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014

Parallel Algorithms for the Summed Area Table on the Asynchronous Hierarchical Memory Machine, with GPU implementations.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

C2CU: A CUDA C Program Generator for Bulk Execution of a Sequential Algorithm.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2014

A GPU Implementation of Clipping-Free Halftoning Using the Direct Binary Search.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2014

GPU-Accelerated Verification of the Collatz Conjecture.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2014

An Efficient Implementation of the One-Dimensional Hough Transform Algorithm for Circle Detection on the FPGA.
Proceedings of the Second International Symposium on Computing and Networking, 2014

Thorough Evaluation of GPU Shared Memory Load and Store Instructions.
Proceedings of the Second International Symposium on Computing and Networking, 2014

2013
Accelerating computation of Euclidean distance map using the GPU with efficient memory access.
Int. J. Parallel Emergent Distributed Syst., 2013

An FPGA implementation for neural networks with the FDFM processor core approach.
Int. J. Parallel Emergent Distributed Syst., 2013

Offline Permutation Algorithms on the Discrete Memory Machine with Performance Evaluation on the GPU.
IEICE Trans. Inf. Syst., 2013

A GPU Implementation of Dynamic Programming for the Optimal Polygon Triangulation.
IEICE Trans. Inf. Syst., 2013

Efficient Hough Transform on the FPGA using DSP Slices and Block RAMs.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

An Optimal Offline Permutation Algorithm on the Hierarchical Memory Machine, with the GPU Implementation.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

ASCII Art Generation Using the Local Exhaustive Search on the GPU.
Proceedings of the First International Symposium on Computing and Networking, 2013

The Random Address Shift to Reduce the Memory Access Congestion on the Discrete Memory Machine.
Proceedings of the First International Symposium on Computing and Networking, 2013

TinyCSE: Tiny Computer System for Education.
Proceedings of the First International Symposium on Computing and Networking, 2013

A Flexible-Length-Arithmetic Processor Using Embedded DSP Slices and Block RAMs in FPGAs.
Proceedings of the First International Symposium on Computing and Networking, 2013

Template Matching Using DSP Slices on the FPGA.
Proceedings of the First International Symposium on Computing and Networking, 2013

2012
A Rewriting Approach to Replace Asynchronous ROMs with Synchronous Ones for the Circuits with Cycles.
Int. J. Netw. Comput., 2012

The Parallel FDFM Processor Core Approach for CRT-based RSA Decryption.
Int. J. Netw. Comput., 2012

Accelerating the Dynamic Programming for the Optimal Polygon Triangulation on the GPU.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2012

An Efficient GPU Implementation of Ant Colony Optimization for the Traveling Salesman Problem.
Proceedings of the Third International Conference on Networking and Computing, 2012

Record Route Elimination (RRE): An Energy-Efficient Broadcast Algorithm.
Proceedings of the Third International Conference on Networking and Computing, 2012

An Implementation of Conflict-Free Offline Permutation on the GPU.
Proceedings of the Third International Conference on Networking and Computing, 2012

2011
Implementations of a Parallel Algorithm for Computing Euclidean Distance Map in Multicore Processors and GPUs.
Int. J. Netw. Comput., 2011

Efficient Exhaustive Verification of the Collatz Conjecture using DSP blocks of Xilinx FPGAs.
Int. J. Netw. Comput., 2011

Preface.
Int. J. Netw. Comput., 2011

An RSA Encryption Hardware Algorithm using a Single DSP Block and a Single Block RAM on the FPGA.
Int. J. Netw. Comput., 2011

An Efficient Parallel Sorting Compatible with the Standard Qsort.
Int. J. Found. Comput. Sci., 2011

A Graph Rewriting Approach for Converting Asynchronous ROMs into Synchronous Ones.
IEICE Trans. Inf. Syst., 2011

CRT-Based DSP Decryption Using Montgomery Modular Multiplication on the FPGA.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Fast and Accurate Template Matching Using Pixel Rearrangement on the GPU.
Proceedings of the Second International Conference on Networking and Computing, 2011

Accelerating the Dynamic Programming for the Matrix Chain Product on the GPU.
Proceedings of the Second International Conference on Networking and Computing, 2011

An Algorithm to Remove Asynchronous ROMs in Circuits with Cycles.
Proceedings of the Second International Conference on Networking and Computing, 2011

A GPU Implementation of Computing Euclidean Distance Map with Efficient Memory Access.
Proceedings of the Second International Conference on Networking and Computing, 2011

Fast Ellipse Detection Algorithm Using Hough Transform on the GPU.
Proceedings of the Second International Conference on Networking and Computing, 2011

The Parallel FDFM Processor Core Approach for Neural Networks.
Proceedings of the Second International Conference on Networking and Computing, 2011

2010
Low-Latency Connected Component Labeling Using an FPGA.
Int. J. Found. Comput. Sci., 2010

Efficient exhaustive verification of the Collatz conjecture using DSP48E blocks of Xilinx Virtex-5 FPGAs.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Efficient Canny Edge Detection Using a GPU.
Proceedings of the First International Conference on Networking and Computing, 2010

A Rewriting Algorithm to Generate AROM-free Fully Synchronous Circuits.
Proceedings of the First International Conference on Networking and Computing, 2010

Implementations of Parallel Computation of Euclidean Distance Map in Multicore Processors and GPUs.
Proceedings of the First International Conference on Networking and Computing, 2010

2009
A Simple Parallel Convex Hulls Algorithm for Sorted Points and the Performance Evaluation on the Multicore Processors.
Proceedings of the 2009 International Conference on Parallel and Distributed Computing, 2009

A Hardware-Software Cooperative Approach for the Exhaustive Verification of the Collatz Conjecture.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2009

2008
A New FM Screening Method to Generate Cluster-Dot Binary Images Using the Local Exhaustive Search with FPGA Acceleration.
Int. J. Found. Comput. Sci., 2008

Optimized Component Labeling Algorithm for Using in Medium Sized FPGAs.
Proceedings of the Ninth International Conference on Parallel and Distributed Computing, 2008

Disturbance estimation by observer-based stabilizing controller with simplified design and its applications to teleoperation.
Proceedings of the IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, 2008

Component labeling for k-concave binary images using an FPGA.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Processor, Assembler, and Compiler Design Education Using an FPGA.
Proceedings of the 14th International Conference on Parallel and Distributed Systems, 2008

A Tiny Processing System for Education and Small Embedded Systems on the FPGAs.
Proceedings of the 2008 IEEE/IFIP International Conference on Embedded and Ubiquitous Computing (EUC 2008), 2008

2007
Efficient Hardware Algorithms for n Choose k Counters Using the Bitonic Merger.
Int. J. Found. Comput. Sci., 2007

Cluster-dot Screening by Local Exhaustive Search with Hardware Acceleration.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

2006
An Energy Efficient Leader Election Protocol for Radio Network with a Single Transceiver.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2006

Randomized Leader Election Protocols in Noisy Radio Networks with a Single Transceiver.
Proceedings of the Parallel and Distributed Processing and Applications, 2006

Efficient hardware algorithms for n choose k counters.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

2005
FM Screening by the Local Exhaustive Search, with Hardware Acceleration.
Int. J. Found. Comput. Sci., 2005

2004
Instance-Specific Solutions for Accelerating the CKY Parsing of Large Context-Free Grammars.
Int. J. Found. Comput. Sci., 2004

2003
Instance-Specific Solutions to Accelerate the CKY Parsing.
Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms, June 23, 2003

2002
Accelerating the CKY Parsing Using FPGAs.
Proceedings of the High Performance Computing, 2002