Yasuaki Ito

Yoshiho Oda

Takayoshi Narita

Hideaki Kato

Proceedings of the 12th Asian Control Conference, 2019

2018

Almost optimal column-wise prefix-sum computation on the GPU.

J. Supercomput., 2018

An Optimal Parallel Algorithm for Computing the Summed Area Table on the GPU.

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Efficient Byte Stream Pattern Test using Bloom Filter with Rolling Hash Functions on the FPGA.

Proceedings of the Sixth International Symposium on Computing and Networking, 2018

A Prefix-Sum-Based Rabin-Karp Implementation for Multiple Pattern Matching on GPGPU.

Proceedings of the Sixth International Symposium on Computing and Networking, 2018

Tile Art Image Generation Using Conditional Generative Adversarial Networks.

Proceedings of the Sixth International Symposium on Computing and Networking, 2018

2017

An Efficient GPU Implementation of Bulk Computation of the Eigenvalue Problem for Many Small Real Non-symmetric Matrices.

Int. J. Netw. Comput., 2017

GPU-accelerated Exhaustive Verification of the Collatz Conjecture.

Int. J. Netw. Comput., 2017

An Efficient GPU Implementation of CKY Parsing Using the Bitwise Parallel Bulk Computation Technique.

IEICE Trans. Inf. Syst., 2017

C2CU: a CUDA C program generator for bulk execution of a sequential algorithm.

Concurr. Comput. Pract. Exp., 2017

Accelerating digital halftoning using the local exhaustive search on the GPU.

Concurr. Comput. Pract. Exp., 2017

Adaptive loss-less data compression method optimized for GPU decompression.

Concurr. Comput. Pract. Exp., 2017

A GPU Implementation of Bulk Execution of the Dynamic Programming for the Optimal Polygon Triangulation.

Kohei Yamashita

Proceedings of the Parallel Processing and Applied Mathematics, 2017

Almost Optimal Column-wise Prefix-sum Computation on the GPU.

Proceedings of the Parallel Processing and Applied Mathematics, 2017

Photomosaic Generation by Rearranging Subimages, with GPU Acceleration.

Yi Yang

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Accelerating the Smith-Waterman Algorithm Using Bitwise Parallel Bulk Computation Technique on GPU.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Simple and Fast Parallel Algorithms for the Voronoi Map and the Euclidean Distance Map, with GPU Implementations.

Proceedings of the 46th International Conference on Parallel Processing, 2017

A Hybrid Architecture for the Approximate String Matching on an FPGA.

Proceedings of the Fifth International Symposium on Computing and Networking, 2017

A Square Pointillism Image Generation, and Its GPU Acceleration.

Proceedings of the Fifth International Symposium on Computing and Networking, 2017

Single Kernel Soft Synchronization Technique for Task Arrays on CUDA-enabled GPUs, with Applications.

Proceedings of the Fifth International Symposium on Computing and Networking, 2017

2016

A character art generator using the local exhaustive search, with GPU acceleration.

Int. J. Parallel Emergent Distributed Syst., 2016

Efficient Implementation of FDFM Approach for Euclidean Algorithms on the FPGA.

Int. J. Netw. Comput., 2016

Bulk execution of Euclidean algorithms on the CUDA-enabled GPU.

Int. J. Netw. Comput., 2016

Fast Simulation of Conway's Game of Life Using Bitwise Parallel Bulk Computation on a GPU.

Int. J. Found. Comput. Sci., 2016

A Memory-Access-Efficient Implementation for Computing the Approximate String Matching Algorithm on GPUs.

IEICE Trans. Inf. Syst., 2016

An FPGA Implementation for a Flexible-Length-Arithmetic Processor Employing the FDFM Processor Core Approach.

IEICE Trans. Inf. Syst., 2016

GPU-Accelerated Bulk Execution of Multiple-Length Multiplication with Warp-Synchronous Programming Technique.

IEICE Trans. Inf. Syst., 2016

Fully Parallelized LZW Decompression for CUDA-Enabled GPUs.

IEICE Trans. Inf. Syst., 2016

An Efficient Implementation of LZW Decompression in the FPGA.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Bitwise Parallel Bulk Computation on the GPU, with Application to the CKY Parsing for Context-Free Grammars.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

An Efficient Implementation of LZW Compression in the FPGA.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2016

Light Loss-Less Data Compression, with GPU Implementation.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2016

GPU-Accelerated Bulk Computation of the Eigenvalue Problem for Many Small Real Non-symmetric Matrices.

Proceedings of the Fourth International Symposium on Computing and Networking, 2016

A Memory-Access-Efficient Implementation of the Approximate String Matching Algorithm on GPU.

Proceedings of the Fourth International Symposium on Computing and Networking, 2016

An Evaluation of the Parallella Architecture for the Convex Hull Computation.

Keisuke Nakata

Proceedings of the Fourth International Symposium on Computing and Networking, 2016

Accelerating Ant Colony Optimization for the Vertex Coloring Problem on the GPU.

Ryouhei Murooka

Proceedings of the Fourth International Symposium on Computing and Networking, 2016

A Hardware Sorter for Almost Sorted Sequences, with FPGA Implementations.

Proceedings of the Fourth International Symposium on Computing and Networking, 2016

2015

Parallel FDFM Approach for Computing GCDs Using the FPGA.

Proceedings of the Parallel Processing and Applied Mathematics, 2015

A Parallel Algorithm for LZW Decompression, with GPU Implementation.

Proceedings of the Parallel Processing and Applied Mathematics, 2015

Optimality of Fundamental Parallel Algorithms on the Hierarchical Memory Machine, with GPU Implementation.

Proceedings of the 23rd Euromicro International Conference on Parallel, 2015

Optimal Parallel Hardware K-Sorter and Top K-Sorter, with FPGA Implementations.

Naoyuki Matsumoto

Proceedings of the 14th International Symposium on Parallel and Distributed Computing, 2015

GPU-Accelerated Digital Halftoning by the Local Exhaustive Search.

Hiroaki Kouge

Proceedings of the 14th International Symposium on Parallel and Distributed Computing, 2015

Bulk GCD Computation Using a GPU to Break Weak RSA Keys.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

A Fast Approximate String Matching Algorithm on GPU.

Proceedings of the Third International Symposium on Computing and Networking, 2015

A Flexible-Length-Arithmetic Processor Based on FDFM Approach in FPGAs.

Tatsuya Kawamoto

Proceedings of the Third International Symposium on Computing and Networking, 2015

Parallelization Techniques for Error Diffusion with GPU Implementations.

Proceedings of the Third International Symposium on Computing and Networking, 2015

A Warp-Synchronous Implementation for Multiple-Length Multiplication on the GPU.

Proceedings of the Third International Symposium on Computing and Networking, 2015

Fast LZW Compression Using a GPU.

Proceedings of the Third International Symposium on Computing and Networking, 2015

Efficient GPU Implementations for the Conway's Game of Life.

Proceedings of the Third International Symposium on Computing and Networking, 2015

2014

Accelerating ant colony optimisation for the travelling salesman problem on the GPU.

Akihiro Uchida

Int. J. Parallel Emergent Distributed Syst., 2014

Implementations of the Hough Transform on the Embedded Multicore Processors.

Int. J. Netw. Comput., 2014

An Optimal Implementation of the Approximate String Matching on the Hierarchical Memory Machine, with Performance Evaluation on the GPU.

Duhu Man

IEICE Trans. Inf. Syst., 2014

Offline Permutation on the CUDA-enabled GPU.

IEICE Trans. Inf. Syst., 2014

An Efficient Implementation of the Gradient-Based Hough Transform Using DSP Slices and Block RAMs on the FPGA.

Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Bulk Execution of Oblivious Algorithms on the Unified Memory Machine, with GPU Implementation.

Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Random Address Permute-Shift Technique for the Shared Memory on GPUs.

Susumu Matsumae

Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014

Parallel Algorithms for the Summed Area Table on the Asynchronous Hierarchical Memory Machine, with GPU implementations.

Proceedings of the 43rd International Conference on Parallel Processing, 2014

C2CU: A CUDA C Program Generator for Bulk Execution of a Sequential Algorithm.

Daisuke Takafuji

Proceedings of the Algorithms and Architectures for Parallel Processing, 2014

A GPU Implementation of Clipping-Free Halftoning Using the Direct Binary Search.

Hiroaki Koge

Proceedings of the Algorithms and Architectures for Parallel Processing, 2014

GPU-Accelerated Verification of the Collatz Conjecture.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2014

An Efficient Implementation of the One-Dimensional Hough Transform Algorithm for Circle Detection on the FPGA.

Proceedings of the Second International Symposium on Computing and Networking, 2014

Thorough Evaluation of GPU Shared Memory Load and Store Instructions.

Proceedings of the Second International Symposium on Computing and Networking, 2014

2013

Accelerating computation of Euclidean distance map using the GPU with efficient memory access.

Int. J. Parallel Emergent Distributed Syst., 2013

An FPGA implementation for neural networks with the FDFM processor core approach.

Yuki Ago

Int. J. Parallel Emergent Distributed Syst., 2013

Offline Permutation Algorithms on the Discrete Memory Machine with Performance Evaluation on the GPU.

IEICE Trans. Inf. Syst., 2013

A GPU Implementation of Dynamic Programming for the Optimal Polygon Triangulation.

IEICE Trans. Inf. Syst., 2013

Efficient Hough Transform on the FPGA using DSP Slices and Block RAMs.

Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

An Optimal Offline Permutation Algorithm on the Hierarchical Memory Machine, with the GPU Implementation.

Proceedings of the 42nd International Conference on Parallel Processing, 2013

ASCII Art Generation Using the Local Exhaustive Search on the GPU.

Proceedings of the First International Symposium on Computing and Networking, 2013

The Random Address Shift to Reduce the Memory Access Congestion on the Discrete Memory Machine.

Susumu Matsumae

Proceedings of the First International Symposium on Computing and Networking, 2013

TinyCSE: Tiny Computer System for Education.

Ryosuke Nakamura

Proceedings of the First International Symposium on Computing and Networking, 2013

A Flexible-Length-Arithmetic Processor Using Embedded DSP Slices and Block RAMs in FPGAs.

Kohan Sai

Proceedings of the First International Symposium on Computing and Networking, 2013

Template Matching Using DSP Slices on the FPGA.

Kaoru Hashimoto

Proceedings of the First International Symposium on Computing and Networking, 2013

2012

A Rewriting Approach to Replace Asynchronous ROMs with Synchronous Ones for the Circuits with Cycles.

Int. J. Netw. Comput., 2012

The Parallel FDFM Processor Core Approach for CRT-based RSA Decryption.

Bo Song

Int. J. Netw. Comput., 2012

Accelerating the Dynamic Programming for the Optimal Polygon Triangulation on the GPU.

Kazufumi Nishida

Proceedings of the Algorithms and Architectures for Parallel Processing, 2012

An Efficient GPU Implementation of Ant Colony Optimization for the Traveling Salesman Problem.

Akihiro Uchida

Edans Flavius de Oliveira Sandes

Proceedings of the Third International Conference on Networking and Computing, 2012

Record Route Elimination (RRE): An Energy-Efficient Broadcast Algorithm.

Alba Cristina Magalhaes Alves de Melo

Proceedings of the Third International Conference on Networking and Computing, 2012

An Implementation of Conflict-Free Offline Permutation on the GPU.

Proceedings of the Third International Conference on Networking and Computing, 2012

2011

Implementations of a Parallel Algorithm for Computing Euclidean Distance Map in Multicore Processors and GPUs.

Int. J. Netw. Comput., 2011

Efficient Exhaustive Verification of the Collatz Conjecture using DSP blocks of Xilinx FPGAs.

Int. J. Netw. Comput., 2011

Preface.

Sayaka Kamei

Int. J. Netw. Comput., 2011

A Graph Rewriting Approach for Converting Asynchronous ROMs into Synchronous Ones.

IEICE Trans. Inf. Syst., 2011

CRT-Based DSP Decryption Using Montgomery Modular Multiplication on the FPGA.

Bo Song

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Fast and Accurate Template Matching Using Pixel Rearrangement on the GPU.

Akihiro Uchida

Proceedings of the Second International Conference on Networking and Computing, 2011

Accelerating the Dynamic Programming for the Matrix Chain Product on the GPU.

Kazufumi Nishida

Proceedings of the Second International Conference on Networking and Computing, 2011

An Algorithm to Remove Asynchronous ROMs in Circuits with Cycles.

Proceedings of the Second International Conference on Networking and Computing, 2011

A GPU Implementation of Computing Euclidean Distance Map with Efficient Memory Access.

Proceedings of the Second International Conference on Networking and Computing, 2011

Fast Ellipse Detection Algorithm Using Hough Transform on the GPU.

Kohei Ogawa

Proceedings of the Second International Conference on Networking and Computing, 2011

The Parallel FDFM Processor Core Approach for Neural Networks.

Proceedings of the Second International Conference on Networking and Computing, 2011

2010

Low-Latency Connected Component Labeling Using an FPGA.

Int. J. Found. Comput. Sci., 2010

Efficient exhaustive verification of the Collatz conjecture using DSP48E blocks of Xilinx Virtex-5 FPGAs.

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

An RSA Encryption Hardware Algorithm Using a Single DSP Block and a Single Block RAM on the FPGA.

Proceedings of the First International Conference on Networking and Computing, 2010

Efficient Canny Edge Detection Using a GPU.

Kohei Ogawa

Proceedings of the First International Conference on Networking and Computing, 2010

A Rewriting Algorithm to Generate AROM-free Fully Synchronous Circuits.

Proceedings of the First International Conference on Networking and Computing, 2010

Implementations of Parallel Computation of Euclidean Distance Map in Multicore Processors and GPUs.

Proceedings of the First International Conference on Networking and Computing, 2010

2009

A Simple Parallel Convex Hulls Algorithm for Sorted Points and the Performance Evaluation on the Multicore Processors.

Proceedings of the 2009 International Conference on Parallel and Distributed Computing, 2009

An Efficient Parallel Sorting Compatible with the Standard qsort.

Duhu Man

Proceedings of the 2009 International Conference on Parallel and Distributed Computing, 2009

A Hardware-Software Cooperative Approach for the Exhaustive Verification of the Collatz Conjecture.

Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2009

2008

A New FM Screening Method to Generate Cluster-Dot Binary Images Using the Local Exhaustive Search with FPGA Acceleration.

Int. J. Found. Comput. Sci., 2008

Optimized Component Labeling Algorithm for Using in Medium Sized FPGAs.

Proceedings of the Ninth International Conference on Parallel and Distributed Computing, 2008

Disturbance estimation by observer-based stabilizing controller with simplified design and its applications to teleoperation.

Ryoichi Suzuki

Nobuaki Kobayashi

Proceedings of the IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, 2008

Component labeling for k-concave binary images using an FPGA.

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Processor, Assembler, and Compiler Design Education Using an FPGA.

Proceedings of the 14th International Conference on Parallel and Distributed Systems, 2008

A Tiny Processing System for Education and Small Embedded Systems on the FPGAs.

Proceedings of the 2008 IEEE/IFIP International Conference on Embedded and Ubiquitous Computing (EUC 2008), 2008

2007

Efficient Hardware Algorithms for n Choose k Counters Using the Bitonic Merger.

Youhei Yamagishi

Int. J. Found. Comput. Sci., 2007

Cluster-dot Screening by Local Exhaustive Search with Hardware Acceleration.

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

2006

An Energy Efficient Leader Election Protocol for Radio Network with a Single Transceiver.

IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2006

Randomized Leader Election Protocols in Noisy Radio Networks with a Single Transceiver.

Proceedings of the Parallel and Distributed Processing and Applications, 2006

Efficient hardware algorithms for n choose k counters.

Youhei Yamagishi

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

2004

Instance-Specific Solutions for Accelerating the CKY Parsing of Large Context-Free Grammars.

Int. J. Found. Comput. Sci., 2004

FM Screening by the Local Exhaustive Search, with Hardware Acceleration.

Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

2003

Instance-Specific Solutions to Accelerate the CKY Parsing.

Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms, June 23, 2003

2002

Accelerating the CKY Parsing Using FPGAs.
