Stanislav G. Sedukhin

Yoichi Tomioka

Proceedings of the Computational Science and Its Applications - ICCSA 2022, 2022

2021

In Search of the Performance- and Energy-Efficient CNN Accelerators.

Yoichi Tomioka

Kohei Yamamoto

Proceedings of the IEEE Symposium in Low-Power and High-Speed Chips, 2021

2019

Brain-inspired Co-design of Algorithm/Architecture for CNN Accelerators.

Yoichi Tomioka

Proceedings of the 8th International Congress on Advanced Applied Informatics, 2019

2018

Evaluations of OpenCL-written tsunami simulation on FPGA and comparison with GPU implementation.

J. Supercomput., 2018

2017

Performance Evaluation of Tsunami Simulation Using OpenCL on GPU and FPGA.

Proceedings of the 11th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2017

2016

Parallelism for High-Performance Tsunami Simulation with FPGA: Spatial or Temporal?

Proceedings of the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2016

2015

Stream Computation of Shallow Water Equation Solver for FPGA-based 1D Tsunami Simulation.

SIGARCH Comput. Archit. News, 2015

2014

Generalized Matrix Multiplication and its Object Oriented Model.

Scalable Comput. Pract. Exp., 2014

Image scrambling on a "mesh-of-tori" architecture.

Scalable Comput. Pract. Exp., 2014

Performance analysis of scalable algorithms for 3D linear transforms.

Proceedings of the 2014 Federated Conference on Computer Science and Information Systems, 2014

2013

Library for Matrix Multiplication-based Data Manipulation on a "Mesh-of-Tori" Architecture.

Proceedings of the 2013 Federated Conference on Computer Science and Information Systems, 2013

2012

Blocked United Algorithm for the All-Pairs Shortest Paths Problem on Hybrid CPU-GPU Systems.

IEICE Trans. Inf. Syst., 2012

Performance Tuning of Matrix Multiplication in OpenCL on Different GPUs and CPUs.

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Implementing a Code Generator for Fast Matrix Multiplication in OpenCL on the GPU.

Proceedings of the IEEE 6th International Symposium on Embedded Multicore/Manycore SoCs, 2012

2011

Multi-level Optimization of Matrix Multiplication for GPU-equipped Systems.

Proceedings of the International Conference on Computational Science, 2011

3D-DCT Processor and Its FPGA Implementation.

Yuki Ikegaki

IEICE Trans. Inf. Syst., 2011

Generalizing Matrix Multiplication for Efficient Computations on Modern Computers.

Proceedings of the Parallel Processing and Applied Mathematics, 2011

An O(n) Time-Complexity Matrix Transpose on Torus Array Processor.

Abhijeet A. Ravankar

Proceedings of the Second International Conference on Networking and Computing, 2011

Blocked All-Pairs Shortest Paths Algorithm for Hybrid CPU-GPU System.

Proceedings of the 13th IEEE International Conference on High Performance Computing & Communication, 2011

Object Oriented Model of Generalized Matrix Multiplication.

Proceedings of the Federated Conference on Computer Science and Information Systems, 2011

2010

Orbital Systolic Algorithms and Array Processors for Solution of the Algebraic Path Problem.

Kenichi Kuroda

IEICE Trans. Inf. Syst., 2010

Orbital Algorithms and Unified Array Processor for Computing 2D Separable Transforms.

Proceedings of the 39th International Conference on Parallel Processing, 2010

Mesh-of-Tori: A Novel Interconnection Network for Frontal Plane Cellular Processors.

Abhijeet A. Ravankar

Proceedings of the First International Conference on Networking and Computing, 2010

Matrix Multiply-Add in Min-plus Algebra on a Short-Vector SIMD Processor of Cell/B.E.

Proceedings of the First International Conference on Networking and Computing, 2010

Rapid*Closure: Algebraic Extensions of a Scalar Multiply-add Operation.

Proceedings of the ISCA 25th International Conference on Computers and Their Applications, 2010

2009

A Solution of the All-Pairs Shortest Paths Problem on the Cell Broadband Engine Processor.

IEICE Trans. Inf. Syst., 2009

Matrix Inversion on the Cell/B.E. Processor.

Shodai Yokoyama

Proceedings of the 11th IEEE International Conference on High Performance Computing and Communications, 2009

2008

Level-3 BLAS and LU Factorization on a Matrix Processor.

Inf. Media Technol., 2008

2-D Separable Transforms on a Matrix Processor.

Proceedings of the ISCA 21st International Conference on Computer Applications in Industry and Engineering, 2008

Array processor featuring an effective FIFO-based data stream management.

Yusuke Nomoto

Yuka Sato

Proceedings of the 8th IEEE International Conference on Computer and Information Technology, 2008

2007

Performance Evaluation of Basic Linear Algebra Subroutines on a Matrix Co-processor.

Proceedings of the Parallel Processing and Applied Mathematics, 2007

Fine-grained Matrix Multiply-Add on a Torus Array Processor.

Proceedings of the 22nd International Conference on Computers and Their Applications, 2007

Evaluating the Performance of Basic Linear Algebra Subroutines on a Torus Array Processor.

Proceedings of the Seventh International Conference on Computer and Information Technology (CIT 2007), 2007

2006

The general matrix multiply-add operation on 2D torus.

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Matrix Transpose on 2D Torus Array Processor.

Proceedings of the Sixth International Conference on Computer and Information Technology (CIT 2006), 2006

2005

Performance Evaluation of Blas on the Trident Processor.

Parallel Process. Lett., 2005

Computationally Efficient Parallel Matrix-Matrix Multiplication on the Torus.

Proceedings of the High-Performance Computing - 6th International Symposium, 2005

Parallel Blocked Algorithm for Solving the Algebraic Path Problem on a Matrix Processor.

Akihito Takahashi

Proceedings of the High Performance Computing and Communications, 2005

A Matrix Processor for Math-intensive Applications.

Proceedings of the ISCA 18th International Conference on Computer Applications in Industry and Engineering, 2005

2003

Matrix Bidiagonalization: Implementation and Evaluation on the Trident Processor.

Neural Parallel Sci. Comput., 2003

Parallel LU-decomposition on Pentium Streaming SIMD Extensions.

Akihito Takahashi

Proceedings of the High Performance Computing, 5th International Symposium, 2003

Matrix Bidiagonalization on the Trident Processor.

Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Trident: Technology-Scalable Architecture for Data Parallel Application.

Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

BLAS on the Trident Processor: Implementation and Performance Evaluation.

Proceedings of the ISCA 18th International Conference on Computers and Their Applications, 2003

2002

A Multi-level ISA Processor for Accelerating Data Parallel Applications.

Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2002

Performance Analysis of SVD Algorithm on the Trident Processor.

Proceedings of the 1st International Symposium on Cyber Worlds (CW 2002), 2002

2001

Pattern Dependent Reconstruction of Raster Digital Elevation Models from Contour Maps.

Vladimir V. Savchenko

Proceedings of the IASTED International Conference on Visualization, 2001

2000

Performance Evaluation of the Clustered Web Server.

T. Takigahira

Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2000

Design of Multi-dimensional DCT Array Processors for Video Applications.

Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

1999

Design Of I/O Efficient, Scalable Array Processors for Multi-dimensional DFT.

Hiroshi Nagata

Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1999

A new scalable array processor for two-dimensional discrete Fourier transform.

Hiroshi Nagata

Proceedings of the Parallel Computing: Fundamentals & Applications, 1999

1997

An Interactive Graphic Tool for Systematic Design and Analysis of VLSI Array Processors.

Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1997

1996

Array Processors Design for Division-free Linear System Solving.

Comput. J., 1996

Parallel Rendering with the Network Linda System.

Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1996

Parallel Algorithm And Architecture For Two-Step Division-Free Gaussian Elimination.

Proceedings of the 1996 International Conference on Application-Specific Systems, 1996

1995

An Algorithm and Array Processor for Solving the Systems of Linear Equations.

Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1995

1994

A new systolic architecture for pipeline prime factor DFT-algorithm.

Proceedings of the Fourth Great Lakes Symposium on Design Automation of High Performance VLSI Systems, 1994

Systematic Approach and Software Tool for Systolic Design.

Proceedings of the Parallel Processing: CONPAR 94, 1994

1990

Systolic Array Architecture for Two-Dimensional Discrete Fourier Transform.
