Stanislav G. Sedukhin

Orcid: 0000-0002-0071-5140

According to our database, Stanislav G. Sedukhin authored at least 58 papers between 1990 and 2022.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2022
In Search of the Performance- and Energy-Efficient CNN Accelerators.
IEICE Trans. Electron., 2022

High Performance Software Systolic Array Computing of Multi-channel Convolution on a GPU.
Proceedings of the Computational Science and Its Applications - ICCSA 2022, 2022

2019
Brain-inspired Co-design of Algorithm/Architecture for CNN Accelerators.
Proceedings of the 8th International Congress on Advanced Applied Informatics, 2019

2018
Evaluations of OpenCL-written tsunami simulation on FPGA and comparison with GPU implementation.
J. Supercomput., 2018

2017
Performance Evaluation of Tsunami Simulation Using OpenCL on GPU and FPGA.
Proceedings of the 11th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2017

2016
Parallelism for High-Performance Tsunami Simulation with FPGA: Spatial or Temporal?
Proceedings of the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2016

2015
Stream Computation of Shallow Water Equation Solver for FPGA-based 1D Tsunami Simulation.
SIGARCH Comput. Archit. News, 2015

2014
Generalized Matrix Multiplication and its Object Oriented Model.
Scalable Comput. Pract. Exp., 2014

Image scrambling on a "mesh-of-tori" architecture.
Scalable Comput. Pract. Exp., 2014

Performance analysis of scalable algorithms for 3D linear transforms.
Proceedings of the 2014 Federated Conference on Computer Science and Information Systems, 2014

2013
Library for Matrix Multiplication-based Data Manipulation on a "Mesh-of-Tori" Architecture.
Proceedings of the 2013 Federated Conference on Computer Science and Information Systems, 2013

2012
Blocked United Algorithm for the All-Pairs Shortest Paths Problem on Hybrid CPU-GPU Systems.
IEICE Trans. Inf. Syst., 2012

Performance Tuning of Matrix Multiplication in OpenCL on Different GPUs and CPUs.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Implementing a Code Generator for Fast Matrix Multiplication in OpenCL on the GPU.
Proceedings of the IEEE 6th International Symposium on Embedded Multicore/Manycore SoCs, 2012

2011
Multi-level Optimization of Matrix Multiplication for GPU-equipped Systems.
Proceedings of the International Conference on Computational Science, 2011

3D-DCT Processor and Its FPGA Implementation.
IEICE Trans. Inf. Syst., 2011

Generalizing Matrix Multiplication for Efficient Computations on Modern Computers.
Proceedings of the Parallel Processing and Applied Mathematics, 2011

An O(n) Time-Complexity Matrix Transpose on Torus Array Processor.
Proceedings of the Second International Conference on Networking and Computing, 2011

Blocked All-Pairs Shortest Paths Algorithm for Hybrid CPU-GPU System.
Proceedings of the 13th IEEE International Conference on High Performance Computing & Communication, 2011

Object Oriented Model of Generalized Matrix Multiplication.
Proceedings of the Federated Conference on Computer Science and Information Systems, 2011

2010
Orbital Systolic Algorithms and Array Processors for Solution of the Algebraic Path Problem.
IEICE Trans. Inf. Syst., 2010

Orbital Algorithms and Unified Array Processor for Computing 2D Separable Transforms.
Proceedings of the 39th International Conference on Parallel Processing, 2010

Mesh-of-Tori: A Novel Interconnection Network for Frontal Plane Cellular Processors.
Proceedings of the First International Conference on Networking and Computing, 2010

Matrix Multiply-Add in Min-plus Algebra on a Short-Vector SIMD Processor of Cell/B.E..
Proceedings of the First International Conference on Networking and Computing, 2010

Rapid*Closure: Algebraic Extensions of a Scalar Multiply-add Operation.
Proceedings of the ISCA 25th International Conference on Computers and Their Applications, 2010

2009
A Solution of the All-Pairs Shortest Paths Problem on the Cell Broadband Engine Processor.
IEICE Trans. Inf. Syst., 2009

Matrix Inversion on the Cell/B.E. Processor.
Proceedings of the 11th IEEE International Conference on High Performance Computing and Communications, 2009

2008
2-D Separable Transforms on a Matrix Processor.
Proceedings of the ISCA 21st International Conference on Computer Applications in Industry and Engineering, 2008

Array processor featuring an effective FIFO-based data stream management.
Proceedings of 8th IEEE International Conference on Computer and Information Technology, 2008

2007
Performance Evaluation of Basic Linear Algebra Subroutines on a Matrix Co-processor.
Proceedings of the Parallel Processing and Applied Mathematics, 2007

Fine-grained Matrix Multiply-Add on a Torus Array Processor.
Proceedings of the 22nd International Conference on Computers and Their Applications, 2007

Evaluating the Performance of Basic Linear Algebra Subroutines on a Torus Array Processor.
Proceedings of the Seventh International Conference on Computer and Information Technology (CIT 2007), 2007

2006
The general matrix multiply-add operation on 2D torus.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Matrix Transpose on 2D Torus Array Processor.
Proceedings of the Sixth International Conference on Computer and Information Technology (CIT 2006), 2006

2005
Performance Evaluation of Blas on the Trident Processor.
Parallel Process. Lett., 2005

Computationally Efficient Parallel Matrix-Matrix Multiplication on the Torus.
Proceedings of the High-Performance Computing - 6th International Symposium, 2005

Parallel Blocked Algorithm for Solving the Algebraic Path Problem on a Matrix Processor.
Proceedings of the High Performance Computing and Communications, 2005

A Matrix Processor for Math-intensive Applications.
Proceedings of the ISCA 18th International Conference on Computer Applications in Industry and Engineering, 2005

2003
Matrix Bidiagonalization: Implementation and Evaluation on the Trident Processor.
Neural Parallel Sci. Comput., 2003

Parallel LU-decomposition on Pentium Streaming SIMD Extensions.
Proceedings of the High Performance Computing, 5th International Symposium, 2003

Matrix Bidiagonalization on the Trident Processor.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Trident: Technology-Scalable Architecture for Data Parallel Application.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

BLAS on the Trident Processor: Implementation and Performance Evaluation.
Proceedings of the ISCA 18th International Conference Computers and Their Applications, 2003

2002
A Multi-level ISA Processor for Accelerating Data Parallel Applications.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2002

Performance Analysis of SVD Algorithm on the Trident Processor.
Proceedings of the 1st International Symposium on Cyber Worlds (CW 2002), 2002

2001
Pattern Dependent Reconstruction of Raster Digital Elevation Models from Contour Maps.
Proceedings of the IASTED International Conference on Visualization, 2001

2000
Performance Evaluation of the Clustered Web Server.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2000

Design of Multi-dimensional DCT Array Processors for Video Applications.
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

1999
Design Of I/O Efficient, Scalable Array Processors for Multi-dimensional DFT.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1999

A new scalable array processor for two-dimensional discrete Fourier transform.
Proceedings of the Parallel Computing: Fundamentals & Applications, 1999

1997
An Interactive Graphic Tool for Systematic Design and Analysis of VLSI Array Processors.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1997

1996
Array Processors Design for Division-free Linear System Solving.
Comput. J., 1996

Parallel Rendering with the Network Linda System.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1996

Parallel Algorithm And Architecture For Two-Step Division-Free Gaussian Elimination.
Proceedings of the 1996 International Conference on Application-Specific Systems, 1996

1995
An Algorithm and Array Processor for Solving the Systems of Linear Equations.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1995

1994
A new systolic architecture for pipeline prime factor DFT-algorithm.
Proceedings of the Fourth Great Lakes Symposium on Design Automation of High Performance VLSI Systems, 1994

Systematic Approach and Software Tool for Systolic Design.
Proceedings of the Parallel Processing: CONPAR 94, 1994

1990
Systolic Array Architecture for Two-Dimensional Discrete Fourier Transform.
Proceedings of the CONPAR 90, 1990
