Reiji Suda

Affiliations:
  • University of Tokyo


According to our database, Reiji Suda authored at least 66 papers between 1994 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Worst-case analysis of LPT scheduling on a small number of non-identical processors.
Inf. Process. Lett., January, 2024

2023
Efficient Additions and Montgomery Reductions of Large Integers for SIMD.
Proceedings of the 30th IEEE Symposium on Computer Arithmetic, 2023

2022
Throughput-Optimized Implementation of Isogeny-based Cryptography on Vectorized ARM SVE Processor.
Proceedings of the Tenth International Symposium on Computing and Networking, 2022

2020
Permute to Train: A New Dimension to Training Deep Neural Networks.
CoRR, 2020

Xevolver: A code transformation framework for separation of system-awareness from application codes.
Concurr. Comput. Pract. Exp., 2020

Train-by-Reconnect: Decoupling Locations of Weights from Their Values.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Diamond matrix powers kernels.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2020

2018
Fast Generation of Poisson-Disk Samples on Mesh Surfaces by Progressive Sample Projection.
Proc. ACM Comput. Graph. Interact. Tech., 2018

Introduction to iWAPT 2018.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Automatic Hyperparameter Tuning of Machine Learning Models under Time Constraints.
Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2018), 2018

2017
Second order accuracy finite difference methods for space-fractional partial differential equations.
J. Comput. Appl. Math., 2017

Introduction to iWAPT Workshop.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Embedded-DSL-Like Code Generation and Optimization of Bayesian Estimation Routines with User-Defined Source-to-Source Code Transformation Framework Xevolver.
Proceedings of the Fifth International Symposium on Computing and Networking, 2017

Fast maximal Poisson-disk sampling by randomized tiling.
Proceedings of High Performance Graphics, 2017

2016
Xevtgen: Fortran code transformer generator for high performance scientific codes.
Int. J. Netw. Comput., 2016

Efficient Parallel Algorithm for Optimal DAG Structure Search on Parallel Computer with Torus Network.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2016

Xevdriver: A Software System Supporting XML-based Source-to-Source Code Transformations on Fortran Programs.
Proceedings of the Fourth International Symposium on Computing and Networking, 2016

2015
Performance Analysis of the Chebyshev Basis Conjugate Gradient Method on the K Computer.
Proceedings of the Parallel Processing and Applied Mathematics, 2015

2014
The future of accelerator programming: abstraction, performance or can we have both?
Proceedings of the Symposium on Applied Computing, 2014

2013
Analysis Of The Girth For Regular Bi-partite Graphs With Degree 3.
CoRR, 2013

Enumeration Based Search Algorithm For Finding A Regular Bi-partite Graph Of Maximum Attainable Girth For Specified Degree And Number Of Vertices.
CoRR, 2013

An Efficient Task Partitioning and Scheduling Method for Symmetric Multiple GPU Architecture.
Proceedings of the 12th IEEE International Conference on Trust, 2013

Register level sort algorithm on multi-core SIMD processors.
Proceedings of the 3rd Workshop on Irregular Applications - Architectures and Algorithms, 2013

High Performance GPU Accelerated Local Optimization in TSP.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

A Mathematical Method for Online Autotuning of Power and Energy Consumption with Corrected Temperature Effects.
Proceedings of the International Conference on Computational Science, 2013

2012
Energy-Aware SIMD Algorithm Design on GPU and Multicore Architectures.
Proceedings of the Handbook of Energy-Aware and Green Computing - Two Volume Set, 2012

Global optimization model on power efficiency of GPU and multicore processing element for SIMD computing with CUDA.
Comput. Sci. Res. Dev., 2012

Partition Parameters for Girth Maximum (m, r) BTUs.
CoRR, 2012

Balanced Tanner Units And Their Properties.
CoRR, 2012

Automatic Parameter Optimization for Edit Distance Algorithm on GPU.
Proceedings of the High Performance Computing for Computational Science, 2012

Brief announcement: a GPU accelerated iterated local search TSP solver.
Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, 2012

Poster: High Performance GPU Accelerated TSP Solver.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: High Performance GPU Accelerated TSP Solver.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

MSSM: An Efficient Scheduling Mechanism for CUDA Basing on Task Partition.
Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, 2012

An efficient GPU implementation of a multi-start TSP solver for large problem instances.
Proceedings of the Genetic and Evolutionary Computation Conference, 2012

Accelerating 2-opt and 3-opt Local Search Using GPU in the Travelling Salesman Problem.
Proceedings of the 12th IEEE/ACM International Symposium on Cluster, 2012

2011
APTCC: Auto Parallelizing Translator From C To CUDA.
Proceedings of the International Conference on Computational Science, 2011

Parallel Monte Carlo Tree Search on GPU.
Proceedings of the Eleventh Scandinavian Conference on Artificial Intelligence, 2011

Parallelizing a Coarse Grain Graph Search Problem Based upon LDPC Codes on a Supercomputer.
Proceedings of the Sixth International Symposium on Parallel Computing in Electrical Engineering (PARELEC 2011), 2011

Large-Scale Parallel Monte Carlo Tree Search on GPU.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

A Performance and Energy Consumption Analytical Model for GPU.
Proceedings of the IEEE Ninth International Conference on Dependable, 2011

Parallel Monte Carlo Tree Search Scalability Discussion.
Proceedings of the AI 2011: Advances in Artificial Intelligence, 2011

Experimental Estimation and Analysis of the Power Efficiency of CUDA Processing Element on SIMD Computing.
Proceedings of the 10th IEEE/ACIS International Conference on Computer and Information Science, 2011

2010
Investigation on the power efficiency of multi-core and GPU Processing Element in large scale SIMD computation with CUDA.
Proceedings of the International Green Computing Conference 2010, 2010

Software Automatic Tuning: Concepts and State-of-the-Art Results.
Proceedings of the Software Automatic Tuning, From Concepts to State-of-the-Art Results, 2010

A Bayesian Method of Online Automatic Tuning.
Proceedings of the Software Automatic Tuning, From Concepts to State-of-the-Art Results, 2010

Autotuning Method for Deciding Block Size Parameters in Dynamically Load-Balanced BLAS.
Proceedings of the Software Automatic Tuning, From Concepts to State-of-the-Art Results, 2010

Toward Automatic Performance Tuning for Numerical Simulations in the SILC Matrix Computation Framework.
Proceedings of the Software Automatic Tuning, From Concepts to State-of-the-Art Results, 2010

2009
Parallel Minimax Tree Searching on GPU.
Proceedings of the Parallel Processing and Applied Mathematics, 2009

Modeling and Optimizing the Power Performance of Large Matrices Multiplication on Multi-core and GPU Platform with CUDA.
Proceedings of the Parallel Processing and Applied Mathematics, 2009

Accurate Measurements and Precise Modeling of Power Dissipation of CUDA Kernels toward Power Optimized High Performance CPU-GPU Computing.
Proceedings of the 2009 International Conference on Parallel and Distributed Computing, 2009

Modeling and Estimation for the Power Consumption of Matrix Computation on Multi-core Platform.
Proceedings of the Second International Joint Conference on Computational Sciences and Optimization, 2009

Power Efficient Large Matrices Multiplication by Load Scheduling on Multi-core and GPU Platform with CUDA.
Proceedings of the 12th IEEE International Conference on Computational Science and Engineering, 2009

Aspects of GPU for general purpose high performance computing.
Proceedings of the 14th Asia South Pacific Design Automation Conference, 2009

2008
Divisible load scheduling with improved asymptotic optimality.
Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

An optimized Dynamic Load Balancing method for parallel 3-D mesh refinement for finite element electromagnetics with Tetrahedra.
Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

2007
Cloth Simulation in the SILC Matrix Computation Framework: A Case Study.
Proceedings of the Parallel Processing and Applied Mathematics, 2007

High Performance FFT on SGI Altix 3700.
Proceedings of the High Performance Computing and Communications, 2007

2006
Distributed SILC: An Easy-to-Use Interface for MPI-Based Parallel Matrix Computation Libraries.
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

2005
SILC: A Flexible and Environment-Independent Interface for Matrix Computation Libraries.
Proceedings of the Parallel Processing and Applied Mathematics, 2005

Performance Evaluation of Parallel Sparse Matrix-Vector Products on SGI Altix3700.
Proceedings of the OpenMP Shared Memory Parallel Programming - International Workshops, 2005

2002
A fast spherical harmonics transform algorithm.
Math. Comput., 2002

1999
A high performance parallelization scheme for the Hessenberg double shift QR algorithm.
Parallel Comput., 1999

1998
The Ensparsed LU Decomposition Method for Large Scale Circuit Transient Analysis.
Proceedings of the ASP-DAC '98, 1998

1995
Implementation of Sparta, a Highly Parallel Circuit Simulator by the Preconditioned Jacobi Method, on a Distributed Memory Machine.
Proceedings of the 9th international conference on Supercomputing, 1995

1994
QFP wiring problem-introduction and analytical considerations.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1994
