# Daisuke Takahashi

Daisuke Takahashi authored at least 96 papers between 1999 and 2018.

## Bibliography

2018

Computation of the 100 quadrillionth hexadecimal digit of *π* on a cluster of Intel Xeon Phi processors.

Parallel Computing, 2018

Extended Reproduction of Demonstration Motion Using Variational Autoencoder.

Proceedings of the 27th IEEE International Symposium on Industrial Electronics, 2018

2017

A Customizable Auto-Tuning Scenario with User-Defined Code Transformations.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

An Implementation of Parallel 1-D Real FFT on Intel Xeon Phi Processors.

Proceedings of the Computational Science and Its Applications - ICCSA 2017, 2017

2016

Automatic Thread-Block Size Adjustment for Memory-Bound BLAS Kernels on GPUs.

Proceedings of the 10th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2016

Implementation of Multiple-Precision Floating-Point Arithmetic on Intel Xeon Phi Coprocessors.

Proceedings of the Computational Science and Its Applications - ICCSA 2016, 2016

Parallel Sparse Matrix-Vector Multiplication Using Accelerators.

Proceedings of the Computational Science and Its Applications - ICCSA 2016, 2016

Automatic Tuning of Computation-Communication Overlap for Parallel 1-D FFT.

Proceedings of the 2016 IEEE Intl Conference on Computational Science and Engineering, 2016

2015

Fast Implementation of General Matrix-Vector Multiplication (GEMV) on Kepler GPUs.

Proceedings of the 23rd Euromicro International Conference on Parallel, 2015

Performance Evaluation of Sparse Matrix-Vector Multiplication Using GPU/MIC Cluster.

Proceedings of the Third International Symposium on Computing and Networking, 2015

2014

Virtual flow-net for accountability and forensics of computer and network systems.

Security and Communication Networks, 2014

Massively parallel implementation of 3D-RISM calculation with volumetric 3D-FFT.

Journal of Computational Chemistry, 2014

Performance evaluation of ultra-large-scale first-principles electronic structure calculation code on the K computer.

IJHPCA, 2014

A study on application-aware power-saving control method for sensor stations in home gateway.

Proceedings of the IEEE 3rd Global Conference on Consumer Electronics, 2014

A study on application-aware QoS control in OSGi based home gateway.

Proceedings of the IEEE 3rd Global Conference on Consumer Electronics, 2014

2013

Highly scalable implementation of an *N*-body code on a GPU cluster.

Computer Physics Communications, 2013

Using Quadruple Precision Arithmetic to Accelerate Krylov Subspace Methods on GPUs.

Proceedings of the Parallel Processing and Applied Mathematics, 2013

Optimization of Sparse Matrix-Vector Multiplication for CRS Format on NVIDIA Kepler Architecture GPUs.

Proceedings of the Computational Science and Its Applications - ICCSA 2013, 2013

Efficient Hybrid Breadth-First Search on GPUs.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2013

A study on OSGi based home gateway employing application-aware QoS control.

Proceedings of the IEEE 2nd Global Conference on Consumer Electronics, 2013

Implementation of Parallel 1-D FFT on GPU Clusters.

Proceedings of the 16th IEEE International Conference on Computational Science and Engineering, 2013

Optimizing Objective Function Parameters for Strength in Computer Game-Playing.

Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013

2012

Accountability using flow-net: design, implementation, and performance evaluation.

Security and Communication Networks, 2012

A Fast Implementation and Performance Analysis of Collisionless N-body Code Based on GPGPU.

Proceedings of the International Conference on Computational Science, 2012

Implementation of XcalableMP Device Acceleration Extension with OpenCL.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Implementation and Evaluation of Triple Precision BLAS Subroutines on GPUs.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

An Implementation of Parallel 2-D FFT Using Intel AVX Instructions on Multi-core Processors.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2012

An Implementation of Parallel 1-D FFT on the K Computer.

Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

Automatic Tuning of Sparse Matrix-Vector Multiplication for CRS Format on GPUs.

Proceedings of the 15th IEEE International Conference on Computational Science and Engineering, 2012

2011

Wireless telemedicine and m-health: technologies, applications and research issues.

IJSNet, 2011

First-principles calculations of electron states of a silicon nanowire with 100,000 atoms on the K computer.

Proceedings of the Conference on High Performance Computing Networking, 2011

Optimization of Sparse Matrix-Vector Multiplication by Auto Selecting Storage Schemes on GPU.

Proceedings of the Computational Science and Its Applications - ICCSA 2011, 2011

2010

Parallel implementation of multiple-precision arithmetic and 2,576,980,370,000 decimal digits of pi calculation.

Parallel Computing, 2010

A massively-parallel electronic-structure calculations based on real-space density functional theory.

J. Comput. Physics, 2010

IEEE 802.11 user fingerprinting and its applications for intrusion detection.

Computers & Mathematics with Applications, 2010

Implementation and Evaluation of Quadruple Precision BLAS Functions on GPUs.

Proceedings of the Applied Parallel and Scientific Computing, 2010

Automatic Tuning for Parallel FFTs.

Proceedings of the Software Automatic Tuning, From Concepts to State-of-the-Art Results, 2010

2009

An Implementation of Parallel 3-D FFT with 2-D Decomposition on a Massively Parallel Cluster of Multi-core Processors.

Proceedings of the Parallel Processing and Applied Mathematics, 2009

2008

Retrieving knowledge from auditing log-files for computer and network forensics and accountability.

Security and Communication Networks, 2008

A parallel method for large sparse generalized eigenvalue problems using a GridRPC system.

Future Generation Comp. Syst., 2008

Temperature-Aware Routing for Telemedicine Applications in Embedded Biomedical Sensor Networks.

EURASIP J. Wireless Comm. and Networking, 2008

On-Demand Anonymous Routing with Distance Vector Protecting Traffic Privacy in Wireless Multi-hop Networks.

Proceedings of the MSN 2008, 2008

Complexity Analysis of Retrieving Knowledge from Auditing Log Files for Computer and Network Forensics and Accountability.

Proceedings of IEEE International Conference on Communications, 2008

2007

Telemedicine Usage and Potentials.

Proceedings of the IEEE Wireless Communications and Networking Conference, 2007

A Parallel Algorithm for Multiple-Precision Division by a Single-Precision Integer.

Proceedings of the Large-Scale Scientific Computing, 6th International Conference, 2007

RI2N/UDP: High bandwidth and fault-tolerant network for a PC-cluster based on multi-link Ethernet.

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

High Performance FFT on SGI Altix 3700.

Proceedings of the High Performance Computing and Communications, 2007

LTRT: Least Total-Route Temperature Routing for Embedded Biomedical Sensor Networks.

Proceedings of the Global Communications Conference, 2007

2006

S12 - The HPC Challenge (HPCC) benchmark suite.

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

An Implementation of Parallel 1-D FFT Using SSE3 Instructions on Dual-Core Processors.

Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

Profile-based optimization of power performance by using dynamic voltage scaling on a PC cluster.

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

MegaProto/E: power-aware high-performance cluster with commodity technology.

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Performance Improvement by Data Management Layer in a Grid RPC System.

Proceedings of the Advances in Grid and Pervasive Computing, 2006

Empirical study on Reducing Energy of Parallel Programs using Slack Reclamation by DVFS in a Power-scalable High Performance Cluster.

Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006

PACS-CS: A Large-Scale Bandwidth-Aware PC Cluster for Scientific Computations.

Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 2006

Robust Posture Estimation of the Human Face in Rapid Lighting Changes using a 3-D Reference Picture.

Proceedings of the Canadian Conference on Electrical and Computer Engineering, 2006

2005

An algorithm for multiple-precision floating-point multiplication.

Applied Mathematics and Computation, 2005

MegaProto: 1 TFlops/10kW Rack Is Feasible Even with Only Commodity Technology.

Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

A Hybrid MPI/OpenMP Implementation of a Parallel 3-D FFT on SMP Clusters.

Proceedings of the Parallel Processing and Applied Mathematics, 2005

Computation of High-Precision Mathematical Constants in a Combined Cluster and Grid Environment.

Proceedings of the Large-Scale Scientific Computing, 5th International Conference, 2005

Design of a Software Distributed Shared Memory System using an MPI communication layer.

Proceedings of the 8th International Symposium on Parallel Architectures, 2005

Low-cost High-bandwidth Tree Network for PC Clusters based on Tagged-VLAN Technology.

Proceedings of the 8th International Symposium on Parallel Architectures, 2005

Empirical Study for Optimization of Power-Performance with On-Chip Memory.

Proceedings of the High-Performance Computing - 6th International Symposium, 2005

MegaProto: A Low-Power and Compact Cluster for High-Performance Computing.

Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Grid Environment for Computational Astrophysics Driven by GRAPE-6 with HMCS-G and OmniRPC.

Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Low Temperature Limit of Equations - Hidden Discrete Structure.

Proceedings of the CCA 2005, 2005

2004

A stochastic model for solitons.

Random Struct. Algorithms, 2004

SCIMA-SMP: on-chip memory processor architecture for SMP.

Proceedings of the 3rd Workshop on Memory Performance Issues, 2004

Heterogeneous Remote Computing System for Computational Astrophysics with OmniRPC.

Proceedings of the 2004 Symposium on Applications and the Internet Workshops (SAINT 2004 Workshops), 2004

Performance Evaluation of OmniRPC in a Grid Environment.

Proceedings of the 2004 Symposium on Applications and the Internet Workshops (SAINT 2004 Workshops), 2004

An Implementation of Parallel 3-D FFT Using Short Vector SIMD Instructions on Clusters of PCs.

Proceedings of the Applied Parallel Computing, 2004

A Parallel Method for Large Sparse Generalized Eigenvalue Problems by OmniRPC in a Grid Environment.

Proceedings of the Applied Parallel Computing, 2004

Parallel Implementation of Strassen's Matrix Multiplication Algorithm for Heterogeneous Clusters.

Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Implementation and performance evaluation of CONFLEX-G: grid-enabled molecular conformational space search program with OmniRPC.

Proceedings of the 18th Annual International Conference on Supercomputing, 2004

Formation of Dwarf Galaxies in Reionized Universe with Heterogeneous Multi-computer System.

Proceedings of the Computational Science, 2004

2003

A parallel 1-D FFT algorithm for the Hitachi SR8000.

Parallel Computing, 2003

Performance Evaluation of the Hitachi SR8000 Using SPEC OMP2001 Benchmarks.

International Journal of Parallel Programming, 2003

An OpenMP Implementation of Parallel FFT and Its Performance on IA-64 Processors.

Proceedings of the OpenMP Shared Memory Parallel Programming, 2003

RI2N - Interconnection Network System for Clusters with Wide-Bandwidth and Fault-Tolerancy Based on Multiple Links.

Proceedings of the High Performance Computing, 5th International Symposium, 2003

A radix-16 FFT algorithm suitable for multiply-add instruction based on Goedecker method.

Proceedings of the 2003 IEEE International Conference on Multimedia and Expo, 2003

OmniRPC: a Grid RPC System for Parallel Programming in Cluster and Grid Environment.

Proceedings of the 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2003), 2003

HMCS-G: Grid-enabled Hybrid Computing System for Computational Astrophysics.

Proceedings of the 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2003), 2003

2002

A Blocking Algorithm for Parallel 1-D FFT on Shared-Memory Parallel Computers.

Proceedings of the Applied Parallel Computing Advanced Scientific Computing, 2002

Performance Evaluation of the Hitachi SR8000 Using OpenMP Benchmarks.

Proceedings of the High Performance Computing, 4th International Symposium, 2002

A Blocking Algorithm for Parallel 1-D FFT on Clusters of PCs.

Proceedings of the Euro-Par 2002, 2002

2001

A Mixed-Radix Parallel Three-Dimensional FFT Algorithm on Clusters of Vector SMPs.

Proceedings of the Tenth SIAM Conference on Parallel Processing for Scientific Computing, 2001

A Blocking Algorithm for FFT on Cache-Based Processors.

Proceedings of the High-Performance Computing and Networking, 9th International Conference, 2001

2000

High-Performance Radix-2, 3 and 5 Parallel 1-D Complex FFT Algorithms for Distributed-Memory Parallel Computers.

The Journal of Supercomputing, 2000

A fast algorithm for computing large Fibonacci numbers.

Inf. Process. Lett., 2000

A Parallel 3-D FFT Algorithm on Clusters of Vector SMPs.

Proceedings of the Applied Parallel Computing, 2000

A Performance Study on a Single Processing Node of the HITACHI SR8000.

Proceedings of the Numerical Analysis and Its Applications, 2000

Implementation of Multiple-Precision Parallel Division and Square Root on Distributed-Memory Parallel Computers.

Proceedings of the 2000 International Workshop on Parallel Processing, 2000

A new radix-6 FFT algorithm suitable for multiply-add instruction.

Proceedings of the IEEE International Conference on Acoustics, 2000

1999

Fast High-Precision Arithmetic on Distributed Memory Parallel Machines.

Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999