Taisuke Boku

According to our database, Taisuke Boku authored at least 122 papers between 1985 and 2019.

Bibliography

2019
Evaluation of XcalableACC with tightly coupled accelerators/InfiniBand hybrid communication on accelerated cluster.
IJHPCA, 2019

Implementation and evaluation of the HPC challenge benchmark in the XcalableMP PGAS language.
IJHPCA, 2019

SALMON: Scalable Ab-initio Light-Matter simulator for Optics and Nanoscience.
Computer Physics Communications, 2019

Using FPGAs to Accelerate HPC and Data Analytics on Intel-Based Systems.
Proceedings of the High Performance Computing, 2019

MITRACA: A Next-Gen Heterogeneous Architecture.
Proceedings of the 13th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2019

GPU-FPGA Heterogeneous Computing with OpenCL-Enabled Direct Memory Access.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

Parallel Processing on FPGA Combining Computation and Communication in OpenCL Programming.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

Scalable communication performance prediction using auto-generated pseudo MPI event trace.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2019

FPGA-based Implementation of Memory-Intensive Application using OpenCL.
Proceedings of the 10th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, 2019

MITRACA: Manycore Interlinked Torus Reconfigurable Accelerator Architecture.
Proceedings of the 30th IEEE International Conference on Application-specific Systems, 2019

2018
Performance Optimization and Evaluation of Scalable Optoelectronics Application on Large Scale KNL Cluster.
Proceedings of the High Performance Computing - 33rd International Conference, 2018

MACC: An OpenACC Transpiler for Automatic Multi-GPU Use.
Proceedings of the Supercomputing Frontiers - 4th Asian Conference, 2018

Trade-Off of Offloading to FPGA in OpenMP Task-Based Programming.
Proceedings of the Evolving OpenMP for Evolving Architectures, 2018

Performance and Scalability of Lightweight Multi-kernel Based Operating Systems.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Performance evaluation for a hydrodynamics application in XcalableACC PGAS language for accelerated clusters.
Proceedings of Workshops of HPC Asia 2018, 2018

Performance evaluation for omni XcalableMP compiler on many-core cluster system based on knights landing.
Proceedings of Workshops of HPC Asia 2018, 2018

Linkage of XcalableMP and Python languages for high productivity on HPC cluster system: application to graph order/degree problem.
Proceedings of Workshops of HPC Asia 2018, 2018

Multiple endpoints for improved MPI performance on a lattice QCD code.
Proceedings of Workshops of HPC Asia 2018, 2018

OpenCL-ready High Speed FPGA Network for Reconfigurable High Performance Computing.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2018

Scaling collectives on large clusters using Intel(R) architecture processors and fabric.
Proceedings of Workshops of HPC Asia 2018, 2018

Performance Evaluation of Large Scale Electron Dynamics Simulation under Many-core Cluster based on Knights Landing.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2018

Accelerating Space Radiative Transfer on FPGA using OpenCL.
Proceedings of the 9th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, 2018

2017
Runtime Correctness Checking for Emerging Programming Paradigms.
Proceedings of the First International Workshop on Software Correctness for HPC Applications, 2017

Thorough analysis of PCIe Gen3 communication.
Proceedings of the International Conference on ReConFigurable Computing and FPGAs, 2017

Mixed Precision Solver Scalable to 16000 MPI Processes for Lattice Quantum Chromodynamics Simulations on the Oakforest-PACS System.
Proceedings of the Fifth International Symposium on Computing and Networking, 2017

Implementing Lattice QCD Application with XcalableACC Language on Accelerated Cluster.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

Implementation and Evaluation of One-sided PGAS Communication in XcalableACC for Accelerated Clusters.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

2016
Hybrid-view programming of nuclear fusion simulation code in the PGAS parallel programming language XcalableMP.
Parallel Computing, 2016

Implementation and Evaluation of NAS Parallel CG Benchmark on GPU Cluster with Proprietary Interconnect TCA.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2016, 2016

Design and Preliminary Evaluation of Omni OpenACC Compiler for Massive MIMD Processor PEZY-SC.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

Electron Dynamics Simulation with Time-Dependent Density Functional Theory on Large Scale Symmetric Mode Xeon Phi Cluster.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

2015
Implementation of CG Method on GPU Cluster with Proprietary Interconnect TCA for GPU Direct Communication.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Hybrid Communication with TCA and InfiniBand on a Parallel Programming Language XcalableACC for GPU Clusters.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Improving Strong-Scaling on GPU Cluster Based on Tightly Coupled Accelerators Architecture.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Evaluation of FFT for GPU Cluster Using Tightly Coupled Accelerators Architecture.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Towards Unification of Accelerated Computing and Interconnection For Extreme-Scale Computing.
Proceedings of the Applied Reconfigurable Computing - 11th International Symposium, 2015

2014
PEACH2: An FPGA-based PCIe network device for Tightly Coupled Accelerators.
SIGARCH Computer Architecture News, 2014

Massively-parallel electron dynamics calculations in real-time and real-space: Toward applications to nanostructures of more than ten-nanometers in size.
J. Comput. Physics, 2014

Performance evaluation of ultra-large-scale first-principles electronic structure calculation code on the K computer.
IJHPCA, 2014

XcalableACC: extension of XcalableMP PGAS language using OpenACC for accelerator clusters.
Proceedings of the First Workshop on Accelerator Programming using Directives, 2014

Nuclear Fusion Simulation Code Optimization and Performance Evaluation on GPU Cluster.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Hybrid-view programming of nuclear fusion simulation code in the PGAS parallel programming language XcalableMP.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

A Preliminarily Evaluation of PEACH3: A Switching Hub for Tightly Coupled Accelerators.
Proceedings of the Second International Symposium on Computing and Networking, 2014

QCD Library for GPU Cluster with Proprietary Interconnect for GPU Direct Communication.
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

2013
Tightly Coupled Accelerators Architecture for Minimizing Communication Latency among Accelerators.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Nuclear Fusion Simulation Code Optimization on GPU Clusters.
Proceedings of the 19th IEEE International Conference on Parallel and Distributed Systems, 2013

Adaptive Task Size Control on High Level Programming for GPU/CPU Work Sharing.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2013

Interconnection Network for Tightly Coupled Accelerators Architecture.
Proceedings of the IEEE 21st Annual Symposium on High-Performance Interconnects, 2013

Task level pipelining with PEACH2: An FPGA switching fabric for high performance computing.
Proceedings of the 2013 International Conference on Field-Programmable Technology, 2013

2012
Implementation of XcalableMP Device Acceleration Extention with OpenCL.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

GPU/CPU Work Sharing with Parallel Language XcalableMP-dev for Parallelized Accelerated Computing.
Proceedings of the 41st International Conference on Parallel Processing Workshops, 2012

Productivity and Performance of Global-View Programming with XcalableMP PGAS Language.
Proceedings of the 12th IEEE/ACM International Symposium on Cluster, 2012

2011
Peach: A Multicore Communication System on Chip with PCI Express.
IEEE Micro, 2011

The International Exascale Software Project roadmap.
IJHPCA, 2011

First-principles calculations of electron states of a silicon nanowire with 100,000 atoms on the K computer.
Proceedings of the Conference on High Performance Computing Networking, 2011

An 80Gb/s dependable communication SoC with PCI express I/F and 8 CPUs.
Proceedings of the IEEE International Solid-State Circuits Conference, 2011

PEARL and PEACH: A Novel PCI Express Direct Link and Its Implementation.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

An Extension of XcalableMP PGAS Language for Multi-node GPU Clusters.
Proceedings of the Euro-Par 2011: Parallel Processing Workshops - CCPI, CGWS, HeteroPar, HiBB, HPCVirt, HPPC, HPSS, MDGS, ProPer, Resilience, UCHPC, VHPC, Bordeaux, France, August 29, 2011

Introduction.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

XMCAPI: Inter-core Communication Interface on Multi-chip Embedded Systems.
Proceedings of the IEEE/IFIP 9th International Conference on Embedded and Ubiquitous Computing, 2011

An 80 Gbps dependable multicore communication SoC with PCI express I/F and intelligent interrupt controller.
Proceedings of the 2011 IEEE Symposium on Low-Power and High-Speed Chips, 2011

2010
A massively-parallel electronic-structure calculations based on real-space density functional theory.
J. Comput. Physics, 2010

XcalableMP implementation and performance of NAS Parallel Benchmarks.
Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model, 2010

PEARL: Power-Aware, Dependable, and High-Performance Communication Link Using PCI Express.
Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications, 2010

2009
Evaluation of Multicore Processors for Embedded Systems by Parallel Benchmark Program Using OpenMP.
Proceedings of the Evolving OpenMP in an Age of Extreme Parallelism, 2009

Towards an Open Dependable Operating System.
Proceedings of the 2009 IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, 2009

RI2N/DRV: Multi-link ethernet for high-bandwidth and fault-tolerant network on PC clusters.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Flexible Multi-link Ethernet Binding System for PC Clusters with Asymmetric Topology.
Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009

Using a cluster as a memory resource: A fast and large virtual memory on MPI.
Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009

2008
Integrating Computing Resources on Multiple Grid-Enabled Job Scheduling Systems Through a Grid RPC System.
J. Grid Comput., 2008

A dynamic routing control system for high-performance PC cluster with multi-path Ethernet connection.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

OpenMPD: A Directive-Based Data Parallel Language Extension for Distributed Memory Systems.
Proceedings of the 37th International Conference on Parallel Processing, 2008

RI2N: High-bandwidth and fault-tolerant network with multi-link Ethernet for PC clusters.
Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

2007
Design and Implementation of OpenMPD: An OpenMP-Like Programming Language for Distributed Memory Systems.
Proceedings of the A Practical Programming Model for the Multi-Core Era, 2007

RI2N/UDP: High bandwidth and fault-tolerant network for a PC-cluster based on multi-link Ethernet.
Proceedings of the 21th International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

2006
Storage challenge - High performance data analysis for particle physics using the Gfarm file system.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Profile-based optimization of power performance by using dynamic voltage scaling on a PC cluster.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

MegaProto/E: power-aware high-performance cluster with commodity technology.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

A scalable communication layer for multi-dimensional hyper crossbar network using multiple gigabit ethernet.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006

Performance Improvement by Data Management Layer in a Grid RPC System.
Proceedings of the Advances in Grid and Pervasive Computing, 2006

Empirical study on Reducing Energy of Parallel Programs using Slack Reclamation by DVFS in a Power-scalable High Performance Cluster.
Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006

PACS-CS: A Large-Scale Bandwidth-Aware PC Cluster for Scientific Computations.
Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 2006

2005
MegaProto: 1 TFlops/10kW Rack Is Feasible Even with Only Commodity Technology.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

Computation of High-Precision Mathematical Constants in a Combined Cluster and Grid Environment.
Proceedings of the Large-Scale Scientific Computing, 5th International Conference, 2005

Design of a Software Distributed Shared Memory System using an MPI communication layer.
Proceedings of the 8th International Symposium on Parallel Architectures, 2005

Low-cost High-bandwidth Tree Network for PC Clusters based on Tagged-VLAN Technology.
Proceedings of the 8th International Symposium on Parallel Architectures, 2005

Empirical Study for Optimization of Power-Performance with On-Chip Memory.
Proceedings of the High-Performance Computing - 6th International Symposium, 2005

MegaProto: A Low-Power and Compact Cluster for High-Performance Computing.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Grid Environment for Computational Astrophysics Driven by GRAPE-6 with HMCS-G and OmniRPC.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

2004
SCIMA-SMP: on-chip memory processor architecture for SMP.
Proceedings of the 3rd Workshop on Memory Performance Issues, 2004

The Second Trans-Pacific Grid Datafarm Testbed and Experiments for SC2003.
Proceedings of the 2004 Symposium on Applications and the Internet Workshops (SAINT 2004 Workshops), 2004

Heterogeneous Remote Computing System for Computational Astrophysics with OmniRPC.
Proceedings of the 2004 Symposium on Applications and the Internet Workshops (SAINT 2004 Workshops), 2004

Performance Evaluation of OmniRPC in a Grid Environment.
Proceedings of the 2004 Symposium on Applications and the Internet Workshops (SAINT 2004 Workshops), 2004

An Implementation of Parallel 3-D FFT Using Short Vector SIMD Instructions on Clusters of PCs.
Proceedings of the Applied Parallel Computing, 2004

Parallel Implementation of Strassen's Matrix Multiplication Algorithm for Heterogeneous Clusters.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Implementation and performance evaluation of CONFLEX-G: grid-enabled molecular conformational space search program with OmniRPC.
Proceedings of the 18th Annual International Conference on Supercomputing, 2004

Formation of Dwarf Galaxies in Reionized Universe with Heterogeneous Multi-computer System.
Proceedings of the Computational Science, 2004

2003
Performance Evaluation of the Hitachi SR8000 Using SPEC OMP2001 Benchmarks.
International Journal of Parallel Programming, 2003

An OpenMP Implementation of Parallel FFT and Its Performance on IA-64 Processors.
Proceedings of the OpenMP Shared Memory Parallel Programming, 2003

RI2N - Interconnection Network System for Clusters with Wide-Bandwidth and Fault-Tolerancy Based on Multiple Links.
Proceedings of the High Performance Computing, 5th International Symposium, 2003

OmniRPC: a Grid RPC System for Parallel Programming in Cluster and Grid Environment.
Proceedings of the 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2003), 2003

HMCS-G: Grid-enabled Hybrid Computing System for Computational Astrophysics.
Proceedings of the 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2003), 2003

2002
Heterogeneous Multi-Computer System: A New Paradigm of Parallel Processing.
Proceedings of the 2002 International Conference on Parallel Computing in Electrical Engineering (PARELEC 2002), 2002

Performance Evaluation of the Hitachi SR8000 Using OpenMP Benchmarks.
Proceedings of the High Performance Computing, 4th International Symposium, 2002

Heterogeneous multi-computer system: a new platform for multi-paradigm scientific simulation.
Proceedings of the 16th international conference on Supercomputing, 2002

A Blocking Algorithm for Parallel 1-D FFT on Clusters of PCs.
Proceedings of the Euro-Par 2002, 2002

2001
PIO: Parallel I/O System for Massively Parallel Processors.
Proceedings of the High-Performance Computing and Networking, 9th International Conference, 2001

2000
Software Controlled Reconfigurable On-Chip Memory for High Performance Computing.
Proceedings of the Intelligent Memory Systems, Second International Workshop, 2000

SCIMA: Software Controlled Integrated Memory Architecture for High Performance Computing.
Proceedings of the IEEE International Conference On Computer Design: VLSI In Computers & Processors, 2000

1999
CP-PACS: A massively parallel processor at the University of Tsukuba.
Parallel Computing, 1999

Performance of lattice QCD programs on CP-PACS.
Parallel Computing, 1999

Commodity Network Based Parallel I/O System for Massively Parallel Processors.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1999

1998
Practical Simulation of Large-Scale Parallel Programs and Its Performance Analysis of the NAS Parallel Benchmarks.
Proceedings of the Euro-Par '98 Parallel Processing, 1998

1997
CP-PACS: A Massively Parallel Processor for Large Scale Scientific Calculations.
Proceedings of the 11th international conference on Supercomputing, 1997

Advanced processor design using hardware description language AIDL.
Proceedings of the ASP-DAC '97 Asia and South Pacific Design Automation Conference, 1997

1996
Adaptive routing technique on hypercrossbar network and its evaluation.
Systems and Computers in Japan, 1996

1994
Evaluation of Pseudo Vector Processor Based on Slide-Windowed Registers.
Proceedings of the 27th Annual Hawaii International Conference on System Sciences (HICSS-27), 1994

1993
A Scalar Architecture for Pseudo Vector Processing Based on Slide-Windowed Registers.
Proceedings of the 7th international conference on Supercomputing, 1993

1991
NCC: A concurrent description language for scientific calculation on multiprocessors.
Systems and Computers in Japan, 1991

1990
(SM)²-II: A Large-Scale Multiprocessor for Sparse Matrix Calculations.
IEEE Trans. Computers, 1990

1988
IMPULSE: A High Performance Processing Unit for Multiprocessors for Scientific Calculation.
Proceedings of the 15th Annual International Symposium on Computer Architecture, 1988

1985
(SM)²-II: A New Version of the Sparse Matrix Solving Machine.
Proceedings of the 12th Annual Symposium on Computer Architecture, 1985

