Satoshi Matsuoka

Affiliations:
  • Tokyo Institute of Technology, Japan

According to our database, Satoshi Matsuoka authored at least 316 papers between 1988 and 2023.

Awards

ACM Fellow

ACM Fellow 2011, "For contributions to the design of high-performance computers."

Bibliography

2023
At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads.
ACM Trans. Archit. Code Optim., December, 2023

Simeuro: A Hybrid CPU-GPU Parallel Simulator for Neuromorphic Computing Chips.
IEEE Trans. Parallel Distributed Syst., October, 2023

Myths and legends in high-performance computing.
Int. J. High Perform. Comput. Appl., July, 2023

Efficient checkpoint/Restart of CUDA applications.
Parallel Comput., 2023

Exploiting Scratchpad Memory for Deep Temporal Blocking: A case study for 2D Jacobian 5-point iterative stencil kernel (j2d5pt).
CoRR, 2023

Revisiting Temporal Blocking Stencil Optimizations.
Proceedings of the 37th International Conference on Supercomputing, 2023

PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications.
Proceedings of the 37th International Conference on Supercomputing, 2023

2022
Efficient high-precision integer multiplication on the GPU.
Int. J. High Perform. Comput. Appl., 2022

Digital transformation of droplet/aerosol infection risk assessment realized on "Fugaku" for the fight against COVID-19.
Int. J. High Perform. Comput. Appl., 2022

Preparing for the Future - Rethinking Proxy Applications.
Comput. Sci. Eng., 2022

Preparing for the Future - Rethinking Proxy Apps.
CoRR, 2022

At the Locus of Performance: A Case Study in Enhancing CPUs with Copious 3D-Stacked Cache.
CoRR, 2022

Persistent Kernels for Iterative Memory-bound GPU Applications.
CoRR, 2022

2021
The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs With Hybrid Parallelism.
IEEE Trans. Parallel Distributed Syst., 2021

Co-design Center for Exascale Machine Learning Technologies (ExaLearn).
Int. J. High Perform. Comput. Appl., 2021

MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems.
CoRR, 2021

Fugaku and A64FX: the First Exascale Supercomputer and its Innovative Arm CPU.
Proceedings of the 2021 Symposium on VLSI Circuits, Kyoto, Japan, June 13-19, 2021

Scalable FBP decomposition for cone-beam CT reconstruction.
Proceedings of the International Conference for High Performance Computing, 2021

Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Performance portable back-projection algorithms on CPUs: agnostic data locality and vectorization optimizations.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

2020
A Survey on Coarse-Grained Reconfigurable Architectures From a Performance Perspective.
IEEE Access, 2020

Scaling distributed deep learning workloads beyond the memory capacity with KARMA.
Proceedings of the International Conference for High Performance Computing, 2020

A Formal Model for a Linear Time Correctness Condition of Proof Nets of Multiplicative Linear Logic.
Proceedings of the Logic-Based Program Synthesis and Transformation, 2020

A Study of Single and Multi-device Synchronization Methods in Nvidia GPUs.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

AN5D: automated stencil framework for high-degree temporal blocking on GPUs.
Proceedings of the CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020

A Template-based Framework for Exploring Coarse-Grained Reconfigurable Architectures.
Proceedings of the 31st IEEE International Conference on Application-specific Systems, 2020

2019
How File-access Patterns Influence the Degree of I/O Interference between Cluster Applications.
Supercomput. Front. Innov., 2019

Performance optimization, modeling and analysis of sparse matrix-matrix products on multi-core and many-core processors.
Parallel Comput., 2019

Scaling Word2Vec on Big Corpus.
Data Sci. Eng., 2019

A New Linear Time Correctness Condition for Multiplicative Linear Logic.
CoRR, 2019

Learning Neural Representations for Predicting GPU Performance.
Proceedings of the High Performance Computing - 34th International Conference, 2019

MH-QEMU: Memory-State-Aware Fault Injection Platform.
Proceedings of the Supercomputing Frontiers - 5th Asian Conference, 2019

The Memory Controller Wall: Benchmarking the Intel FPGA SDK for OpenCL Memory Interface.
Proceedings of the 2019 IEEE/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing, 2019

HyperX topology: first at-scale implementation and comparison to the fat-tree.
Proceedings of the International Conference for High Performance Computing, 2019

iFDK: a scalable framework for instant high-resolution image reconstruction.
Proceedings of the International Conference for High Performance Computing, 2019

A versatile software systolic execution model for GPU memory-bound kernels.
Proceedings of the International Conference for High Performance Computing, 2019

Double-Precision FPUs in High-Performance Computing: An Embarrassment of Riches?
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Performance Optimizations and Analysis of Distributed Deep Learning with Approximated Second-Order Optimization Method.
Proceedings of the 48th International Conference on Parallel Processing, 2019

The First Supercomputer with HyperX Topology: A Viable Alternative to Fat-Trees?
Proceedings of the 2019 IEEE Symposium on High-Performance Interconnects, 2019

Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Batched Sparse Matrix Multiplication for Accelerating Graph Convolutional Networks.
Proceedings of the 19th IEEE/ACM International Symposium on Cluster, 2019

2018
Lock Contention Management in Multithreaded MPI.
ACM Trans. Parallel Comput., 2018

Evaluating the SW26010 many-core processor with a micro-benchmark suite for performance optimizations.
Parallel Comput., 2018

Big data and extreme-scale computing.
Int. J. High Perform. Comput. Appl., 2018

Second-order Optimization Method for Large Mini-batch: Training ResNet-50 on ImageNet in 35 Epochs.
CoRR, 2018

μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching.
CoRR, 2018

MACC: An OpenACC Transpiler for Automatic Multi-GPU Use.
Proceedings of the Supercomputing Frontiers - 4th Asian Conference, 2018

Machine Learning Predictions for Underestimation of Job Runtime on HPC System.
Proceedings of the Supercomputing Frontiers - 4th Asian Conference, 2018

DRAGON: breaking GPU memory capacity limits with direct NVM access.
Proceedings of the International Conference for High Performance Computing, 2018

MRG8: Random Number Generation for the Exascale Era.
Proceedings of the Platform for Advanced Scientific Computing Conference, 2018

High-Performance High-Order Stencil Computation on FPGAs Using OpenCL.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Hardware Implementation of POSITs and Their Application in FPGAs.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Efficient Solving of Scan Primitive on Multi-GPU Systems.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

High-Performance Sparse Matrix-Matrix Products on Intel KNL and Multicore Architectures.
Proceedings of the 47th International Conference on Parallel Processing, 2018

Interference between I/O and MPI Traffic on Fat-tree Networks.
Proceedings of the 47th International Conference on Parallel Processing, 2018

Explorations of Data Swapping on Burst Buffer.
Proceedings of the 24th IEEE International Conference on Parallel and Distributed Systems, 2018

Cambrian explosion of computing and big data in the post-moore era.
Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, 2018

Adaptive Pattern Matching with Reinforcement Learning for Dynamic Graphs.
Proceedings of the 25th IEEE International Conference on High Performance Computing, 2018

Combined Spatial and Temporal Blocking for High-Performance Stencil Computation on FPGAs Using OpenCL.
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

Direct Encodings of NP-Complete Problems into Horn Sequents of Multiplicative Linear Logic.
Proceedings of the Functional and Logic Programming - 14th International Symposium, 2018

Predicting Performance Using Collaborative Filtering.
Proceedings of the IEEE International Conference on Cluster Computing, 2018

Accelerating Deep Learning Frameworks with Micro-Batches.
Proceedings of the IEEE International Conference on Cluster Computing, 2018

Efficient Algorithms for the Summed Area Tables Primitive on GPUs.
Proceedings of the IEEE International Conference on Cluster Computing, 2018

Optimizing Preconditioned Conjugate Gradient on TaihuLight for OpenFOAM.
Proceedings of the 18th IEEE/ACM International Symposium on Cluster, 2018

2017
使用Stencil评估Intel AVX2 Vgather指令 (Evaluating Intel AVX2 Vgather Instructions with Stencils).
计算机科学 (Computer Science), 2017

Efficient Breadth-First Search on Massively Parallel and Distributed-Memory Machines.
Data Sci. Eng., 2017

Applying Temporal Blocking with a Directive-based Approach.
Proceedings of the Fourth Workshop on the LLVM Compiler Infrastructure in HPC, 2017

Benchmarking SW26010 Many-Core Processor.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

High-Performance and Memory-Saving Sparse General Matrix-Matrix Multiplication for NVIDIA Pascal GPU.
Proceedings of the 46th International Conference on Parallel Processing, 2017

Optimizations of Two Compute-Bound Scientific Kernels on the SW26010 Many-Core Processor.
Proceedings of the 46th International Conference on Parallel Processing, 2017

Asynchronous, Data-Parallel Deep Convolutional Neural Network Training with Linear Prediction Model for Parameter Transition.
Proceedings of the Neural Information Processing - 24th International Conference, 2017

Accelerating Big Data Infrastructure and Applications (Ongoing Collaboration).
Proceedings of the 37th IEEE International Conference on Distributed Computing Systems Workshops, 2017

Designing and accelerating spiking neural networks using OpenCL for FPGAs.
Proceedings of the International Conference on Field Programmable Technology, 2017

Evaluating high-level design strategies on FPGAs for high-performance computing.
Proceedings of the 27th International Conference on Field Programmable Logic and Applications, 2017

Co-locating Graph Analytics and HPC Applications.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

Evaluation of HPC-Big Data Applications Using Cloud Platforms.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

Being "BYTES-oriented" in HPC leads to an open big data/AI ecosystem and further advances into the post-moore era.
Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017

2016
GPU-Accelerated Large-Scale Distributed Sorting Coping with Device Memory Capacity.
IEEE Trans. Big Data, 2016

Special Issue on Cluster Computing.
Parallel Comput., 2016

Strong Typed Boehm Theorem and Functional Completeness on the Linear Lambda Calculus.
Proceedings of the 6th Workshop on Mathematically Structured Functional Programming, 2016

Critical mass in the emergence of collective intelligence: a parallelized simulation of swarms in noisy environments.
Artif. Life Robotics, 2016

Evaluating and optimizing OpenCL kernels for high performance computing with FPGAs.
Proceedings of the International Conference for High Performance Computing, 2016

Migrating Legacy Fortran to Python While Retaining Fortran-Level Performance through Transpilation and Type Hints.
Proceedings of the 6th Workshop on Python for High-Performance and Scientific Computing, 2016

Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn't.
Proceedings of the Student Research Workshop, 2016

Scaling FMM with Data-Driven OpenMP Tasks on Multicore Architectures.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

Towards a Distributed Large-Scale Dynamic Graph Data Store.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

CloudBB: Scalable I/O Accelerator for Shared Cloud Storage.
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016

Tapas: An Implicitly Parallel Programming Framework for Hierarchical N-Body Algorithms.
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016

Adaptive Multi-level Blocking Optimization for Sparse Matrix Vector Multiplication on GPU.
Proceedings of the International Conference on Computational Science 2016, 2016

Towards Convergence of Extreme Computing and Big Data Centers.
Proceedings of the ACM International Workshop on Data-Intensive Distributed Computing, 2016

Routing on the Dependency Graph: A New Approach to Deadlock-Free High-Performance Routing.
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

A Directive-Based Data Layout Abstraction for Performance Portability of OpenACC Applications.
Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications; 14th IEEE International Conference on Smart City; 2nd IEEE International Conference on Data Science and Systems, 2016

GPU-based fast signal processing for large amounts of snore sound data.
Proceedings of the IEEE 5th Global Conference on Consumer Electronics, 2016

Word Embeddings, Analogies, and Machine Learning: Beyond king - man + woman = queen.
Proceedings of the COLING 2016, 2016

Serving More GPU Jobs, with Low Penalty, Using Remote GPU Execution and Migration.
Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016

From FLOPS to BYTES: disruptive change in high-performance computing towards the post-moore era.
Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016

Extreme scale breadth-first search on supercomputers.
Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

I/O chunking and latency hiding approach for out-of-core sorting acceleration using GPU and flash NVM.
Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

Predicting statistics of asynchronous SGD parameters for a large-scale distributed deep learning system on GPU supercomputers.
Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

2015
Intel Knights Corner的结点级内存访问优化 (Node-level Memory Access Optimization on Intel Knights Corner).
计算机科学 (Computer Science), 2015

Python, performance, and natural language processing.
Proceedings of the 5th Workshop on Python for High-Performance and Scientific Computing, 2015

MPI+Threads: runtime contention and remedies.
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

A New Proof of P-time Completeness of Linear Lambda Calculus.
Proceedings of the 20th International Conferences on Logic for Programming, Artificial Intelligence and Reasoning, 2015

Understanding Performance Portability of OpenACC for Supercomputers.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Exploration of Lossy Compression for Application-Level Checkpoint/Restart.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

AsHES Introduction and Committees.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Realizing Extremely Large-Scale Stencil Applications on GPU Supercomputers.
Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems, 2015

Hardware-Centric Analysis of Network Performance for MPI Applications.
Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems, 2015

Efficient Execution of Multiple CUDA Applications Using Transparent Suspend, Resume and Migration.
Proceedings of the Euro-Par 2015: Parallel Processing, 2015

Latent Fault Detection With Unbalanced Workloads.
Proceedings of the Workshops of the EDBT/ICDT 2015 Joint Conference (EDBT/ICDT), 2015

Discovering Aspectual Classes of Russian Verbs in Untagged Large Corpora.
Proceedings of the IEEE International Conference on Data Science and Data Intensive Systems, 2015

Modeling Gather and Scatter with Hardware Performance Counters for Xeon Phi.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

Characterizing MPI and Hybrid MPI+Threads Applications at Scale: Case Study with BFS.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

2014
Extreme Big Data (EBD): Next Generation Big Data Infrastructure Technologies Towards Yottabyte/Year.
Supercomput. Front. Innov., 2014

Special issue: SC13 - The International Conference for High Performance Computing, Networking, Storage and Analysis.
Sci. Program., 2014

Resilience in Exascale Computing (Dagstuhl Seminar 14402).
Dagstuhl Reports, 2014

An OpenACC extension for data layout transformation.
Proceedings of the First Workshop on Accelerator Programming using Directives, 2014

Fail-in-Place Network Design: Interaction Between Topology, Routing Algorithm and Failures.
Proceedings of the International Conference for High Performance Computing, 2014

Tracing Data Movements within MPI Collectives.
Proceedings of the 21st European MPI Users' Group Meeting, 2014

Using rCUDA to Reduce GPU Resource-Assignment Fragmentation Caused by Job Scheduler.
Proceedings of the 15th International Conference on Parallel and Distributed Computing, 2014

FMI: Fault Tolerant Messaging Interface for Fast and Transparent Recovery.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Hybrid BFS Approach Using Semi-external Memory.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Petascale General Solver for Semidefinite Programming Problems with Over Two Million Constraints.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Scalable analysis of multicore data reuse and sharing.
Proceedings of the 2014 International Conference on Supercomputing, 2014

Cache-aware sparse matrix formats for Kepler GPU.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

TSUBAME-KFC: A modern liquid submersion cooling prototype towards exascale becoming the greenest supercomputer in the world.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

Out-of-core GPU memory management for MapReduce-based large-scale graph processing.
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

How file access patterns influence interference among cluster applications.
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

A User-Level InfiniBand-Based File System and Checkpoint Strategy for Burst Buffers.
Proceedings of the 14th IEEE/ACM International Symposium on Cluster, 2014

Large-scale distributed sorting for GPU-based heterogeneous supercomputers.
Proceedings of the 2014 IEEE International Conference on Big Data (IEEE BigData 2014), 2014

NVM-based Hybrid BFS with memory efficient data structure.
Proceedings of the 2014 IEEE International Conference on Big Data (IEEE BigData 2014), 2014

Efficient String Sorting on Multi- and Many-Core Architectures.
Proceedings of the 2014 IEEE International Congress on Big Data, Anchorage, AK, USA, June 27, 2014

2013
Guest Editors' Introduction: Special Issue on Applications for the Heterogeneous Computing Era.
Int. J. High Perform. Comput. Appl., 2013

Fork-Join and Data-Driven Execution Models on Multi-core Architectures: Case Study of the FMM.
Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013

Analysis of Data Reuse in Task-Parallel Runtimes.
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation, 2013

A Multi-Level Optimization Method for Stencil Computation on the Domain that is Bigger than Memory Capacity of GPU.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Improving the Computing Efficiency of HPC Systems Using a Combination of Proactive and Preventive Checkpointing.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Energy-aware I/O optimization for checkpoint and restart on a NAND flash memory system.
Proceedings of the 3rd Workshop on Fault-tolerance for HPC at extreme scale, 2013

A parallel optimization method for stencil computation on the domain that is bigger than memory capacity of GPUs.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for Large-Scale Heterogeneous Supercomputers.
Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013

CUDA vs OpenACC: Performance Case Studies with Kernel Benchmarks and a Memory-Bound CFD Application.
Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013

2012
A coding theoretic study of MLL proof nets.
Math. Struct. Comput. Sci., 2012

A Multi GPU Read Alignment Algorithm with Model-Based Performance Optimization.
Proceedings of the High Performance Computing for Computational Science, 2012

Design and modeling of a non-blocking checkpointing system.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Scalable multi-GPU 3-D FFT for TSUBAME 2.0 supercomputer.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

High-performance general solver for extremely large-scale semidefinite programming problems.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Using Bittorrent and SVC for efficient video sharing and streaming.
Proceedings of the 2012 IEEE Symposium on Computers and Communications, 2012

Sequence Alignment on Massively Parallel Heterogeneous Systems.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Topic 16: GPU and Accelerators Computing.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

Multi-GPU Implementation of the NICAM Atmospheric Model.
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

Scalable Reed-Solomon-Based Reliable Local Storage for HPC Applications on IaaS Clouds.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

A GPU Implementation of Generalized Graph Processing Algorithm GIM-V.
Proceedings of the 2012 IEEE International Conference on Cluster Computing Workshops, 2012

Hierarchical Clustering Strategies for Fault Tolerance in Large Scale HPC Systems.
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

Design and Implementation of Portable and Efficient Non-blocking Collective Communication.
Proceedings of the 12th IEEE/ACM International Symposium on Cluster, 2012

High performance 3-D FFT using multiple CUDA GPUs.
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units, 2012

2011
Preface.
Proceedings of the International Conference on Computational Science, 2011

The International Exascale Software Project roadmap.
Int. J. High Perform. Comput. Appl., 2011

Peta-scale phase-field simulation for dendritic solidification on the TSUBAME 2.0 supercomputer.
Proceedings of the Conference on High Performance Computing Networking, 2011

Physis: an implicitly parallel programming model for stencil computations on large-scale GPU-accelerated supercomputers.
Proceedings of the Conference on High Performance Computing Networking, 2011

Poster: fast GPU read alignment with burrows wheeler transform based index.
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

Petaflop biofluidics simulations on a two million-core system.
Proceedings of the Conference on High Performance Computing Networking, 2011

FTI: high performance fault tolerance interface for hybrid systems.
Proceedings of the Conference on High Performance Computing Networking, 2011

Making TSUBAME2.0, the world's greenest production supercomputer, even greener: challenges to the architects.
Proceedings of the 2011 International Symposium on Low Power Electronics and Design, 2011

Panel Statement.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

NVCR: A Transparent Checkpoint-Restart Library for NVIDIA CUDA.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Performance characteristics of Graph500 on large-scale distributed environment.
Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011

Multi-ring Structured Overlay Network for the Inter-cloud Computing Environment.
Proceedings of the CLOSER 2011, 2011

Dealing with Grid-Computing Authorization Using Identity-Based Certificateless Proxy Signature.
Proceedings of the 11th IEEE/ACM International Symposium on Cluster, 2011

2010
Model-based Fault Localization: Finding Behavioral Outliers in Large-scale Computing Systems.
New Gener. Comput., 2010

High performance conjugate gradient solver on multi-GPU clusters using hypergraph partitioning.
Comput. Sci. Res. Dev., 2010

Global-scale distributed I/O with ParaMEDIC.
Concurr. Comput. Pract. Exp., 2010

An 80-Fold Speedup, 15.0 TFlops Full GPU Acceleration of Non-Hydrostatic Weather Model ASUCA Production Code.
Proceedings of the Conference on High Performance Computing Networking, 2010

A high-performance fault-tolerant software framework for memory on commodity GPUs.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Linpack evaluation on a supercomputer with heterogeneous accelerators.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Authorization within grid-computing using certificateless identity-based proxy signature.
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010

Low-overhead diskless checkpoint for hybrid computing systems.
Proceedings of the 2010 International Conference on High Performance Computing, 2010

Statistical power modeling of GPU kernels using performance counters.
Proceedings of the International Green Computing Conference 2010, 2010

Hybrid Map Task Scheduling for GPU-Based Heterogeneous Clusters.
Proceedings of the Cloud Computing, Second International Conference, 2010

Distributed Diskless Checkpoint for Large Scale Systems.
Proceedings of the 10th IEEE/ACM International Conference on Cluster, 2010

Dynamic Load-Balanced Multicast for Data-Intensive Applications on Clouds.
Proceedings of the 10th IEEE/ACM International Conference on Cluster, 2010

2009
The International Exascale Software Project: a Call To Cooperative Action By the Global High-Performance Community.
Int. J. High Perform. Comput. Appl., 2009

Interoperation of world-wide production e-Science infrastructures.
Concurr. Comput. Pract. Exp., 2009

Auto-tuning 3-D FFT library for CUDA GPUs.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Power-aware dynamic task scheduling for heterogeneous accelerated clusters.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Fast Conjugate Gradients with Multiple GPUs.
Proceedings of the Computational Science, 2009

A Model-Based Algorithm for Optimizing I/O Intensive Applications in Clouds Using VM-Based Migration.
Proceedings of the 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2009

File Clustering Based Replication Algorithm in a Grid Environment.
Proceedings of the 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2009

Adaptive Resource Indexing Technique for Unstructured Peer-to-Peer Networks.
Proceedings of the 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2009

Aspects of GPU for general purpose high performance computing.
Proceedings of the 14th Asia South Pacific Design Automation Conference, 2009

2008
Intelligent data staging with overlapped execution of grid applications.
Future Gener. Comput. Syst., 2008

Coupled-Simulation e-Science Support in the NAREGI Grid.
Computer, 2008

The Rise of the Commodity Vectors.
Proceedings of the High Performance Computing for Computational Science, 2008

Bandwidth intensive 3-D FFT kernel for GPUs using CUDA.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

To distribute or not to distribute, that is the question in petascale and beyond.
Proceedings of the 15th ACM Mardi Gras conference: From lightweight mash-ups to lambda grids: Understanding the spectrum of distributed computing requirements, applications, tools, infrastructures, interoperability, and the incremental adoption of key capabilities, Baton Rouge, Louisiana, USA, January 29, 2008

Locality aware MPI communication on a commodity opto-electronic hybrid network.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

An efficient, model-based CPU-GPU heterogeneous FFT library.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Model-based fault localization in large-scale computing systems.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Performance evaluation of parallel applications on next generation memory architecture with power-aware paging method.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Massive supercomputing coping with heterogeneity of modern accelerators.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Model-based optimization for data-intensive application on virtual cluster.
Proceedings of the 9th IEEE/ACM International Conference on Grid Computing (Grid 2008), Tsukuba, Japan, September 29, 2008

Access-pattern and bandwidth aware file replication algorithm in a grid environment.
Proceedings of the 9th IEEE/ACM International Conference on Grid Computing (Grid 2008), Tsukuba, Japan, September 29, 2008

Environmental-aware optimization of MPI checkpointing intervals.
Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

Time-Stamping Authority Grid.
Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), 2008

2007
Outil autonome de surveillance de grilles (An Autonomous Grid Monitoring Tool).
Ingénierie des Systèmes d'Inf., 2007

MLL proof nets as error-correcting codes.
CoRR, 2007

Weak typed Böhm theorem on IMLL.
Ann. Pure Appl. Log., 2007

Model-based resource selection for efficient virtual cluster deployment.
Proceedings of the 2nd International Workshop on Virtualization Technology in Distributed Computing, 2007

Peer-to-Peer Scheduling System with Scalable Information Sharing Protocol.
Proceedings of the 2007 International Symposium on Applications and the Internet, 2007

Data Management on Grid Filesystem for Data-Intensive Computing.
Proceedings of the 2007 International Symposium on Applications and the Internet, 2007

The TSUBAME Cluster Experience a Year Later, and onto Petascale TSUBAME 2.0.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30, 2007

A Decentralized, Scalable, and Autonomous Grid Monitoring System.
Proceedings of the Principles of Distributed Systems, 11th International Conference, 2007

ABARIS: An Adaptable Fault Detection/Recovery Component Framework for MPIs.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

A Peer-to-Peer Infrastructure for Autonomous Grid Monitoring.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Grid'BnB: A Parallel Branch and Bound Framework for Grids.
Proceedings of the High Performance Computing, 2007

Virtual Clusters on the Fly - Fast, Scalable, and Flexible Installation.
Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), 2007

High-Performance MPI Broadcast Algorithm for Grid Environments Utilizing Multi-lane NICs.
Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), 2007

2006
Design and Implementation of NAREGI SuperScheduler Based on the OGSA Architecture.
J. Comput. Sci. Technol., 2006

Teddy: a sketching interface for 3D freeform design.
Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2006

Making Wide-Area, Multi-site MPI Feasible Using Xen VM.
Proceedings of the Frontiers of High Performance Computing and Networking, 2006

Profile-based optimization of power performance by using dynamic voltage scaling on a PC cluster.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

MegaProto/E: power-aware high-performance cluster with commodity technology.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Multi-Replication with Intelligent Staging in Data-Intensive Grid Applications.
Proceedings of the 7th IEEE/ACM International Conference on Grid Computing (GRID 2006), 2006

2005
Japanese Computational Grid Research Project: NAREGI.
Proc. IEEE, 2005

MegaProto: 1 TFlops/10kW Rack Is Feasible Even with Only Commodity Technology.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

A Scalable Multi-Replication Framework for Data Grid.
Proceedings of the 2005 IEEE/IPSJ International Symposium on Applications and the Internet Workshops (SAINT 2005 Workshops), 31 January, 2005

MegaProto: A Low-Power and Compact Cluster for High-Performance Computing.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

2004
P-time Completeness of Light Linear Logic and its Nondeterministic Extension.
CoRR, 2004

Weak Typed Boehm Theorem on IMLL.
CoRR, 2004

Nondeterministic Linear Logic.
CoRR, 2004

The Second Trans-Pacific Grid Datafarm Testbed and Experiments for SC2003.
Proceedings of the 2004 Symposium on Applications and the Internet Workshops (SAINT 2004 Workshops), 2004

Autonomous Configuration of Grid Monitoring Systems.
Proceedings of the 2004 Symposium on Applications and the Internet Workshops (SAINT 2004 Workshops), 2004

Parallelization of Phylogenetic Tree Inference Using Grid Technologies.
Proceedings of the Grid Computing in Life Science, 2004

Grid Portal Interface for Interactive Use and Monitoring of High-Throughput Proteome Annotation.
Proceedings of the Grid Computing in Life Science, 2004

A Java-based programming environment for hierarchical Grid: Jojo.
Proceedings of the 4th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2004), 2004

Application-Level Tools.
Proceedings of the Grid 2 - Blueprint for a New Computing Infrastructure, Second Edition, 2004

2003
Ninf-G: A Reference Implementation of RPC-based Programming Middleware for Grid Computing.
J. Grid Comput., 2003

Worldwide Fast File Replication on Grid Datafarm.
CoRR, 2003

Building A High Performance Parallel File System Using Grid Datafarm and ROOT I/O.
CoRR, 2003

A Foundation of Solution Methods for Constraint Hierarchies.
Constraints An Int. J., 2003

Performance Analysis of Scheduling and Replication Algorithms on Grid Datafarm Architecture for High-Energy Physics Applications.
Proceedings of the 12th International Symposium on High-Performance Distributed Computing (HPDC-12 2003), 2003

Towards a Petascale Research Grid Infrastructure.
Proceedings of the Grid and Cooperative Computing, Second International Workshop, 2003

Preliminary Evaluation of Dynamic Load Balancing Using Loop Re-partitioning on Omni/SCASH.
Proceedings of the 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2003), 2003

Evaluation of the inter-cluster data transfer on Grid environment.
Proceedings of the 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2003), 2003

2002
The ninf portal: an automatic generation tool for grid portals.
Proceedings of the 2002 Joint ACM-ISCOPE Conference on Java Grande 2002, 2002

Evaluating Web Services Based Implementations of GridRPC.
Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11 2002), 2002

Overview of GridRPC: A Remote Procedure Call API for Grid Computing.
Proceedings of the Grid Computing, 2002

First Light of the Earth Simulator and Its PC Cluster Applications.
Proceedings of the 2002 IEEE International Conference on Cluster Computing (CLUSTER 2002), 2002

Grid Datafarm Architecture for Petascale Data Intensive Computing.
Proceedings of the 2nd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2002), 2002

2001
Towards performance evaluation of high-performance computing on multiple Java platforms.
Future Gener. Comput. Syst., 2001

On intuitionistic proof nets with additional rewrite rules and their approximations.
Proceedings of the Böhm's theorem: applications to Computer Science Theory, 2001

A Jini-based computing portal system.
Proceedings of the 2001 ACM/IEEE conference on Supercomputing, 2001

A Confluent Extension of Lafont's Interaction Nets.
Proceedings of the Programmation en logique avec contraintes, Actes des JFPLC'2001, 24 April, 2001

Implementation of a portable software DSM in Java.
Proceedings of the ACM 2001 Java Grande Conference, Stanford University, California, USA, 2001

An Evaluation of Multiple Pointing Input Systems.
Proceedings of the Human-Computer Interaction INTERACT '01: IFIP TC13 International Conference on Human-Computer Interaction, 2001

A Study of Deadline Scheduling for Client-Server Systems on the Computational Grid.
Proceedings of the 10th IEEE International Symposium on High Performance Distributed Computing (HPDC-10 2001), 2001

MPC++ Performance for Commodity Clustering.
Proceedings of the High-Performance Computing and Networking, 9th International Conference, 2001

Grid RPC meets Data Grid: Network Enabled Services for Data Farming on the Grid.
Proceedings of the First IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2001), 2001

2000
Performance Evaluation Model for Scheduling in Global Computing Systems.
Int. J. High Perform. Comput. Appl., 2000

AJaPACK: experiments in performance portable parallel Java numerical libraries.
Proceedings of the ACM 2000 Java Grande Conference, San Francisco, CA, USA, 2000

Are Global Computing Systems Useful? Comparison of Client-server Global Computing Systems Ninf, NetSolve Versus CORBA.
Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000

Design Issues of Network Enabled Server Systems for the Grid.
Proceedings of the Grid Computing, 2000

OpenJIT: An Open-Ended, Reflective JIT Compiler Framework for Java.
Proceedings of the ECOOP 2000, 2000

1999
OMPC++ - A Portable High-Performance Implementation of DSM using OpenC++ Reflection.
Proceedings of the Meta-Level Architectures and Reflection, 1999

OpenJIT Frontend System: An Implementation of the Reflective JIT Compiler Frontend.
Proceedings of the Reflection and Software Engineering, 1999

Overview of a Performance Evaluation System for Global Computing Scheduling Algorithms.
Proceedings of the Eighth IEEE International Symposium on High Performance Distributed Computing, 1999

1998
Ninf and PM: Communication libraries for global computing and high-performance cluster computing.
Future Gener. Comput. Syst., 1998

A Constraint-Based Approach for Visualization and Animation.
Constraints An Int. J., 1998

Ninflet: a migratable parallel objects framework using Java.
Concurr. Pract. Exp., 1998

Popup Vernier: A Tool for Sub-Pixel-Pitch Dragging with Smooth Mode Transition.
Proceedings of the 11th Annual ACM Symposium on User Interface Software and Technology, 1998

Towards a Parallel Programming Language based on Commodity Object-Oriented Technologies.
Proceedings of the International Symposium on Software Engineering for Parallel and Distributed Systems, 1998

A Performance Evaluation Model for Effective Job Scheduling in Global Computing Systems.
Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing, 1998

Utilizing the Metaserver Architecture in the Ninf Global Computing System.
Proceedings of the High-Performance Computing and Networking, 1998

Is Java Suitable for Portable High-Performance Computing?
Proceedings of the Object-Oriented Technology, ECOOP'98 Workshop Reader, 1998

Pegasus: a drawing system for rapid geometric design.
Proceedings of the CHI 98 Conference Summary on Human Factors in Computing Systems, 1998

Reduction of Overhead in Drawing Figures with Computer: Detailed Analyses of Drawing Tasks.
Proceedings of the Third Asian Pacific Computer and Human Interaction, 1998

Layered Penumbrae: An Effective 3D Feedback Technique.
Proceedings of the Third Asian Pacific Computer and Human Interaction, 1998

1997
Supporting Design Patterns in a Visual Parallel Data-flow Programming Environment.
Proceedings of the 1997 IEEE Symposium on Visual Languages, 1997

Interactive Beautification: A Technique for Rapid Geometric Design.
Proceedings of the 10th Annual ACM Symposium on User Interface Software and Technology, 1997

Multi-client LAN/WAN Performance Analysis of Ninf: a High-Performance Global Computing System.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1997

Towards a Parallel C++ Programming Language Based on Commodity Object-Oriented Technologies.
Proceedings of the Scientific Computing in Object-Oriented Parallel Environments, 1997

In Search for an Ideal Computer-Assisted Drawing System.
Proceedings of the Human-Computer Interaction, 1997

A Methodology for Specifying Data Distribution Using Only Standard Object-Oriented Features.
Proceedings of the 11th international conference on Supercomputing, 1997

Ninf: A Network Based Information Library for Global World-Wide Computing Infrastructure.
Proceedings of the High-Performance Computing and Networking, 1997

1996
Duplication and Partial Evaluation For a Better Understanding of Reflective Languages.
LISP Symb. Comput., 1996

Penumbrae for 3D Interactions.
Proceedings of the 9th Annual ACM Symposium on User Interface Software and Technology, 1996

OMPI: Optimizing MPI Programs using Partial Evaluation.
Proceedings of the 1996 ACM/IEEE Conference on Supercomputing, 1996

Hierarchical Collections: An Efficient Scheme to Build an Object-Oriented Distributed Class Library for Massively Parallel Computation.
Proceedings of the Object Technologies for Advanced Software, 1996

A New Proof Search Method for Linear Logic.
Proceedings of the First Conference of the Centre for Discrete Mathematics and Theoretical Computer Science, 1996

Generalized Local Propagation: A Framework for Solving Constraint Hierarchies.
Proceedings of the Second International Conference on Principles and Practice of Constraint Programming, 1996

1995
Adaptive Recognition of Implicit Structures in Human-Organized Layouts.
Proceedings of the 11th International IEEE Symposium on Visual Languages, 1995

Compiling Away the Meta-Level in Object-Oriented Concurrent Reflective Languages Using Partial Evaluation.
Proceedings of the Tenth Annual Conference on Object-Oriented Programming Systems, 1995

1994
A Framework for Constructing Animations via Declarative Mapping Rules.
Proceedings of the IEEE Symposium on Visual Languages, 1994

Interactive Generation of Graphical User Interfaces by Multiple Visual Examples.
Proceedings of the 7th Annual ACM Symposium on User Interface Software and Technology, 1994

StackThreads: An Abstract Machine for Scheduling Fine-Grain Threads on Stock CPUs.
Proceedings of the Theory and Practice of Parallel Programming, 1994

An Algorithm for Efficient Global Garbage Collection on Massively Parallel Computers.
Proceedings of the Theory and Practice of Parallel Programming, 1994

Efficient parallel global garbage collection on massively parallel computers.
Proceedings of Supercomputing '94, 1994

Locally Simultaneous Constraint Satisfaction.
Proceedings of the Principles and Practice of Constraint Programming, 1994

PARCS: An MPP-Oriented CLP Language.
Proceedings of the First International Symposium on Parallel Symbolic Computation, 1994

Comprehensive operating system for highly parallel machine.
Proceedings of the International Symposium on Parallel Architectures, 1994

The Plan-Du Style Compilation Technique for Eager Data Transfer in Thread-Based Execution.
Proceedings of the Parallel Architectures and Compilation Techniques, 1994

ABCL/f: A Future-Based Polymorphic Typed Concurrent Object-Oriented Language - Its Design and Implementation.
Proceedings of the Specification of Parallel Algorithms, 1994

1993
Implementing concurrent object-oriented languages on multicomputers.
IEEE Parallel Distributed Technol. Syst. Appl., 1993

Highly Efficient and Encapsulated Re-use of Synchronization Code in Concurrent Object-Oriented Languages.
Proceedings of the Eighth Annual Conference on Object-Oriented Programming Systems, 1993

1992
A General Framework for Bidirectional Translation between Abstract and Pictorial Data.
ACM Trans. Inf. Syst., 1992

Creating Visual Objects by Direct Manipulation.
Proceedings of the 1992 IEEE Workshop on Visual Languages, 1992

Animation for On-Line Documents - An End-User System Using Object-Oriented Constraints.
Proceedings of the 1992 IEEE Workshop on Visual Languages, 1992

Declarative Programming of Graphical Interfaces by Visual Examples.
Proceedings of the Fifth ACM Symposium on User Interface Software and Technology, 1992

An Efficient Implementation Scheme of Concurrent Object-Oriented Languages on Stock Multicomputers.
Proceedings of the Parallel Symbolic Computing: Languages, 1992

Object-Oriented Concurrent Reflective Languages can be Implemented Efficiently.
Proceedings of the Seventh Annual Conference on Object-Oriented Programming Systems, 1992

ABCL/onEM-4: a new software/hardware architecture for object-oriented concurrent computing on an extended dataflow supercomputer.
Proceedings of the 6th international conference on Supercomputing, 1992

1991
A general framework for Bi-directional translation between abstract and pictorial data.
Proceedings of the 4th Annual ACM Symposium on User Interface Software and Technology, 1991

Object-Oriented Concurrent Reflective Architectures.
Proceedings of the Object-Based Concurrent Computing, 1991

Hybrid Group Reflective Architecture for Object-Oriented Concurrent Reflective Programming.
Proceedings of the ECOOP'91 European Conference on Object-Oriented Programming, 1991

1989
Asymptotic Evaluation of Window Visibility.
Inf. Process. Lett., 1989

1988
Using Tuple Space Communication in Distributed Object-Oriented Languages.
Proceedings of the Conference on Object-Oriented Programming Systems, 1988
