Toshio Endo

Naoyuki Onodera

Takayuki Aoki

Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

ExanaDBT: A Dynamic Compilation System for Transparent Polyhedral Optimizations at Runtime.

Yukinori Sato

Tomoya Yuki

Proceedings of the Computing Frontiers Conference, 2017

ooc_cuDNN: Accommodating convolutional neural networks over GPU memory capacity.

Yuki Ito

Ryo Matsumiya

Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017

2016

PGAS Communication Runtime for Extreme Large Data Computation.

Ryo Matsumiya

Proceedings of the Second International Workshop on Extreme Scale Programming Models and Middleware, 2016

Advanced Computing and Optimization Infrastructure for Extremely Large-Scale Graphs on Post Peta-Scale Supercomputers.

Katsuki Fujisawa

Yuichiro Yasui

Proceedings of the Mathematical Software - ICMS 2016, 2016

Realizing Out-of-Core Stencil Computations Using Multi-tier Memory Hierarchy on GPGPU Clusters.

Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016

From FLOPS to BYTES: disruptive change in high-performance computing towards the post-Moore era.

Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016

Evaluating the impacts of code-level performance tunings on power efficiency.

Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

2015

Power Capping of CPU-GPU Heterogeneous Systems using Power and Performance Models.

Kazuki Tsuzuku

Proceedings of the SMARTGREENS 2015, 2015

The scalable petascale data-driven approach for the Cholesky factorization with multiple GPUs.

Yuki Tsujita

Katsuki Fujisawa

Proceedings of the First International Workshop on Extreme Scale Programming Models and Middleware, 2015

Investigating potential performance benefits of memory layout optimization based on roofline model.

Shimpei Sato

Yukinori Sato

Proceedings of the 2nd International Workshop on Software Engineering for Parallel Systems, 2015

Exana: an execution-driven application analysis tool for assisting productive performance tuning.

Yukinori Sato

Shimpei Sato

Proceedings of the 2nd International Workshop on Software Engineering for Parallel Systems, 2015

Data Driven Scheduling Approach for the Multi-node Multi-GPU Cholesky Decomposition.

Yuki Tsujita

Proceedings of the Job Scheduling Strategies for Parallel Processing, 2015

Exploration of Lossy Compression for Application-Level Checkpoint/Restart.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Realizing Extremely Large-Scale Stencil Applications on GPU Supercomputers.

Yuki Takasaki

Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems, 2015

2014

Special Issue on Applications for the Heterogeneous Computing Era.

Jiayuan Meng

Int. J. High Perform. Comput. Appl., 2014

Petascale General Solver for Semidefinite Programming Problems with Over Two Million Constraints.

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

TSUBAME-KFC: A modern liquid submersion cooling prototype towards exascale becoming the greenest supercomputer in the world.

Akira Nukada

Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

An evaluation of the potential of flash SSD as large and slow memory for stencil computations.

Hiroko Midorikawa

Hideyuki Tan

Proceedings of the International Conference on High Performance Computing & Simulation, 2014

Software technologies coping with memory hierarchy of GPGPU clusters for stencil computations.

Guanghao Jin

Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

2013

A Multi-Level Optimization Method for Stencil Computation on the Domain that is Bigger than Memory Capacity of GPU.

Guanghao Jin

Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

A parallel optimization method for stencil computation on the domain that is bigger than memory capacity of GPUs.

Guanghao Jin

Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

2012

High-performance general solver for extremely large-scale semidefinite programming problems.

Proceedings of the SC Conference on High Performance Computing Networking, 2012

2011

Peta-scale phase-field simulation for dendritic solidification on the TSUBAME 2.0 supercomputer.

Proceedings of the Conference on High Performance Computing Networking, 2011

Petaflop biofluidics simulations on a two million-core system.

Proceedings of the Conference on High Performance Computing Networking, 2011

2010

An 80-Fold Speedup, 15.0 TFlops Full GPU Acceleration of Non-Hydrostatic Weather Model ASUCA Production Code.

Proceedings of the Conference on High Performance Computing Networking, 2010

Linpack evaluation on a supercomputer with heterogeneous accelerators.

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Statistical power modeling of GPU kernels using performance counters.

Proceedings of the International Green Computing Conference 2010, 2010

2009

Power-aware dynamic task scheduling for heterogeneous accelerated clusters.

Tomoaki Hamano

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

File Clustering Based Replication Algorithm in a Grid Environment.

Hitoshi Sato

Proceedings of the 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2009

2008

Bandwidth intensive 3-D FFT kernel for GPUs using CUDA.

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Locality aware MPI communication on a commodity opto-electronic hybrid network.

Shin'ichiro Takizawa

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

An efficient, model-based CPU-GPU heterogeneous FFT library.

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Performance evaluation of parallel applications on next generation memory architecture with power-aware paging method.

Y. Hosogaya

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Massive supercomputing coping with heterogeneity of modern accelerators.

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Access-pattern and bandwidth aware file replication algorithm in a grid environment.

Proceedings of the 9th IEEE/ACM International Conference on Grid Computing (Grid 2008), Tsukuba, Japan, September 29, 2008

Environmental-aware optimization of MPI checkpointing intervals.

Hideyuki Jitsumoto

Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

2007

ABARIS: An Adaptable Fault Detection/Recovery Component Framework for MPIs.

Hideyuki Jitsumoto

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

High-Performance MPI Broadcast Algorithm for Grid Environments Utilizing Multi-lane NICs.

Tatsuhiro Chiba

Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), 2007

2005

Highly latency tolerant Gaussian elimination.

Proceedings of the 6th IEEE/ACM International Conference on Grid Computing (GRID 2005), 2005

2004

High performance LU factorization for non-dedicated clusters.

Proceedings of the 4th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2004), 2004

2003

Phoenix: a parallel programming model for accommodating dynamically joining/leaving resources.

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2003

2002

Reducing pause time of conservative collectors.

Proceedings of The Workshop on Memory Systems Performance (MSP 2002), 2002

2001

Predicting Scalability of Parallel Garbage Collectors on Shared Memory Multiprocessors.

Akinori Yonezawa

Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

1998

On a High-Speed Hough Transform Algorithm MRHT.

Proceedings of IAPR Workshop on Machine Vision Applications, 1998

1997

A Scalable Mark-Sweep Garbage Collector on Large-Scale Shared-Memory Machines.
