Ryusuke Egawa

ORCID: 0000-0001-8966-867X

According to our database, Ryusuke Egawa authored at least 100 papers between 2001 and 2024.

Bibliography

2024
Special Issue on COOL Chips.
IEEE Micro, 2024

2023
Foreword.
IEICE Trans. Electron., June, 2023

Performance Evaluation of a Next-Generation SX-Aurora TSUBASA Vector Supercomputer.
Proceedings of the High Performance Computing - 38th International Conference, 2023

2022
A Conflict-Aware Capacity Control Mechanism for Deep Cache Hierarchy.
IEICE Trans. Inf. Syst., 2022

Equivalence Checking of Code Transformation by Numerical and Symbolic Approaches.
Proceedings of the Parallel and Distributed Computing, Applications and Technologies, 2022

A Partitioned Memory Architecture with Prefetching for Efficient Video Encoders.
Proceedings of the Parallel and Distributed Computing, Applications and Technologies, 2022

Toward Building a Digital Twin of Job Scheduling and Power Management on an HPC System.
Proceedings of the Job Scheduling Strategies for Parallel Processing, 2022

2021
OpenCL-like offloading with metaprogramming for SX-Aurora TSUBASA.
Parallel Comput., 2021

Preemptive Parallel Job Scheduling for Heterogeneous Systems Supporting Urgent Computing.
IEEE Access, 2021

Towards Conflict-Aware Workload Co-execution on SX-Aurora TSUBASA.
Proceedings of the Parallel and Distributed Computing, Applications and Technologies, 2021

Portability of Vectorization-aware Performance Tuning Expertise across System Generations.
Proceedings of the 14th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2021

2020
ExaFSA: Parallel Fluid-Structure-Acoustic Simulation.
Proceedings of the Software for Exascale Computing - SPPEXA 2016-2019, 2020

Improving Quantum Annealing Performance on Embedded Problems.
Supercomput. Front. Innov., 2020

Effects of Using a Memory Stalled Core for Handling MPI Communication Overlapping in the SOR Solver on SX-ACE and SX-Aurora TSUBASA.
Supercomput. Front. Innov., 2020

Online MPI Process Mapping for Coordinating Locality and Memory Congestion on NUMA Systems.
Supercomput. Front. Innov., 2020

Xevolver: A code transformation framework for separation of system-awareness from application codes.
Concurr. Comput. Pract. Exp., 2020

DeLoc: A Locality and Memory-Congestion-Aware Task Mapping Method for Modern NUMA Systems.
IEEE Access, 2020

Exploiting the Potentials of the Second Generation SX-Aurora TSUBASA.
Proceedings of the 2020 IEEE/ACM Performance Modeling, 2020

Task Priority Control for the HPX Runtime System.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Automatically Avoiding Memory Access Conflicts on SX-Aurora TSUBASA.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

A Conflict-Aware Capacity Control Mechanism for Last-Level Cache.
Proceedings of the Eighth International Symposium on Computing and Networking Workshops, 2020

Improving the Accuracy in SpMV Implementation Selection with Machine Learning.
Proceedings of the Eighth International Symposium on Computing and Networking Workshops, 2020

Polymorphic Data Layout for SX-Aurora TSUBASA Vector Engines.
Proceedings of the Eighth International Symposium on Computing and Networking, 2020

2019
Performance Evaluation of Different Implementation Schemes of an Iterative Flow Solver on Modern Vector Machines.
Supercomput. Front. Innov., 2019

An Energy-aware Dynamic Data Allocation Mechanism for Many-channel Memory Systems.
Supercomput. Front. Innov., 2019

Peachy Parallel Assignments (EduHPC 2019).
Proceedings of the 2019 IEEE/ACM Workshop on Education for High-Performance Computing, 2019

An OpenCL-Like Offload Programming Framework for SX-Aurora TSUBASA.
Proceedings of the 20th International Conference on Parallel and Distributed Computing, 2019

An Automatic MPI Process Mapping Method Considering Locality and Memory Congestion on NUMA Systems.
Proceedings of the 13th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2019

A Layer-Adaptable Cache Hierarchy by a Multiple-layer Bypass Mechanism.
Proceedings of the 10th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, 2019

Performance Improvement of High-Speed File Transfer Over JHPCN.
Proceedings of the 2019 IEEE Intl Conf on Dependable, 2019

The Impacts of Locality and Memory Congestion-aware Thread Mapping on Energy Consumption of Modern NUMA Systems.
Proceedings of the IEEE Symposium in Low-Power and High-Speed Chips, 2019

A Design Scheme for 3-D Stacked CNN Accelerators.
Proceedings of the 2019 International 3D Systems Integration Conference (3DIC), 2019

2018
An Adjacent-Line-Merging Writeback Scheme for STT-RAM-Based Last-Level Caches.
IEEE Trans. Multi Scale Comput. Syst., 2018

Risk Management of Heatstroke Based on Fast Computation of Temperature and Water Loss Using Weather Data for Exposure to Ambient Heat and Solar Radiation.
IEEE Access, 2018

Use of Code Structural Features for Machine Learning to Predict Effective Optimizations.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Investigating the Effects of Dynamic Thread Team Size Adjustment for Irregular Applications.
Proceedings of the Sixth International Symposium on Computing and Networking, 2018

An energy-aware set-level refreshing mechanism for eDRAM last-level caches.
Proceedings of the 2018 IEEE Symposium in Low-Power and High-Speed Chips, 2018

A Failure Prediction-Based Adaptive Checkpointing Method with Less Reliance on Temperature Monitoring for HPC Applications.
Proceedings of the IEEE International Conference on Cluster Computing, 2018

Automatic Hyperparameter Tuning of Machine Learning Models under Time Constraints.
Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2018), 2018

2017
Potential of a modern vector supercomputer for practical applications: performance evaluation of SX-ACE.
J. Supercomput., 2017

A Directive Generation Approach to High Code-Maintainability for Various HPC Systems.
Int. J. Netw. Comput., 2017

An Application-Level Incremental Checkpointing Mechanism with Automatic Parameter Tuning.
Proceedings of the Fifth International Symposium on Computing and Networking, 2017

Designing an Open Database of System-Aware Code Optimizations.
Proceedings of the Fifth International Symposium on Computing and Networking, 2017

A Memory Congestion-Aware MPI Process Placement for Modern NUMA Systems.
Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017

An Adaptive Demotion Policy for High-Associativity Caches.
Proceedings of the 8th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, 2017

An application-adaptive data allocation method for multi-channel memory.
Proceedings of the 2017 IEEE Symposium in Low-Power and High-Speed Chips, 2017

An Adjacent-Line-Merging Writeback Scheme for STT-RAM last-level caches.
Proceedings of the 2017 IEEE Symposium in Low-Power and High-Speed Chips, 2017

Vectorization-Aware Loop Optimization with User-Defined Code Transformations.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

Performance and Power Analysis of SX-ACE Using HP-X Benchmark Programs.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2016
Effects of Stacking Granularity on 3-D Stacked Floating-point Fused Multiply Add Units.
SIGARCH Comput. Archit. News, 2016

A Memory-Efficient Implementation of a Plasmonics Simulation Application on SX-ACE.
Int. J. Netw. Comput., 2016

Translation of Large-Scale Simulation Codes for an OpenACC Platform Using the Xevolver Framework.
Int. J. Netw. Comput., 2016

A Directive Generation Approach Using User-Defined Rules.
Proceedings of the Fourth International Symposium on Computing and Networking, 2016

A cache partitioning mechanism to protect shared data for CMPs.
Proceedings of the 2016 IEEE Symposium in Low-Power and High-Speed Chips, 2016

A power-aware LLC control mechanism for the 3D-stacked memory system.
Proceedings of the 2016 IEEE International 3D Systems Integration Conference, 2016

2015
FLEXII: A Flexible Insertion Policy for Dynamic Cache Resizing Mechanisms.
IEICE Trans. Electron., 2015

A Case Study of Memory Optimization for Migration of a Plasmonics Simulation Application to SX-ACE.
Proceedings of the Third International Symposium on Computing and Networking, 2015

Migration of an Atmospheric Simulation Code to an OpenACC Platform Using the Xevolver Framework.
Proceedings of the Third International Symposium on Computing and Networking, 2015

An energy-efficient dynamic memory address mapping mechanism.
Proceedings of the 2015 IEEE Symposium in Low-Power and High-Speed Chips, 2015

Design of a 3-D stacked floating-point Goldschmidt divider.
Proceedings of the 2015 International 3D Systems Integration Conference, 2015

2014
MVP-Cache: A Multi-Banked Cache Memory for Energy-Efficient Vector Processing of Multimedia Applications.
IEICE Trans. Inf. Syst., 2014

A Compiler-Assisted OpenMP Migration Method Based on Automatic Parallelizing Information.
Proceedings of the Supercomputing - 29th International Conference, 2014

Xevolver: An XML-based code translation framework for supporting HPC application migration.
Proceedings of the 21st International Conference on High Performance Computing, 2014

An energy optimization method for vector processing mechanisms.
Proceedings of the 2014 IEEE Symposium on Low-Power and High-Speed Chips, 2014

An impact of circuit scale on the performance of 3-D stacked arithmetic units.
Proceedings of the 2014 International 3D Systems Integration Conference, 2014

On-chip checkpointing with 3D-stacked memories.
Proceedings of the 2014 International 3D Systems Integration Conference, 2014

2013
A Capacity-Aware Thread Scheduling Method Combined with Cache Partitioning to Reduce Inter-Thread Cache Conflicts.
IEICE Trans. Inf. Syst., 2013

Design and evaluation of a media-oriented vector processor with a multi-banked cache memory.
Proceedings of the 11th IEEE Symposium on Embedded Systems for Real-time Multimedia, 2013

A flexible insertion policy for dynamic cache resizing mechanisms.
Proceedings of the 2013 IEEE Symposium on Low-Power and High-Speed Chips, 2013

Design of a 3-D stacked floating-point adder.
Proceedings of the 2013 IEEE International 3D Systems Integration Conference (3DIC), 2013

Vertically integrated processor and memory module design for vector supercomputers.
Proceedings of the 2013 IEEE International 3D Systems Integration Conference (3DIC), 2013

2012
Poster: Exploring Design Space of a 3D Stacked Vector Cache - Designing a 3D Stacked Vector Cache using Conventional EDA Tools.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: Exploring Design Space of a 3D Stacked Vector Cache.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

A media-oriented vector architectural extension with a high bandwidth cache system.
Proceedings of the 2012 IEEE Symposium on Low-Power and High-Speed Chips, 2012

A capacity-efficient insertion policy for dynamic cache resizing mechanisms.
Proceedings of the Computing Frontiers Conference, CF'12, 2012

An out-of-order vector processing mechanism for multimedia applications.
Proceedings of the Computing Frontiers Conference, CF'12, 2012

2011
Power-Aware Dynamic Cache Partitioning for CMPs.
Trans. High Perform. Embed. Archit. Compil., 2011

A Network Clustering Algorithm for Sybil-Attack Resisting.
IEICE Trans. Inf. Syst., 2011

A middle-grain circuit partitioning strategy for 3-D integrated floating-point multipliers.
Proceedings of the 2011 IEEE International 3D Systems Integration Conference (3DIC), Osaka, Japan, January 31, 2011

Effects of 3-D stacked vector cache on energy consumption.
Proceedings of the 2011 IEEE International 3D Systems Integration Conference (3DIC), Osaka, Japan, January 31, 2011

2010
A Fast Ray-Tracing Using Bounding Spheres and Frustum Rays for Dynamic Scene Rendering.
IEICE Trans. Inf. Syst., 2010

A History-Based Job Scheduling Mechanism for the Vector Computing Cloud.
Proceedings of the Tenth Annual International Symposium on Applications and the Internet, 2010

A voting-based working set assessment scheme for dynamic cache resizing mechanisms.
Proceedings of the 28th International Conference on Computer Design, 2010

A Majority-Based Control Scheme for Way-Adaptable Caches.
Proceedings of the Facing the Multicore-Challenge, 2010

A Load-Forwarding Mechanism for the Vector Architecture in Multimedia Applications.
Proceedings of the 13th Euromicro Conference on Digital System Design, 2010

Cache partitioning strategies for 3-D stacked vector processors.
Proceedings of the IEEE International Conference on 3D System Integration, 2010

Design and early evaluation of a 3-D die stacked chip multi-vector processor.
Proceedings of the IEEE International Conference on 3D System Integration, 2010

2009
Performance evaluation of NEC SX-9 using real science and engineering applications.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Performance tuning and analysis of future vector processors based on the roofline model.
Proceedings of the 10th workshop on MEmory performance, 2009

3D on-chip memory for the vector architecture.
Proceedings of the IEEE International Conference on 3D System Integration, 2009

Evaluation of fine grain 3-D integrated arithmetic units.
Proceedings of the IEEE International Conference on 3D System Integration, 2009

2008
A shared cache for a chip multi vector processor.
Proceedings of the 9th workshop on MEmory performance, 2008

Modeling of cache access behavior based on Zipf's law.
Proceedings of the 9th workshop on MEmory performance, 2008

A Utility-Based Double Auction Mechanism for Efficient Grid Resource Allocation.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2008

Effects of MSHR and Prefetch Mechanisms on an On-Chip Cache of the Vector Architecture.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2008

First Experiences with NEC SX-9.
Proceedings of the High Performance Computing on Vector Systems 2008, 2008

2007
An on-chip cache design for vector processors.
Proceedings of the 2007 workshop on MEmory performance, 2007

A power-aware shared cache mechanism based on locality assessment of memory reference for CMPs.
Proceedings of the 2007 workshop on MEmory performance, 2007

2004
A Systolic Memory Architecture for Fast Codebook Design based on MMPDCL Algorithm.
Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC'04), 2004

2001
Scaling Up Of Wave Pipelines.
Proceedings of the 14th International Conference on VLSI Design (VLSI Design 2001), 2001

