Peng Chen

ORCID: 0000-0003-1244-3151

Affiliations:
  • National Institute of Advanced Industrial Science and Technology, Japan, RIKEN Center for Computational Science, Tokyo, Japan
  • Tokyo Institute of Technology, AIST-Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory, Japan (PhD 2020)


According to our database, Peng Chen authored at least 41 papers between 2018 and 2026.

Bibliography

2026
RT-RkNN: Reverse k Nearest Neighbor Queries as a Graphics Ray Casting Problem.
CoRR, May, 2026

FalconGEMM: Surpassing Hardware Peaks with Lower-Complexity Matrix Multiplication.
CoRR, May, 2026

Partial Decoder Attention Network with Contour-weighted Loss Function for Data-Imbalance Medical Image Segmentation.
CoRR, January, 2026

Neural architecture search for generative adversarial networks with hybrid convolution.
Neurocomputing, 2026

FRUGAL: Pushing GPU Applications beyond Memory Limits.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2026

2025
SHIRO: Near-Optimal Communication Strategies for Distributed Sparse Matrix Multiplication.
CoRR, December, 2025

Neural Architecture Search with Progressive Evaluation and Subpopulation Preservation.
IEEE Trans. Evol. Comput., October, 2025

Paradigm Shift in Infrastructure Inspection Technology: Leveraging High-performance Imaging and Advanced AI Analytics to Inspect Road Infrastructure.
CoRR, May, 2025

NM-SpMM: Accelerating Matrix Multiplication Using N:M Sparsity with GPGPU.
CoRR, March, 2025

Predictor-assisted evolutionary neural architecture search for spiking neural networks.
Neurocomputing, 2025

Noisy data-based attack: A new type of untargeted attack in Federated Learning and its countermeasures.
Future Gener. Comput. Syst., 2025

Antimicrobial resistance recommendations via electronic health records with graph representation and patient population modeling.
Comput. Methods Programs Biomed., 2025

A Sample-Free Compilation Framework for Efficient Dynamic Tensor Computation.
Proceedings of the International Conference for High Performance Computing, 2025

A General and Scalable GCN Training Framework on CPU Supercomputers.
Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2025

SHF: Symmetrical Hierarchical Forest with Pretrained Vision Transformer Encoder for High-Resolution Medical Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

NM-SpMM: Accelerating Matrix Multiplication Using N:M Sparsity with GPGPU.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2025

Scaling Large-scale GNN Training to Thousands of Processors on CPU-based Supercomputers.
Proceedings of the 39th ACM International Conference on Supercomputing, 2025

GeSubNet: Gene Interaction Inference for Disease Subtype Network Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Evolutionary Architecture Search for Generative Adversarial Networks Based on Weight Sharing.
IEEE Trans. Evol. Comput., June, 2024

SuperGCN: General and Scalable Framework for GCN Training on CPU-powered Supercomputers.
CoRR, 2024

Adaptive Patching for High-resolution Image Segmentation with Transformers.
Proceedings of the International Conference for High Performance Computing, 2024

Real-time High-resolution X-Ray Computed Tomography.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024

Communication Optimization for Distributed GCN Training on ABCI Supercomputer.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

Investigating Nvidia GPU Architecture Trends via Microbenchmarks.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

Asynchronous I/O Optimization for X-Ray Imaging via GPUDirect Storage.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

2023
At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads.
ACM Trans. Archit. Code Optim., December, 2023

Simeuro: A Hybrid CPU-GPU Parallel Simulator for Neuromorphic Computing Chips.
IEEE Trans. Parallel Distributed Syst., October, 2023

Ultra-Long Sequence Distributed Transformer.
CoRR, 2023

Revisiting Temporal Blocking Stencil Optimizations.
Proceedings of the 37th International Conference on Supercomputing, 2023

PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications.
Proceedings of the 37th International Conference on Supercomputing, 2023

Exploiting Scratchpad Memory for Deep Temporal Blocking: A case study for 2D Jacobian 5-point iterative stencil kernel (j2d5pt).
Proceedings of the 15th Workshop on General Purpose Processing Using GPU, 2023

2022
Automatic Generation of High-Performance Convolution Kernels on ARM CPUs for Deep Learning.
IEEE Trans. Parallel Distributed Syst., 2022

At the Locus of Performance: A Case Study in Enhancing CPUs with Copious 3D-Stacked Cache.
CoRR, 2022

Persistent Kernels for Iterative Memory-bound GPU Applications.
CoRR, 2022

Image Gradient Decomposition for Parallel and Memory-Efficient Ptychographic Reconstruction.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

2021
Scalable FBP decomposition for cone-beam CT reconstruction.
Proceedings of the International Conference for High Performance Computing, 2021

Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Performance portable back-projection algorithms on CPUs: agnostic data locality and vectorization optimizations.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

2019
iFDK: a scalable framework for instant high-resolution image reconstruction.
Proceedings of the International Conference for High Performance Computing, 2019

A versatile software systolic execution model for GPU memory-bound kernels.
Proceedings of the International Conference for High Performance Computing, 2019

2018
Efficient Algorithms for the Summed Area Tables Primitive on GPUs.
Proceedings of the IEEE International Conference on Cluster Computing, 2018
