Kevin J. Barker

Draguna L. Vrabie

Gokcen Kestor

IEEE Internet Comput., 2023

The Landscape of Modern Machine Learning: A Review of Machine, Distributed and Federated Learning.

CoRR, 2023

MPGemmFI: A Fault Injection Technique for Mixed Precision GEMM in ML Applications.

Siva Kumar Sastry Hari

Timothy Tsai

Ignacio Laguna

Dingwen Tao

Ganesh Gopalakrishnan

Prashant J. Nair

CoRR, 2023

MGG: Accelerating Graph Neural Networks with Fine-Grained Intra-Kernel Communication-Computation Pipelining on Multi-GPU Platforms.

Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Spy in the GPU-box: Covert and Side Channel Attacks on Multi-GPU Systems.

Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

Evaluating Energy Efficiency of GPUs using Machine Learning Benchmarks.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Assessing Risk in High Performance Computing Attacks.

Proceedings of the 9th International Conference on Information Systems Security and Privacy, 2023

Finding Your Niche: An Evolutionary Approach to HPC Topologies.

Proceedings of the IEEE High Performance Extreme Computing Conference, 2023

Denial of Service Attack Detection via Differential Analysis of Generalized Entropy Progressions.

Omer Subasi

Joseph B. Manzano

Proceedings of the IEEE International Conference on Cyber Security and Resilience, 2023

2022

Direction-optimizing Label Propagation Framework for Structure Detection in Graphs: Design, Implementation, and Experimental Analysis.

Xu T. Liu

Andrew Lumsdaine

Assefaw Hadish Gebremedhin

ACM J. Exp. Algorithmics, 2022

MSREP: A Fast yet Light Sparse Matrix Framework for Multi-GPU Systems.

CoRR, 2022

Empowering GNNs with Fine-grained Communication-Computation Pipelining on Multi-GPU Platforms.

CoRR, 2022

Bit-GraphBLAS: Bit-Level Optimizations of Matrix-Centric Graph Processing on GPU.

Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

DRIPS: Dynamic Rebalancing of Pipelined Streaming Applications on CGRAs.

Cheng Tan

Nicolas Bohm Agostini

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

Towards Precision-Aware Fault Tolerance Approaches for Mixed-Precision Applications.

Bo Fang

Siva Kumar Sastry Hari

Timothy Tsai

Xinyi Li

Ganesh Gopalakrishnan

Ignacio Laguna

Proceedings of the 12th IEEE/ACM Workshop on Fault Tolerance for HPC at eXtreme Scale, 2022

2021

ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing.

IEEE Trans. Parallel Distributed Syst., 2021

Leaky Buddies: Cross-Component Covert Channels on Integrated CPU-GPU Systems.

Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures.

Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

DynPaC: Coarse-Grained, Dynamic, and Partially Reconfigurable Array for Streaming Applications.

Cheng Tan

Tong Geng

Chenhao Xie

Nicolas Bohm Agostini

Proceedings of the 39th IEEE International Conference on Computer Design, 2021

AURORA: Automated Refinement of Coarse-Grained Reconfigurable Accelerators.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

OpenCGRA: Democratizing Coarse-Grained Reconfigurable Arrays.

Cheng Tan

Nicolas Bohm Agostini

Jeff Zhang

Marco Minutoli

Vito Giovanni Castellana

Proceedings of the 32nd IEEE International Conference on Application-specific Systems, 2021

2020

Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect.

IEEE Trans. Parallel Distributed Syst., 2020

ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing.

CoRR, 2020

A parallel sparse tensor benchmark suite on CPUs and GPUs.

Jiajia Li

Mahesh Lakshminarasimhan

Xiaolong Wu

Catherine Olschanowsky

Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

A Sparse Tensor Benchmark Suite for CPUs and GPUs.

Jiajia Li

Mahesh Lakshminarasimhan

Xiaolong Wu

Catherine Olschanowsky

Proceedings of the IEEE International Symposium on Workload Characterization, 2020

Detecting Anomalous Computation with RNNs on GPU-Accelerated HPC Machines.

Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

OpenCGRA: An Open-Source Unified Framework for Modeling, Testing, and Evaluating CGRAs.

Proceedings of the 38th IEEE International Conference on Computer Design, 2020

On the Feasibility of Using Reduced-Precision Tensor Core Operations for Graph Analytics.

Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

Direction-optimizing label propagation and its application to community detection.

Xu T. Liu

Andrew Lumsdaine

Assefaw H. Gebremedhin

Proceedings of the 17th ACM International Conference on Computing Frontiers, 2020

Indicator-Directed Dynamic Power Management for Iterative Workloads on GPU-Accelerated Systems.

Proceedings of the 20th IEEE/ACM International Symposium on Cluster, 2020

2019

PASTA: a parallel sparse tensor algorithm benchmark suite.

CCF Trans. High Perform. Comput., 2019

BSTC: a novel binarized-soft-tensor-core design for accelerating bit-based approximated neural nets.

Proceedings of the International Conference for High Performance Computing, 2019

Fingerprinting Anomalous Computation with RNN for GPU-accelerated HPC Machines.

Proceedings of the IEEE International Symposium on Workload Characterization, 2019

Efficient and effective sparse tensor reordering.

Proceedings of the ACM International Conference on Supercomputing, 2019

Distributed Direction-Optimizing Label Propagation for Community Detection.

Xu Liu

Jesun Sahariar Firoz

Marcin Zalewski

Andrew Lumsdaine

Assefaw H. Gebremedhin

Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019

2018

Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite.

Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

Warp-Consolidation: A Novel Execution Model for GPUs.

Proceedings of the 32nd International Conference on Supercomputing, 2018

Optimizing Distributed Data-Intensive Workflows.

Ryan D. Friese

Nathan R. Tallent

Malachi Schram

Proceedings of the IEEE International Conference on Cluster Computing, 2018

2017

Verification of the Extended Roofline Model for Asynchronous Many Task Runtimes.

Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware, 2017

Designing Scalable Distributed Memory Models: A Case Study.

Proceedings of the Computing Frontiers Conference, 2017

2016

Assessing Advanced Technology in CENATE.

Proceedings of the IEEE International Conference on Networking, 2016

Modeling the Impact of Silicon Photonics on Graph Analytics.

Nathan R. Tallent

Daniel G. Chavarría-Miranda

Antonino Tumeo

Andrés Márquez

Adolfy Hoisie

Proceedings of the IEEE International Conference on Networking, 2016

Modeling the Performance and Energy Impact of Dynamic Power Steering.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

LSPP Introduction and Committees.

Christopher D. Carothers

Eric Van Hensbergen

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

2015

Towards efficient scheduling of data intensive high energy physics workflows.

Proceedings of the 10th Workshop on Workflows in Support of Large-Scale Science, 2015

2014

A performance comparison of current HPC systems: Blue Gene/Q, Cray XE6 and InfiniBand systems.

Future Gener. Comput. Syst., 2014

On the feasibility of dynamic power steering.

Eric Anger

Proceedings of the 2nd International Workshop on Energy Efficient Supercomputing, 2014

MIC-SVM: Designing a Highly Efficient Support Vector Machine for Advanced Modern Multi-core and Many-Core Architectures.

Amanda Peters Randles

Guangwen Yang

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

2013

Designing energy efficient communication runtime systems: a view from PGAS models.

J. Supercomput., 2013

A Performance Analysis of Three Generations of Blue Gene.

Parallel Process. Lett., 2013

Tracking the Performance Evolution of Blue Gene Systems.

Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013

Unified performance and power modeling of scientific workloads.

Shuaiwen Leon Song

Proceedings of the 1st International Workshop on Energy Efficient Supercomputing, 2013

Building Scalable PGAS Communication Subsystem on Blue Gene/Q.

Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

2012

Comparing the Performance of Blue Gene/Q with Leading Cray XE6 and InfiniBand Systems.

Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, 2012

2011

Modeling the Performance of Direct Numerical Simulation on Parallel Systems.

Parallel Process. Lett., 2011

Codesign Challenges for Exascale Systems: Performance, Power, and Reliability.

Computer, 2011

An early performance analysis of POWER7-IH HPC systems.

Adolfy Hoisie

Proceedings of the Conference on High Performance Computing Networking, 2011

A Performance Model of Direct Numerical Simulation for Analyzing Large-Scale Systems.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Energy Templates: Exploiting Application Information to Save Energy.

Abhinav Vishnu

Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

Analyzing the Performance Bottlenecks of the POWER7-IH Network.

Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

2010

Designing Energy Efficient Communication Runtime Systems for Data Centric Programming Models.

Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications, 2010

2009

An MPI Performance Monitoring Interface for Cell Based Compute Nodes.

Parallel Process. Lett., 2009

Performance Prediction via Modeling: a Case Study of the ORNL Cray XT4 Upgrade.

Kei Davis

Parallel Process. Lett., 2009

Using Performance Modeling to Design Large-Scale Systems.

Computer, 2009

Application profiling on Cell-based clusters.

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Performance modeling in action: Performance prediction of a Cray XT4 system during upgrade.

Kei Davis

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

2008

A Performance Evaluation of the Nehalem Quad-Core Processor for Scientific Computing.

Parallel Process. Lett., 2008

0.374 Pflop/s trillion-particle kinetic modeling of laser plasma interaction on Roadrunner.

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Entering the petaflop era: the architecture and performance of Roadrunner.

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Experiences in scaling scientific applications on current-generation quad-core processors.

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

2007

Analysis of the Weather Research and Forecasting (WRF) Model on Large-Scale Systems.

Kei Davis

Proceedings of the Parallel Computing: Architectures, 2007

Performance Analysis of an Optical Circuit Switched Network for Peta-Scale Systems.

Proceedings of the Euro-Par 2007, 2007

Efficient offloading of collective communications in large-scale systems.

José Carlos Sancho

Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

2006

MPI tools and performance studies - Quantifying the potential benefit of overlapping communication and computation in large-scale scientific applications.

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

A Performance Model of the Krak Hydrodynamics Application.
