Ryan E. Grant

James H. Laros III

Stephen L. Olivier

Kevin T. Pedretti

Lee Ward

Sustain. Comput. Informatics Syst., 2019

Using simulation to examine the effect of MPI message matching costs on application performance.

Parallel Comput., 2019

A dynamic, unified design for dedicated message matching engines for collective and point-to-point communications.

Parallel Comput., 2019

Finepoints: Partitioned Multithreaded MPI Communication.

Ron Brightwell

Anthony Skjellum

Proceedings of the High Performance Computing - 34th International Conference, 2019

INCA: in-network compute assistance.

Dorian C. Arnold

Proceedings of the International Conference for High Performance Computing, 2019

MPI tag matching performance on ConnectX and ARM.

W. Pepper Marts

Proceedings of the 26th European MPI Users' Group Meeting, 2019

Introduction to SNACS 2019.

Taylor L. Groves

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

Fuzzy Matching: Hardware Accelerated MPI Communication Middleware.

Proceedings of the 19th IEEE/ACM International Symposium on Cluster, 2019

2018

Unraveling Network-Induced Memory Contention: Deeper Insights with Machine Learning.

IEEE Trans. Parallel Distributed Syst., 2018

A Comparison of Power Management Mechanisms: P-States vs. Node-Level Power Cap Control.

Kevin T. Pedretti

James H. Laros III

Stephen L. Olivier

Lee Ward

Andrew J. Younge

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

A Dedicated Message Matching Mechanism for Collective Communications.

Proceedings of the 47th International Conference on Parallel Processing, 2018

Improving MPI Multi-threaded RMA Communication Performance.

Nathan T. Hjelm

Proceedings of the 47th International Conference on Parallel Processing, 2018

The Case for Semi-Permanent Cache Occupancy: Understanding the Impact of Data Locality on Network Processing.

Proceedings of the 47th International Conference on Parallel Processing, 2018

Measuring Multithreaded Message Matching Misery.

Proceedings of the Euro-Par 2018: Parallel Processing, 2018

2017

sPIN: high-performance streaming processing in the network.

Torsten Hoefler

Salvatore Di Girolamo

Konstantin Taranov

Ron Brightwell

Proceedings of the International Conference for High Performance Computing, 2017

Characterizing MPI matching via trace-based simulation.

Proceedings of the 24th European MPI Users' Group Meeting, 2017

Evaluating energy and power profiling techniques for HPC workloads.

James H. Laros III

Proceedings of the Eighth International Green and Sustainable Computing Conference, 2017

Enabling Diverse Software Stacks on Supercomputers Using High Performance Virtual Clusters.

Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

A Tale of Two Systems: Using Containers to Deploy HPC Applications on Supercomputers and Clouds.

Proceedings of the IEEE International Conference on Cloud Computing Technology and Science, 2017

2016

Program optimizations: The interplay between power, performance, and energy.

Edgar A. León

Ian Karlin

Parallel Comput., 2016

Hot Interconnects 23.

Ada Gavrilovska

IEEE Micro, 2016

Standardizing Power Monitoring and Control at Exascale.

Computer, 2016

MPI Sessions: Leveraging Runtime Infrastructure to Increase Scalability of Applications at Exascale.

Proceedings of the 23rd European MPI Users' Group Meeting, EuroMPI 2016, 2016

SHMEM-MT: A Benchmark Suite for Assessing Multi-threaded SHMEM Performance.

Hans Weeks

Proceedings of the OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016

NiMC: Characterizing and Eliminating Network-Induced Memory Contention.

Taylor L. Groves

Dorian C. Arnold

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Overcoming Challenges in Scalable Power Monitoring with the Power API.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

MPI Performance Characterization on InfiniBand with Fine-Grain Multithreaded Communication.

Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications; 14th IEEE International Conference on Smart City; 2nd IEEE International Conference on Data Science and Systems, 2016

(SAI) Stalled, Active and Idle: Characterizing Power and Performance of Large-Scale Dragonfly Networks.

Dorian C. Arnold

Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016

RMA-MT: A Benchmark Suite for Assessing MPI Multi-threaded RMA Performance.

Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

2015

Scalable Network Communication Using Unreliable RDMA.

Proceedings of the Handbook on Data Centers, 2015

Scalable connectionless RDMA over unreliable datagrams.

Parallel Comput., 2015

Overtime: a tool for analyzing performance variation due to network interference.

Kevin T. Pedretti

Ann C. Gentile

Proceedings of the 3rd Workshop on Exascale MPI, 2015

Preparing for exascale: modeling MPI for many-core systems using fine-grain queues.

Proceedings of the 3rd Workshop on Exascale MPI, 2015

Toward an evolutionary task parallel integrated MPI + X programming model.

Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, 2015

Optimizing Explicit Hydrodynamics for Power, Energy, and Performance.

Edgar A. León

Ian Karlin

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Re-evaluating Network Onload vs. Offload for the Many-Core Era.

Ron Brightwell

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2014

Enabling communication concurrency through flexible MPI endpoints.

Int. J. High Perform. Comput. Appl., 2014

An evaluation of MPI message rate on hybrid-core processors.

Int. J. High Perform. Comput. Appl., 2014

Early experiences co-scheduling work and communication tasks for hybrid MPI+X applications.

Proceedings of the 2014 Workshop on Exascale MPI, 2014

Energy Consumption of Resilience Mechanisms in Large Scale Systems.

Proceedings of the 22nd Euromicro International Conference on Parallel, 2014

Metrics for Evaluating Energy Saving Techniques for Resilient HPC Systems.

Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

2013

Evaluating energy savings for checkpoint/restart.

Proceedings of the 1st International Workshop on Energy Efficient Supercomputing, 2013

Protocols for Fully Offloaded Collective Operations on Accelerated Network Adapters.

Proceedings of the 42nd International Conference on Parallel Processing, 2013

2011

RDMA Capable iWARP over Datagrams.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

2010

A study of hardware assisted IP over InfiniBand and its impact on enterprise data center performance.

Pavan Balaji

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2010

iWARP redefined: Scalable connectionless communication over high-speed Ethernet.

Proceedings of the 2010 International Conference on High Performance Computing, 2010

2009

Improving energy efficiency of asymmetric chip multithreaded multiprocessors through reduced OS noise scheduling.

Concurr. Comput. Pract. Exp., 2009

Evaluation of ConnectX Virtual Protocol Interconnect for Data Centers.

Pavan Balaji

Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009

2008

An Analysis of QoS Provisioning for Sockets Direct Protocol vs. IPoIB over Modern InfiniBand Networks.

Mohammad J. Rashti

Proceedings of the 37th International Conference on Parallel Processing, 2008

2007

A Comprehensive Analysis of OpenMP Applications on Dual-Core Intel Xeon SMPs.

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Improving system efficiency through scheduling and power management.

Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

2006

Power-performance efficiency of asymmetric multiprocessors for multi-threaded scientific applications.

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

1993

NCSA mosaic 1993.
