Sreeram Potluri

J. Parallel Distributed Comput., 2018

Designing High-Performance In-Memory Key-Value Operations with Persistent GPU Kernels and OpenSHMEM.

Ching-Hsiang Chu

Anshuman Goswami

Neena Imam

Chris J. Newburn

Proceedings of the OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Extreme Heterogeneity, 2018

2017

Efficient Breadth First Search on Multi-GPU Systems Using GPU-Centric OpenSHMEM.

Anshuman Goswami

Neena Imam

Proceedings of the OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence, 2017

MPI-GDS: High Performance MPI Designs with GPUDirect-aSync for CPU-GPU Control Flow Decoupling.

Proceedings of the 46th International Conference on Parallel Processing, 2017

GPU-Centric Communication on NVIDIA GPU Clusters with InfiniBand: A Case Study with OpenSHMEM.

Neena Imam

Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017

Offloading communication control logic in GPU accelerated applications.

Elena Agostini

Davide Rossetti

Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

2015

Exploring OpenSHMEM Model to Program GPU-based Extreme-Scale Systems.

Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies, 2015

UCX: An Open Source Framework for HPC Network APIs and Beyond.

Proceedings of the 23rd IEEE Annual Symposium on High-Performance Interconnects, 2015

2014

GPU-Aware MPI on RDMA-Enabled Clusters: Design, Implementation and Evaluation.

IEEE Trans. Parallel Distributed Syst., 2014

Designing Scalable Out-of-core Sorting with Hybrid MPI+PGAS Programming Models.

Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

A Comprehensive Performance Evaluation of OpenSHMEM Libraries on InfiniBand Clusters.

Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Tools, 2014

High Performance Alltoall and Allgather Designs for InfiniBand MIC Clusters.

Akshay Venkatesh

Raghunath Rajachandrasekar

Miao Luo

Khaled Hamidouche

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

HAND: A Hybrid Approach to Accelerate Non-contiguous Data Movement Using MPI Datatypes on GPU Clusters.

Proceedings of the 43rd International Conference on Parallel Processing, 2014

MIC-Check: a distributed check pointing framework for the intel many integrated cores architecture.

Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

Designing efficient small message transfer mechanism for inter-node MPI communication on InfiniBand GPU clusters.

Proceedings of the 21st International Conference on High Performance Computing, 2014

Scalable Graph500 design with MPI-3 RMA.

Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

High performance OpenSHMEM for Xeon Phi clusters: Extensions, runtime designs and application co-design.

Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

2013

Designing Scalable Graph500 Benchmark with Hybrid MPI+OpenSHMEM Programming Models.

Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013

MVAPICH-PRISM: a proxy-based communication framework using InfiniBand and SCIF for intel MIC clusters.

Proceedings of the International Conference for High Performance Computing, 2013

Efficient and truly passive MPI-3 RMA using InfiniBand atomics.

Proceedings of the 20th European MPI Users' Group Meeting, 2013

Extending OpenSHMEM for GPU Computing.

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

MIC-RO: enabling efficient remote offload on heterogeneous many integrated core (MIC) clusters with InfiniBand.

Khaled Hamidouche

Proceedings of the International Conference on Supercomputing, 2013

Efficient Inter-node MPI Communication Using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs.

Proceedings of the 42nd International Conference on Parallel Processing, 2013

Designing Optimized MPI Broadcast and Allreduce for Many Integrated Core (MIC) InfiniBand Clusters.

Proceedings of the IEEE 21st Annual Symposium on High-Performance Interconnects, 2013

A scalable and portable approach to accelerate hybrid HPL on heterogeneous CPU-GPU clusters.

Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

Efficient Intra-node Communication on Intel-MIC Clusters.

Akshay Venkatesh

Devendar Bureddy

Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013

2012

Design of a scalable InfiniBand topology service to enable network-topology-aware placement of processes.

Proceedings of the SC Conference on High Performance Computing Networking, 2012

OMB-GPU: A Micro-Benchmark Suite for Evaluating MPI Libraries on GPU Clusters.

Proceedings of the Recent Advances in the Message Passing Interface, 2012

Optimizing MPI Communication on Multi-GPU Systems Using CUDA Inter-Process Communication.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

2011

MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters.

Comput. Sci. Res. Dev., 2011

Codesign for InfiniBand Clusters.

Sayantan Sur

Karen Tomko

Computer, 2011

Optimizing MPI One Sided Communication on Multi-core InfiniBand Clusters Using Shared Memory Backed Windows.

Proceedings of the Recent Advances in the Message Passing Interface, 2011

Design and Implementation of Key Proposed MPI-3 One-Sided Communication Semantics on InfiniBand.

Proceedings of the Recent Advances in the Message Passing Interface, 2011

Optimized Non-contiguous MPI Datatype Communication for GPU Clusters: Design, Implementation and Evaluation with MVAPICH2.

Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

MPI Alltoall Personalized Exchange on GPGPU Clusters: Design Alternatives and Benefit.

Ashish Kumar Singh

Hao Wang

Sayantan Sur

Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

2010

Quantifying performance benefits of overlap using MPI-2 in a seismic modeling application.

Proceedings of the 24th International Conference on Supercomputing, 2010

High Performance Design and Implementation of Nemesis Communication Layer for Two-Sided and One-Sided MPI Semantics in MVAPICH2.

Miao Luo

Ping Lai

Emilio Pasquale Mancini

Sayantan Sur