Akshay Venkatesh

Raghunath Raja Chandrasekar

Exp. Math., 2019

2017

MPI-GDS: High Performance MPI Designs with GPUDirect-aSync for CPU-GPU Control Flow Decoupling.

Proceedings of the 46th International Conference on Parallel Processing, 2017

2016

CUDA-Aware OpenSHMEM: Extensions and Designs for High Performance OpenSHMEM on GPU Clusters.

Parallel Comput., 2016

Efficient Reliability Support for Hardware Multicast-Based Broadcast in GPU-enabled Streaming Applications.

Proceedings of the First International Workshop on Communication Optimizations in HPC, 2016

Designing High Performance Heterogeneous Broadcast for Streaming Applications on GPU Clusters.

Proceedings of the 28th International Symposium on Computer Architecture and High Performance Computing, 2016

Efficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning.

Proceedings of the 23rd European MPI Users' Group Meeting, EuroMPI 2016, 2016

Exploiting Maximal Overlap for Non-Contiguous Data Movement Processing on Modern GPU-Enabled Systems.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

CUDA M3: Designing Efficient CUDA Managed Memory-Aware MPI by Exploiting GDR and IPC.

Proceedings of the 23rd IEEE International Conference on High Performance Computing, 2016

CUDA Kernel Based Collective Reduction Operations on Large-scale GPU Clusters.

Proceedings of the IEEE/ACM 16th International Symposium on Cluster, Cloud and Grid Computing, 2016

2015

Designing Non-blocking Personalized Collectives with Near Perfect Overlap for RDMA-Enabled Clusters.

Proceedings of the High Performance Computing - 30th International Conference, 2015

A case for application-oblivious energy-efficient MPI runtime.

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2015

GPU-Aware Design, Implementation, and Evaluation of Non-blocking Collective Benchmarks.

Proceedings of the 22nd European MPI Users' Group Meeting, 2015

Impact of InfiniBand DC Transport Protocol on Energy Consumption of All-to-All Collective Algorithms.

Proceedings of the 23rd IEEE Annual Symposium on High-Performance Interconnects, 2015

Offloaded GPU Collectives Using CORE-Direct and CUDA Capabilities on InfiniBand Clusters.

Proceedings of the 22nd IEEE International Conference on High Performance Computing, 2015

Exploiting GPUDirect RDMA in Designing High Performance OpenSHMEM for NVIDIA GPU Clusters.

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Power-Check: An Energy-Efficient Checkpointing Framework for HPC Clusters.

Khaled Hamidouche

Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2015

Non-Blocking PMI Extensions for Fast MPI Startup.

Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2015

2014

Designing MPI Library with Dynamic Connected Transport (DCT) of InfiniBand: Early Experiences.

Proceedings of the Supercomputing - 29th International Conference, 2014

A Comprehensive Performance Evaluation of OpenSHMEM Libraries on InfiniBand Clusters.

Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Tools, 2014

High Performance Alltoall and Allgather Designs for InfiniBand MIC Clusters.

Raghunath Rajachandrasekar

Sreeram Potluri

Miao Luo

Khaled Hamidouche

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Optimizing Collective Communication in UPC.

Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

MIC-Check: a distributed check pointing framework for the intel many integrated cores architecture.

Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

A high performance broadcast design with hardware multicast and GPUDirect RDMA for streaming applications on Infiniband clusters.

Proceedings of the 21st International Conference on High Performance Computing, 2014

2013

MVAPICH-PRISM: a proxy-based communication framework using InfiniBand and SCIF for intel MIC clusters.

Hari Subramoni

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2013

Evaluation of Energy Characteristics of MPI Communication Primitives with RAPL.

Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Efficient Inter-node MPI Communication Using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs.

Proceedings of the 42nd International Conference on Parallel Processing, 2013

Designing Optimized MPI Broadcast and Allreduce for Many Integrated Core (MIC) InfiniBand Clusters.

Proceedings of the IEEE 21st Annual Symposium on High-Performance Interconnects, 2013

Efficient Intra-node Communication on Intel-MIC Clusters.

Sreeram Potluri

Devendar Bureddy

Proceedings of the 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2013

2012

MPI-based parallel synchronous vector evaluated particle swarm optimization for multi-objective design optimization of composite structures.

S. N. Omkar