Manjunath Gorentla Venkata

Gil Bloch

CoRR, March, 2026

2025

GPU-Initiated Networking for NCCL.

CoRR, November, 2025

Unified Collective Communication: A Unified Library for CPU, GPU, and DPU Collectives.

IEEE Micro, 2025

DOCA UROM: A Vehicle for Offloading HPC and AI to DPUs.

Zach Tiffany

Rohit Zambre

Yuri Shatsman

Muhammad Abu Saleh

Gil Bloch

Proceedings of the High Performance Computing, 2025

Proactive Endpoint Congestion Avoidance in UCC.

Aamir Shafi

Proceedings of the 2025 IEEE International Parallel and Distributed Processing Symposium, 2025

2024

Unified Collective Communication (UCC): An Unified Library for CPU, GPU, and DPU Collectives.

Proceedings of the IEEE Symposium on High-Performance Interconnects, 2024

2023

OpenSHMEM Queues: An abstraction for enhancing message rate, bandwidth utilization, and reducing tail latency in OpenSHMEM Applications.

Vishwanath Venkatesan

Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

2021

Hot Interconnects 27.

Ryan E. Grant

IEEE Micro, 2021

2020

A survey of MPI usage in the US exascale computing project.

David E. Bernholdt

Swen Boehm

George Bosilca

Concurr. Comput. Pract. Exp., 2020

2019

Accelerating OpenSHMEM Collectives Using In-Network Computing Approach.

Gil Bloch

Gilad Shainer

Proceedings of the 31st International Symposium on Computer Architecture and High Performance Computing, 2019

2018

SharP Data Constructs: Data Constructs to Enable Data-Centric Computing.

Proceedings of the 26th Euromicro International Conference on Parallel, 2018

Oak Ridge OpenSHMEM Benchmark Suite.

Proceedings of the OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Extreme Heterogeneity, 2018

Designing High-Performance In-Memory Key-Value Operations with Persistent GPU Kernels and OpenSHMEM.

Ching-Hsiang Chu

Sreeram Potluri

Anshuman Goswami

Chris J. Newburn

Proceedings of the OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Extreme Heterogeneity, 2018

Tracking Memory Usage in OpenSHMEM Runtimes with the TAU Performance System.

Nicholas Chaimov

Sameer Shende

Allen D. Malony

Proceedings of the OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Extreme Heterogeneity, 2018

An Initial Implementation of Libfabric Conduit for OpenSHMEM-X.

Subhadeep Bhattacharya

Shaeke Salman

Harsh Kundnani

Proceedings of the OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Extreme Heterogeneity, 2018

OpenSHMEM Sets and Groups: An Approach to Worksharing and Memory Management.

Proceedings of the OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Extreme Heterogeneity, 2018

SharP Unified Memory Allocator: An Intent-Based Memory Allocator for Extreme-Scale Systems.

Proceedings of the Euro-Par 2018: Parallel Processing, 2018

SHMEMGraph: Efficient and Balanced Graph Processing Using One-Sided Communication.

Shaeke Salman

Proceedings of the 18th IEEE/ACM International Symposium on Cluster, 2018

2017

Efficient Breadth First Search on Multi-GPU Systems Using GPU-Centric OpenSHMEM.

Sreeram Potluri

Anshuman Goswami

Proceedings of the OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence, 2017

Performance Analysis of OpenSHMEM Applications with TAU Commander.

Proceedings of the OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence, 2017

Portable SHMEMCache: A High-Performance Key-Value Store on OpenSHMEM and MPI.

Proceedings of the OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence, 2017

Evaluating Contexts in OpenSHMEM-X Reference Implementation.

Proceedings of the OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence, 2017

Merged Requests for Better Performance and Productivity in Multithreaded OpenSHMEM.

Swen Boehm

Matthew B. Baker

Proceedings of the OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence, 2017

Parallelizing Single Source Shortest Path with OpenSHMEM.

Jeffrey A. Graves

Proceedings of the OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence, 2017

SharP: Towards Programming Extreme-Scale Systems with Hierarchical Heterogeneous Memory.

Proceedings of the 46th International Conference on Parallel Processing Workshops, 2017

GPU-Centric Communication on NVIDIA GPU Clusters with InfiniBand: A Case Study with OpenSHMEM.

Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017

SharP Hash: A High-Performing Distributed Hash for Extreme-Scale Systems.

Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

High-Performance Key-Value Store On OpenSHMEM.

Ahana Roy Choudhury

Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

2016

A hybrid computational strategy to address WGS variant analysis in >5000 samples.

Zhuoyi Huang

Navin Rustagi

Narayanan Veeraraghavan

Andrew Carroll

Richard A. Gibbs

Eric Boerwinkle

Fuli Yu

BMC Bioinform., 2016

DISP: Optimizations towards Scalable MPI Startup.

Proceedings of the First International Workshop on Communication Optimizations in HPC, 2016

On Synchronisation and Memory Reuse in OpenSHMEM.

Aaron Welch

Proceedings of the OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016

Investigating Data Motion Power Trends to Enable Power-Efficient OpenSHMEM Implementations.

Tiffany M. Mintz

Eduardo F. D'Azevedo

Chung-Hsing Hsu

Proceedings of the OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016

Profiling Production OpenSHMEM Applications.

Proceedings of the OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016

SHMemCache: Enabling Memcached on the OpenSHMEM Global Address Model.

Kunal SinghaRoy

Yue Zhu

Proceedings of the OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016

Surviving Errors with OpenSHMEM.

Aurélien Bouteiller

George Bosilca

Proceedings of the OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016

Evaluating OpenSHMEM Explicit Remote Memory Access Operations and Merged Requests.

Swen Böhm

Proceedings of the OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016

OpenSHMEM-UCX: Evaluation of UCX for Implementing OpenSHMEM Programming Model.

Matthew B. Baker

Proceedings of the OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016

2015

From MPI to OpenSHMEM: Porting LAMMPS.

Chunyan Tang

Aurélien Bouteiller

Thomas Hérault

George Bosilca

Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies, 2015

Exploring OpenSHMEM Model to Program GPU-based Extreme-Scale Systems.

Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies, 2015

An Evaluation of OpenSHMEM Interfaces for the Variable-Length Alltoallv() Collective Operation.

M. Graham Lopez

Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies, 2015

Parallelizing the Smith-Waterman Algorithm Using OpenSHMEM and MPI-3 One-Sided Interfaces.

Matthew B. Baker

Aaron Welch

Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies, 2015

UCX: An Open Source Framework for HPC Network APIs and Beyond.

Proceedings of the 23rd IEEE Annual Symposium on High-Performance Interconnects, 2015

Fast Fault Injection and Sensitivity Analysis for Collective Communications.

Kun Feng

Dong Li

Xian-He Sun

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2014

Development and Extension of Atomic Memory Operations in OpenSHMEM.

Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

OpenSHMEM Reference Implementation using UCCS-uGNI Transport Layer.

Tomislav Janjusic

Stephen W. Poole

Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

Fault Tolerance for OpenSHMEM.

Pengfei Hao

Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

Designing a High Performance OpenSHMEM Implementation Using Universal Common Communication Substrate as a Communication Middleware.

Stephen W. Poole

Aaron Welch

Tony Curtis

Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Tools, 2014

OpenSHMEM Extensions and a Vision for Its Future Direction.

Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Tools, 2014

2013

Optimizing blocking and nonblocking reduction operations for multicore systems: Hierarchical design and implementation.

Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

SLOAVx: Scalable LOgarithmic AlltoallV Algorithm for Hierarchical Multicore Systems.

Cong Xu

Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013

2012

Exploiting Atomic Operations for Barrier on Cray XE/XK Systems.

Proceedings of the Recent Advances in the Message Passing Interface, 2012

Exploring the All-to-All Collective Optimization Space with ConnectX CORE-Direct.

Proceedings of the 41st International Conference on Parallel Processing, 2012

Performance Evaluation of Open MPI on Cray XE/XK Systems.

Samuel K. Gutierrez

Nathan T. Hjelm

Proceedings of the IEEE 20th Annual Symposium on High-Performance Interconnects, 2012

Assessing the Performance and Scalability of a Novel Multilevel K-Nomial Allgather on CORE-Direct Systems.

Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

2011

ConnectX-2 CORE-Direct Enabled Asynchronous Broadcast Collective Communications.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Analyzing the Effects of Multicore Architectures and On-Host Communication Characteristics on Collective Communications.

Proceedings of the 2011 International Conference on Parallel Processing Workshops, 2011

Design and Implementation of Broadcast Algorithms for Extreme-Scale Systems.

Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

Cheetah: A Framework for Scalable Hierarchical Collective Operations.

Proceedings of the 11th IEEE/ACM International Symposium on Cluster, 2011

2009

Using application communication characteristics to drive dynamic MPI reconfiguration.

Patrick G. Bridges

Patrick M. Widener

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

2006

MPI/CTP: A Reconfigurable MPI for HPC Applications.
