Richard L. Graham

According to our database, Richard L. Graham authored at least 61 papers between 2003 and 2022.

Bibliography

2022
NVIDIA's Quantum InfiniBand Network Congestion Control Technology and Its Impact on Application Performance.
Proceedings of the High Performance Computing - 37th International Conference, 2022

2021
NVIDIA's Cloud Native Supercomputing.
Proceedings of the Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation, 2021

2020
The high-speed networks of the Summit and Sierra supercomputers.
IBM J. Res. Dev., 2020

Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)™ Streaming-Aggregation Hardware Design and Evaluation.
Proceedings of the High Performance Computing - 35th International Conference, 2020

2019
Accelerating OpenSHMEM Collectives Using In-Network Computing Approach.
Proceedings of the 31st International Symposium on Computer Architecture and High Performance Computing, 2019

2018
LRUM: Local Reliability Protocol for Unreliable Hardware Multicast.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2018

2017
Towards A Data Centric System Architecture: SHARP.
Supercomput. Front. Innov., 2017

2016
Scalable Hierarchical Aggregation Protocol (SHArP): A Hardware Architecture for Efficient Data Reduction.
Proceedings of the First International Workshop on Communication Optimizations in HPC, 2016

Using InfiniBand Hardware Gather-Scatter Capabilities to Optimize MPI All-to-All.
Proceedings of the 23rd European MPI Users' Group Meeting, EuroMPI 2016, 2016

2015

2014
Development and Extension of Atomic Memory Operations in OpenSHMEM.
Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

2013
The co-design architecture for exascale systems, a novel approach for scalable designs.
Comput. Sci. Res. Dev., 2013

KLONOS: Similarity-based planning tool support for porting scientific applications.
Concurr. Comput. Pract. Exp., 2013

Optimizing blocking and nonblocking reduction operations for multicore systems: Hierarchical design and implementation.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

SLOAVx: Scalable LOgarithmic AlltoallV Algorithm for Hierarchical Multicore Systems.
Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013

2012
Analyzing fault aware collective performance in a process fault tolerant MPI.
Parallel Comput., 2012

Exploiting Atomic Operations for Barrier on Cray XE/XK Systems.
Proceedings of the Recent Advances in the Message Passing Interface, 2012

HERCULES: A Pattern Driven Code Transformation System.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Exploring the All-to-All Collective Optimization Space with ConnectX CORE-Direct.
Proceedings of the 41st International Conference on Parallel Processing, 2012

Performance Evaluation of Open MPI on Cray XE/XK Systems.
Proceedings of the IEEE 20th Annual Symposium on High-Performance Interconnects, 2012

Assessing the Performance and Scalability of a Novel Multilevel K-Nomial Allgather on CORE-Direct Systems.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

2011
A Log-Scaling Fault Tolerant Agreement Algorithm for a Fault Tolerant MPI.
Proceedings of the Recent Advances in the Message Passing Interface, 2011

Run-Through Stabilization: An MPI Proposal for Process Fault Tolerance.
Proceedings of the Recent Advances in the Message Passing Interface, 2011

OMPIO: A Modular Software Architecture for MPI I/O.
Proceedings of the Recent Advances in the Message Passing Interface, 2011

ConnectX-2 CORE-Direct Enabled Asynchronous Broadcast Collective Communications.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Building a Fault Tolerant MPI Application: A Ring Communication Example.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Preserving Collective Performance across Process Failure for a Fault Tolerant MPI.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Analyzing the Effects of Multicore Architectures and On-Host Communication Characteristics on Collective Communications.
Proceedings of the 2011 International Conference on Parallel Processing Workshops, 2011

Experiences with High-Level Programming Directives for Porting Applications to GPUs.
Proceedings of the Facing the Multicore - Challenge II, 2011

Design and Implementation of Broadcast Algorithms for Extreme-Scale Systems.
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

Cheetah: A Framework for Scalable Hierarchical Collective Operations.
Proceedings of the 11th IEEE/ACM International Symposium on Cluster, 2011

2010
Cooperative server clustering for a scalable GAS model on petascale Cray XT5 systems.
Comput. Sci. Res. Dev., 2010

Network Offloaded Hierarchical Collectives Using ConnectX-2's CORE-Direct Capabilities.
Proceedings of the Recent Advances in the Message Passing Interface, 2010

Characteristics of the Unexpected Message Queue of MPI Applications.
Proceedings of the Recent Advances in the Message Passing Interface, 2010

Overlapping computation and communication: Barrier algorithms and ConnectX-2 CORE-Direct capabilities.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

ConnectX-2 InfiniBand Management Queues: First Investigation of the New Support for Network Offloaded Collective Operations.
Proceedings of the 10th IEEE/ACM International Conference on Cluster, 2010

2009
Dynamic Communicators in MPI.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2009

The MPI 2.2 Standard and the Emerging MPI 3 Standard.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2009

2008
MPI Support for Multi-core Architectures: Optimized Shared Memory Collectives.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2008

A Scalable Tools Communications Infrastructure.
Proceedings of the 22nd Annual International Symposium on High Performance Computing Systems and Applications (HPCS 2008), 2008

2007
Open MPI: a High Performance, Flexible Implementation of MPI Point-to-Point Communications.
Parallel Process. Lett., 2007

A Case for Standard Non-blocking Collective Operations.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30, 2007

An Evaluation of Open MPI's Matching Transport Layer on the Cray XT.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30, 2007

Network Fault Tolerance in Open MPI.
Proceedings of the Euro-Par 2007, 2007

2006
High Performance RDMA Protocols in HPC.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006

Approaches for Parallel Applications Fault Tolerance.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006

InfiniBand scalability in Open MPI.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Aspects of heterogeneous computing in the Open MPI environment.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Co-Array Collectives: Refined Semantics for Co-Array Fortran.
Proceedings of the Computational Science, 2006

Open MPI: A High-Performance, Heterogeneous MPI.
Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006

2005
High Performance Broadcast Support in LA-MPI over Quadrics.
Int. J. High Perform. Comput. Appl., 2005

Analysis of the Component Architecture Overhead in Open MPI.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2005

Open MPI: A Flexible High Performance MPI.
Proceedings of the Parallel Processing and Applied Mathematics, 2005

Design and Implementation of Open MPI over Quadrics/Elan4.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

2004
TEG: A High-Performance, Scalable, Multi-network Point-to-Point Communications Methodology.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2004

Open MPI's TEG Point-to-Point Communications Methodology: Comparison to Existing Implementations.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2004

Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2004

Efficient and Scalable Barrier over Quadrics and Myrinet with a New NIC-Based Collective Message Passing Protocol.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Architecture of LA-MPI, A Network-Fault-Tolerant MPI.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

2003
A Network-Failure-Tolerant Message-Passing System for Terascale Clusters.
Int. J. Parallel Program., 2003

Network Fault Tolerance in LA-MPI.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 10th European PVM/MPI Users' Group Meeting, Venice, Italy, September 29, 2003

