We stand with Ukraine

James Dinan

Orcid: 0000-0002-4840-7737

According to our database¹, James Dinan authored at least 66 papers between 2006 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

NCCL EP: Towards a Unified Expert Parallel Communication API for NCCL.

[DOI]

Co-authors include Maayan Sheraizin, Subhadeep Bhattacharya, Georgios Theodorakis, Peter-Jan Gootzen, Salvatore Di Girolamo, and Manjunath Gorentla Venkata.

CoRR, March, 2026

2025

Demystifying NCCL: An In-Depth Analysis of GPU Communication Protocols and Algorithms.

[DOI]

Co-authors include Sylvain Jeaugey, Cedell Alexander, Jeff R. Hammond, and Torsten Hoefler.

Proceedings of the IEEE Symposium on High-Performance Interconnects, 2025

2021

Optimizing Work Stealing Communication with Structured Atomic Operations.

[DOI]

Co-authors include D. Brian Larkins.

Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

2020

Simplifying Communication Overlap in OpenSHMEM Through Integrated User-Level Thread Scheduling.

[DOI]

Co-authors include Md. Wasi-ur-Rahman.

Proceedings of the High Performance Computing - 35th International Conference, 2020

2019

Designing, Implementing, and Evaluating the Upcoming OpenSHMEM Teams API.

[DOI]

Co-authors include Md. Wasi-ur-Rahman.

Proceedings of the 2019 IEEE/ACM Parallel Applications Workshop, Alternatives To MPI, 2019

Accelerated Work Stealing.

[DOI]

Co-authors include D. Brian Larkins.

Proceedings of the 48th International Conference on Parallel Processing, 2019

2018

Lightweight Instrumentation and Analysis Using OpenSHMEM Performance Counters.

[DOI]

Co-authors include Md. Wasi-ur-Rahman.

Proceedings of the OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Extreme Heterogeneity, 2018

Design and Optimization of OpenSHMEM 1.4 for the Intel® Omni-Path Fabric 100 Series.

[DOI]

Co-authors include Md. Wasi-ur-Rahman.

Proceedings of the OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Extreme Heterogeneity, 2018

Efficient Runtime Support for a Partitioned Global Logical Address Space.

[DOI]

Co-authors include D. Brian Larkins.

Proceedings of the 47th International Conference on Parallel Processing, 2018

2017

Application-Level Optimization of On-Node Communication in OpenSHMEM.

[DOI]

Co-authors include Md. Wasi-ur-Rahman.

Proceedings of the OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence, 2017

Symmetric Memory Partitions in OpenSHMEM: A Case Study with Intel KNL.

[DOI]

Co-authors include Naveen Namashivayam, Krishna Kandalla, and Joseph Robichaux.

Proceedings of the OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence, 2017

Implementation and Evaluation of OpenSHMEM Contexts Using OFI Libfabric.

[DOI]

Co-authors include Howard Pritchard.

Proceedings of the OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence, 2017

Fast Networks and Slow Memories: A Mechanism for Mitigating Bandwidth Mismatches.

[DOI]

Co-authors include Keith D. Underwood and Torsten Hoefler.

Proceedings of the 25th IEEE Annual Symposium on High-Performance Interconnects, 2017

2016

MPI-ACC: Accelerator-Aware MPI for Scientific Applications.

[DOI]

Co-authors include Lokendra S. Panwar, Keith R. Bisset, and John M. Mellor-Crummey.

IEEE Trans. Parallel Distributed Syst., 2016

Global-view coefficients: a data management solution for parallel quantum Monte Carlo applications.

[DOI]

Co-authors include Sravya Tirukkovalur and Lucas K. Wagner.

Concurr. Comput. Pract. Exp., 2016

An implementation and evaluation of the MPI 3.0 one-sided communication interface.

[DOI]

Co-authors include Darius Buntinas.

Concurr. Comput. Pract. Exp., 2016

Work stealing for GPU-accelerated parallel programs in a global address space framework.

[DOI]

Co-authors include Sriram Krishnamoorthy.

Concurr. Comput. Pract. Exp., 2016

Mitigating MPI Message Matching Misery.

[DOI]

Co-authors include Keith D. Underwood.

Proceedings of the High Performance Computing - 31st International Conference, 2016

Extending a Message Passing Runtime to Support Partitioned, Global Logical Address Spaces.

[DOI]

Co-authors include D. Brian Larkins.

Proceedings of the First International Workshop on Communication Optimizations in HPC, 2016

Design and Implementation of OpenSHMEM Using OFI on the Aries Interconnect.

[DOI]

Co-authors include Howard Pritchard.

Proceedings of the OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016

AsHES Introduction and Committees.

[DOI]

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

2015

Remote Memory Access Programming in MPI-3.

[DOI]

Co-authors include Torsten Hoefler and Keith D. Underwood.

ACM Trans. Parallel Comput., 2015

AsHES Introduction and Committees.

[DOI]

Co-authors include Satoshi Matsuoka.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Versioned Distributed Arrays for Resilience in Scientific Applications: Global View Resilience.

[DOI]

Proceedings of the International Conference on Computational Science, 2015

2014

Processing MPI Derived Datatypes on Noncontiguous GPU-Resident Data.

[DOI]

Co-authors include Nagiza F. Samatova.

IEEE Trans. Parallel Distributed Syst., 2014

Enabling communication concurrency through flexible MPI endpoints.

[DOI]

Int. J. High Perform. Comput. Appl., 2014

Enabling Efficient Multithreaded MPI Communication through a Library-Based Implementation of MPI Endpoints.

[DOI]

Co-authors include Srinivas Sridharan and Dhiraj D. Kalamkar.

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2014

MC-Checker: Detecting Memory Consistency Errors in MPI One-Sided Applications.

[DOI]

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2014

Multi-Threaded OpenSHMEM: A Bad Idea?

[DOI]

Co-authors include Ulf R. Hanebutte.

Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

One-Sided Append: A New Communication Paradigm For PGAS Models.

[DOI]

Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

Contexts: A Mechanism for High Throughput Communication in OpenSHMEM.

[DOI]

Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

Reducing Synchronization Overhead Through Bundled Communication.

[DOI]

Co-authors include Keith D. Underwood and Robert W. Wisniewski.

Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Tools, 2014

2013

MPI + MPI: a new hybrid approach to parallel programming with MPI plus shared memory.

[DOI]

Co-authors include Torsten Hoefler and Darius Buntinas.

Computing, 2013

Dataflow coordination of data-parallel tasks via MPI 3.0.

[DOI]

Co-authors include Justin M. Wozniak and Timothy G. Armstrong.

Proceedings of the 20th European MPI Users' Group Meeting, 2013

Analysis of topology-dependent MPI performance on Gemini networks.

[DOI]

Co-authors include Antonio J. Peña and Ralf G. Correa Carvalho.

Proceedings of the 20th European MPI Users' Group Meeting, 2013

Enabling MPI interoperability through flexible communication endpoints.

[DOI]

Proceedings of the 20th European MPI Users' Group Meeting, 2013

Synchronization and Ordering Semantics in Hybrid MPI+GPU Programming.

[DOI]

Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Inspector/executor load balancing algorithms for block-sparse tensor contractions.

[DOI]

Co-authors include Allen D. Malony and Jeff R. Hammond.

Proceedings of the International Conference on Supercomputing, 2013

Enhancing Performance Portability of MPI Applications through Annotation-Based Transformations.

[DOI]

Co-authors include Md. Ziaul Haque.

Proceedings of the 42nd International Conference on Parallel Processing, 2013

pVOCL: Power-Aware Dynamic Placement and Migration in Virtualized GPU Environments.

[DOI]

Proceedings of the IEEE 33rd International Conference on Distributed Computing Systems, 2013

On the efficacy of GPU-integrated MPI for scientific applications.

[DOI]

Co-authors include Lokendra S. Panwar, Keith R. Bisset, and John M. Mellor-Crummey.

Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013

Toward Asynchronous and MPI-Interoperable Active Messages.

[DOI]

Co-authors include Darius Buntinas and Judicael A. Zounmevo.

Proceedings of the 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, 2013

2012

Leveraging MPI's One-Sided Communication Interface for Shared-Memory Programming.

[DOI]

Co-authors include Torsten Hoefler, Darius Buntinas, and Brian W. Barrett.

Proceedings of the Recent Advances in the Message Passing Interface, 2012

Efficient Multithreaded Context ID Allocation in MPI.

[DOI]

Proceedings of the Recent Advances in the Message Passing Interface, 2012

On the Usability of the MPI Shared File Pointer Routines.

[DOI]

Co-authors include Mohamad Chaarawi.

Proceedings of the Recent Advances in the Message Passing Interface, 2012

PARDA: A Fast Parallel Reuse Distance Analysis Algorithm.

[DOI]

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Efficient Intranode Communication in GPU-Accelerated Systems.

[DOI]

Co-authors include Darius Buntinas.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Supporting the Global Arrays PGAS Model Using MPI One-Sided Communication.

[DOI]

Co-authors include Jeff R. Hammond, Sriram Krishnamoorthy, and Vinod Tipparaju.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Load Balancing of Dynamical Nucleation Theory Monte Carlo Simulations through Resource Sharing Barriers.

[DOI]

Co-authors include Sriram Krishnamoorthy and Theresa L. Windus.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

DMA-Assisted, Intranode Communication in GPU Accelerated Systems.

[DOI]

Co-authors include Darius Buntinas.

Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

MPI-ACC: An Integrated and Extensible Approach to Data Movement in Accelerator-based Systems.

[DOI]

Co-authors include Darius Buntinas and Keith R. Bisset.

Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

A global address space approach to automated data management for parallel Quantum Monte Carlo applications.

[DOI]

Co-authors include Sravya Tirukkovalur and Lucas K. Wagner.

Proceedings of the 19th International Conference on High Performance Computing, 2012

Enabling Fast, Noncontiguous GPU Data Movement in Hybrid MPI+GPU Environments.

[DOI]

Co-authors include Nagiza F. Samatova.

Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

Transparent Accelerator Migration in a Virtualized GPU Environment.

[DOI]

Proceedings of the 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2012

2011

Poster: High-level, one-sided programming models on MPI: a case study with global arrays and NWChem.

[DOI]

Co-authors include Jeff R. Hammond, Sriram Krishnamoorthy, and Vinod Tipparaju.

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

Noncollective Communicator Creation in MPI.

[DOI]

Co-authors include Sriram Krishnamoorthy, Jeff R. Hammond, Manojkumar Krishnan, and Vinod Tipparaju.

Proceedings of the Recent Advances in the Message Passing Interface, 2011

2010

Parichute: Generalized Turbocode-Based Error Correction for Near-Threshold Caches.

[DOI]

Co-authors include Timothy N. Miller, Bruce M. Adcock, and Radu Teodorescu.

Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Hybrid parallel programming with MPI and unified parallel C.

[DOI]

Proceedings of the 7th Conference on Computing Frontiers, 2010

Selective Recovery from Failures in a Task Parallel Programming Model.

[DOI]

Co-authors include Sriram Krishnamoorthy.

Proceedings of the 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, 2010

2009

Scalable work stealing.

[DOI]

Co-authors include D. Brian Larkins, Sriram Krishnamoorthy, and Jarek Nieplocha.

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

2008

A message passing benchmark for unbalanced applications.

[DOI]

Co-authors include Stephen Olivier.

Simul. Model. Pract. Theory, 2008

Global trees: a framework for linked data structures on distributed memory parallel systems.

[DOI]

Co-authors include D. Brian Larkins, Sriram Krishnamoorthy, and Srinivasan Parthasarathy.

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Scioto: A Framework for Global-View Task Parallelism.

[DOI]

Co-authors include Sriram Krishnamoorthy, D. Brian Larkins, and Jarek Nieplocha.

Proceedings of the 2008 International Conference on Parallel Processing, 2008

2007

Dynamic Load Balancing of Unbalanced Computations Using Message Passing.

[DOI]

Co-authors include Stephen Olivier.

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

2006

UTS: An Unbalanced Tree Search Benchmark.

[DOI]

Co-authors include Stephen Olivier.

Proceedings of the Languages and Compilers for Parallel Computing, 2006

Hardware/Software Integration for FPGA-based All-Pairs Shortest-Paths.

[DOI]

Co-authors include Uday Bondhugula, Ananth Devulapalli, and Joseph Fernando.

Proceedings of the 14th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2006), 2006
