Jeff R. Hammond

Orcid: 0000-0003-3181-8190

Affiliations:
  • Intel Labs


According to our database, Jeff R. Hammond authored at least 50 papers between 2011 and 2023.

Bibliography

2023
shmem4py: OpenSHMEM for Python.
J. Open Source Softw., 2023

shmem4py: High-Performance One-Sided Communication for Python Applications.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

MPI Application Binary Interface Standardization.
Proceedings of the 30th European MPI Users' Group Meeting, 2023

Optimizing Cloud Computing Resource Usage for Hemodynamic Simulation.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023


2022
Early Application Experiences on a Modern GPU-Accelerated Arm-based HPC Platform.
CoRR, 2022

Benchmarking Fortran DO CONCURRENT on CPUs and GPUs Using BabelStream.
Proceedings of the IEEE/ACM International Workshop on Performance Modeling, 2022

2021
Enabling ISO Standard Languages for Complex HPC Workflows.
Proceedings of the Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation, 2021

OpenSHMEM over MPI as a Performance Contender: Thorough Analysis and Optimizations.
Proceedings of the OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Exascale and Smart Networks, 2021

2020
Data Parallel C++: Enhancing SYCL Through Extensions for Productivity and Performance.
Proceedings of the IWOCL '20: International Workshop on OpenCL, 2020

2019
Evaluating data parallelism in C++ using the Parallel Research Kernels.
Proceedings of the International Workshop on OpenCL, 2019

A comparative analysis of Kokkos and SYCL as heterogeneous, parallel programming models for C++ applications.
Proceedings of the International Workshop on OpenCL, 2019

Software combining to mitigate multithreaded MPI contention.
Proceedings of the ACM International Conference on Supercomputing, 2019

2018
Dynamic Adaptable Asynchronous Progress Model for MPI RMA Multiphase Applications.
IEEE Trans. Parallel Distributed Syst., 2018

Lock Contention Management in Multithreaded MPI.
ACM Trans. Parallel Comput., 2018

Visualization of OpenMP* Task Dependencies Using Intel® Advisor - Flow Graph Analyzer.
Proceedings of the Evolving OpenMP for Evolving Architectures, 2018

2017
TTC: A High-Performance Compiler for Tensor Transpositions.
ACM Trans. Math. Softw., 2017

Exploring versioned distributed arrays for resilience in scientific applications.
Int. J. High Perform. Comput. Appl., 2017

Performance Evaluation of NWChem Ab-Initio Molecular Dynamics (AIMD) Simulations on the Intel® Xeon Phi™ Processor.
Proceedings of the High Performance Computing, 2017

2016
MADNESS: A Multiresolution, Adaptive Numerical Environment for Scientific Simulation.
SIAM J. Sci. Comput., 2016

Scaling up Hartree-Fock calculations on Tianhe-2.
Int. J. High Perform. Comput. Appl., 2016

Comparing Runtime Systems with Exascale Ambitions Using the Parallel Research Kernels.
Proceedings of the High Performance Computing - 31st International Conference, 2016

CAF Events Implementation Using MPI-3 Capabilities.
Proceedings of the 23rd European MPI Users' Group Meeting, EuroMPI 2016, 2016

A Proposal to OpenMP for Addressing the CPU Oversubscription Challenge.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

A Hartree-Fock Application Using UPC++ and the New DArray Library.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

One-Sided Interface for Matrix Operations Using MPI-3 RMA: A Case Study with Elemental.
Proceedings of the 45th International Conference on Parallel Processing, 2016

2015
MADNESS: A Multiresolution, Adaptive Numerical Environment for Scientific Simulation.
CoRR, 2015

Improving concurrency and asynchrony in multithreaded MPI applications using software offloading.
Proceedings of the International Conference for High Performance Computing, 2015

Casper: An Asynchronous Progress Model for MPI RMA on Many-Core Architectures.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Versioned Distributed Arrays for Resilience in Scientific Applications: Global View Resilience.
Proceedings of the International Conference on Computational Science, 2015

Scaling NWChem with Efficient and Portable Asynchronous Communication in MPI RMA.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

2014
A massively parallel tensor contraction framework for coupled-cluster computations.
J. Parallel Distributed Comput., 2014

To INT_MAX... and beyond!: exploring large-count support in MPI.
Proceedings of the 2014 Workshop on Exascale MPI, 2014

Towards a matrix-oriented strided interface in OpenSHMEM.
Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

Implementing OpenSHMEM Using MPI-3 One-Sided Communication.
Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Tools, 2014

Anatomy of High-Performance Many-Threaded Matrix Multiplication.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

WorkQ: A many-core producer/consumer execution model applied to PGAS computations.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

2013
Elemental: A New Framework for Distributed Memory Dense Matrix Computations.
ACM Trans. Math. Softw., 2013

Challenges and methods in large-scale computational chemistry applications.
XRDS, 2013

Performance Analysis of the NWChem TCE for Different Communication Patterns.
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation, 2013

Cyclops Tensor Framework: Reducing Communication and Eliminating Load Imbalance in Massively Parallel Contractions.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Performance Analysis of the Lattice Boltzmann Model Beyond Navier-Stokes.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Inspector/executor load balancing algorithms for block-sparse tensor contractions.
Proceedings of the International Conference on Supercomputing, 2013

2012
Performance characterization of global address space applications: a case study with NWChem.
Concurr. Comput. Pract. Exp., 2012

Supporting the Global Arrays PGAS Model Using MPI One-Sided Communication.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

ALCF MPI Benchmarks: Understanding Machine-Specific Communication Behavior.
Proceedings of the 41st International Conference on Parallel Processing Workshops, 2012

An evaluation of difference and threshold techniques for efficient checkpoints.
Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, 2012

2011
Poster: Passing the three trillion particle limit with an error-controlled fast multipole method.
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

Poster: High-level, one-sided programming models on MPI: a case study with global arrays and NWChem.
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

Noncollective Communicator Creation in MPI.
Proceedings of the Recent Advances in the Message Passing Interface, 2011

