John M. Mellor-Crummey

Parallel Comput., 2021

Measurement and Analysis of GPU-Accelerated OpenCL Computations on Intel GPUs.

Proceedings of the IEEE/ACM International Workshop on Programming and Performance Visualization Tools, 2021

Parallel binary code analysis.

Jonathon M. Anderson

Mark W. Krentel

Barton P. Miller

Srdan Milakovic

Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Using the Semi-Stencil Algorithm to Accelerate High-Order Stencils on GPUs.

Ryuichi Sai

Jie Meng

Proceedings of the 2021 International Workshop on Performance Modeling, 2021

GPA: A GPU Performance Advisor Based on Instruction Sampling.

Ryuichi Sai

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

2020

Parallelizing Binary Code Analysis.

Jonathon M. Anderson

Mark W. Krentel

Barton P. Miller

Srdan Milakovic

CoRR, 2020

GVProf: a value profiler for GPU-based clusters.

Yueming Hao

Proceedings of the International Conference for High Performance Computing, 2020

A tool for top-down performance analysis of GPU-accelerated applications.

Mark Krentel

Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

Using sample-based time series data for automated diagnosis of scalability losses in parallel programs.

Lai Wei

Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

Accelerating High-Order Stencils on GPUs.

Ryuichi Sai

Jie Meng

Proceedings of the 2020 IEEE/ACM Performance Modeling, 2020

Tools for top-down performance analysis of GPU-accelerated applications.

Mark W. Krentel

Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

2019

Understanding congestion in high performance interconnection networks using sampling.

Philip Taffet

Proceedings of the International Conference for High Performance Computing, 2019

Lightweight, Packet-Centric Monitoring of Network Traffic and Congestion Implemented in P4.

Philip Taffet

Proceedings of the 2019 IEEE Symposium on High-Performance Interconnects, 2019

A Tool for Performance Analysis of GPU-Accelerated Applications.

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2019

2018

Dynamic data race detection for OpenMP programs.

Yizi Gu

Proceedings of the International Conference for High Performance Computing, 2018

Automated Analysis of Time Series Data to Understand Parallel Program Behaviors.

Lai Wei

Proceedings of the 32nd International Conference on Supercomputing, 2018

2016

MPI-ACC: Accelerator-Aware MPI for Scientific Applications.

Xiaosong Ma

Rajeev Thakur

IEEE Trans. Parallel Distributed Syst., 2016

Performance Analysis and Optimization of a Hybrid Distributed Reverse Time Migration Application.

Sri Raj Paul

Detlef Hohl

CoRR, 2016

A Practical Solution to the Cactus Stack Problem.

Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, 2016

A wait-free queue as fast as fetch-and-add.

Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

Contention-conscious, locality-preserving locks.

Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

Performance Analysis and Optimization of a Hybrid Seismic Imaging Application.

Sri Raj Paul

Detlef Hohl

Proceedings of the International Conference on Computational Science 2016, 2016

Design and Verification of Distributed Phasers.

Proceedings of the Euro-Par 2016: Parallel Processing, 2016

2015

Distributed Phasers.

Sri Raj Paul

Kuldeep S. Meel

CoRR, 2015

Barrier elision for production parallel programs.

Costin Iancu

Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

High performance locks for multi-level NUMA systems.

Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

Communication Avoiding Algorithms: Analysis and Code Generation for Parallel Systems.

Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014

Portable, MPI-interoperable Coarray Fortran.

Wesley Bland

Pavan Balaji

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

A tool to analyze the performance of multithreaded programs on NUMA architectures.

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Test-driven repair of data races in structured parallel programs.

Rishi Surendran

Raghavan Raman

Swarat Chaudhuri

Vivek Sarkar

Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2014

Autotuning Tensor Transposition.

Lai Wei

Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Author retrospective: compilation techniques for block-cyclic distributions.

Seema Hiranandani

Ajay Sethi

Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume, 2014

Call Paths for Pin Tools.

Proceedings of the 12th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2014

ArrayTool: a lightweight profiler to guide array regrouping.

Kamal Sharma

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013

A data-centric profiler for parallel programs.

Proceedings of the International Conference for High Performance Computing, 2013

Effective sampling-driven performance tools for GPU-accelerated supercomputers.

Alexandre E. Eichenberger

Proceedings of the International Conference for High Performance Computing, 2013

OMPT: An OpenMP Tools Application Programming Interface for Performance Analysis.

Proceedings of the OpenMP in the Era of Low Power Devices and Accelerators, 2013

Pinpointing data locality bottlenecks with low overhead.

Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

Managing Asynchronous Operations in Coarray Fortran 2.0.

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

A new approach for performance analysis of OpenMP programs.

Proceedings of the International Conference on Supercomputing, 2013

On the efficacy of GPU-integrated MPI for scientific applications.

Xiaosong Ma

Rajeev Thakur

Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013

2012

DeadSpy: a tool to pinpoint program inefficiencies.

Proceedings of the 10th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2012

2011

Using Sampling to Understand Parallel Program Performance.

Proceedings of the Tools for High Performance Computing 2011, 2011

HIPS Keynote.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Implementation and Performance Evaluation of the HPC Challenge Benchmarks in Coarray Fortran 2.0.

William N. Scherer III

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Scalable fine-grained call path tracing.

Michael Franco

Reed Landrum

Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011

Pinpointing data locality problems using data-centric analysis.

Proceedings of the CGO 2011, 2011

2010

Teaching parallel programming: a roundtable discussion.

William Gropp

Maurice Herlihy

XRDS, 2010

HPCTOOLKIT: tools for performance analysis of optimized parallel programs.

Concurr. Comput. Pract. Exp., 2010

Scalable Identification of Load Imbalance in Parallel Executions Using Call Path Profiles.

Proceedings of the Conference on High Performance Computing Networking, 2010

Analyzing lock contention in multithreaded applications.

Allan Porterfield

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Hiding latency in Coarray Fortran 2.0.

William N. Scherer III

Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model, 2010

Effectively Presenting Call Path Profiles of Application Performance.

Proceedings of the 39th International Conference on Parallel Processing, 2010

2009

Identifying Performance Bottlenecks in Work-Stealing Computations.

Computer, 2009

Diagnosing performance bottlenecks in emerging petascale applications.

Mark Krentel

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Effective performance measurement and analysis of multithreaded applications.

Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Binary analysis for measurement and attribution of program performance.

Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2009

2008

Where will all the threads come from?

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

Pinpointing and Exploiting Opportunities for Enhancing Data Reuse.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2008

2007

Application Insight Through Performance Modeling.

Proceedings of the 26th IEEE International Performance Computing and Communications Conference, 2007

Scalability analysis of SPMD codes using expectations.

Nathan Froyd

Proceedings of the 21st Annual International Conference on Supercomputing, 2007

2006

Automatic tuning of whole applications using direct search and a performance-based transformation system.

Apan Qasem

J. Supercomput., 2006

Experiences with Sweep3D implementations in Co-array Fortran.

J. Supercomput., 2006

2005

SFCGen: A framework for efficient generation of multi-dimensional space-filling curves by recursion.

ACM Trans. Math. Softw., 2005

Telescoping Languages: A System for Automatic Generation of Domain Languages.

Proc. IEEE, 2005

Improving Performance by Reducing the Memory Footprint of Scientific Applications.

Int. J. High Perform. Comput. Appl., 2005

An evaluation of global address space languages: Co-array Fortran and Unified Parallel C.

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005

Effective communication coalescing for data-parallel applications.

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005

Representation-independent program analysis.

Michelle Mills Strout

Paul D. Hovland

Proceedings of the 2005 ACM SIGPLAN-SIGSOFT Workshop on Program Analysis For Software Tools and Engineering, 2005

COTS Clusters vs. the Earth Simulator: An Application Study Using IMPACT-3D.

Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Low-overhead call path profiling of unmodified, optimized code.

Nathan Froyd

Proceedings of the 19th Annual International Conference on Supercomputing, 2005

PRec-I-DCM3: A Parallel Framework for Fast and Accurate Large Scale Phylogeny Reconstruction.

Luay Nakhleh

Usman Roshan

Proceedings of the 11th International Conference on Parallel and Distributed Systems, 2005

Scheduling strategies for mapping application workflows onto the grid.

Bo Liu

S. Lennart Johnsson

Proceedings of the 14th IEEE International Symposium on High Performance Distributed Computing, 2005

Reconstructing Phylogenetic Networks Using Maximum Parsimony.

Luay Nakhleh

Fengmei Zhao

Proceedings of the Fourth International IEEE Computer Society Computational Systems Bioinformatics Conference, 2005

Space-filling Curve Generation: A Table-based Approach.

Proceedings of the 2005 International Conference on Algorithmic Mathematics and Computer Science, 2005

2004

Optimizing Sparse Matrix - Vector Product Computations Using Unroll and Jam.

John Garvin

Int. J. High Perform. Comput. Appl., 2004

Cross-architecture performance predictions for scientific applications using parameterized models.

Proceedings of the International Conference on Measurements and Modeling of Computer Systems, 2004

Experiences with Co-array Fortran on Hardware Shared Memory Platforms.

Proceedings of the Languages and Compilers for High Performance Computing, 2004

New Grid Scheduling and Rescheduling Methods in the GrADS Project.

Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Scheduling workflow applications in GrADS.

Bo Liu

S. Lennart Johnsson

Proceedings of the 4th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2004), 2004

A Multi-Platform Co-Array Fortran Compiler.

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT 2004), 29 September, 2004

2003

Generalized multipartitioning of multi-dimensional arrays for parallelizing line-sweep computations.

Alain Darte

J. Parallel Distributed Comput., 2003

Co-array Fortran Performance and Potential: An NPB Experimental Study.

Jason Eckhardt

Proceedings of the Languages and Compilers for Parallel Computing, 2003

2002

HPCVIEW: A Tool for Top-down Analysis of Node Performance.

J. Supercomput., 2002

Advanced optimization strategies in the Rice dHPF compiler.

Bradley Broom

Concurr. Comput. Pract. Exp., 2002

Toward a Framework for Preparing and Executing Adaptive Grid Programs.

Mark Mazina

Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

Generalized Multipartitioning for Multi-Dimensional Arrays.

Alain Darte

Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

Experiences tuning SMG98: a semicoarsening multigrid benchmark based on the hypre library.

Proceedings of the 16th international conference on Supercomputing, 2002

An Evaluation of Data-Parallel Compiler Support for Line-Sweep Applications.

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT 2002), 2002

2001

Telescoping Languages: A Strategy for Automatic Generation of Scientific Problem-Solving Systems from Annotated Libraries.

Linda Torczon

J. Parallel Distributed Comput., 2001

Improving Memory Hierarchy Performance for Irregular Applications Using Data and Computation Reorderings.

Int. J. Parallel Program., 2001

The GrADS Project: Software Support for High-Level Grid Application Development.

Daniel A. Reed

Linda Torczon

Richard Wolski

Int. J. High Perform. Comput. Appl., 2001

On providing useful information for analyzing and tuning applications.

Proceedings of the Joint International Conference on Measurements and Modeling of Computer Systems, 2001

Increasing temporal locality with skewing and recursive blocking.

Proceedings of the 2001 ACM/IEEE conference on Supercomputing, 2001

Tools for application-oriented performance tuning.

Proceedings of the 15th international conference on Supercomputing, 2001

Data-Parallel Compiler Support for Multipartitioning.

Trushar Sarang

Proceedings of the Euro-Par 2001: Parallel Processing, 2001

Advanced Code Generation for High Performance Fortran.

Proceedings of the Compiler Optimizations for Scalable Parallel Systems Languages, 2001

2000

Compilation and Runtime-Optimizations for Software Distributed Shared Memory.

Kai Zhang

Proceedings of the Languages, 2000

Toward Compiler Support for Scalable Parallelism Using Multipartitioning.

Proceedings of the Languages, 2000

1999

An Evaluation of Computing Paradigms for N-Body Simulations on Distributed Memory Architectures.

Collin McCurdy

Proceedings of the 1999 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP'99), 1999

Improving memory hierarchy performance for irregular applications.

Proceedings of the 13th international conference on Supercomputing, 1999

1998

High Performance Fortran Compilation Techniques for Parallelizing Scientific Codes.

Qing Yi

Proceedings of the ACM/IEEE Conference on Supercomputing, 1998

Using Integer Sets for Data-Parallel Program Analysis and Optimization.

Proceedings of the ACM SIGPLAN '98 Conference on Programming Language Design and Implementation (PLDI), 1998

Compiler-Optimization of Implicit Reductions for Distributed Memory Multiprocessors.

Bo Lu

Proceedings of the 12th International Parallel Processing Symposium / 9th Symposium on Parallel and Distributed Processing (IPPS/SPDP '98), March 30, 1998

1997

Compiling Stencils in High Performance Fortran.

Gerald Roth

R. Gregg Brickner

Proceedings of the ACM/IEEE Conference on Supercomputing, 1997

Simplifying Control Flow in Compiler-Generated Parallel Code.

Proceedings of the Languages and Compilers for Parallel Computing, 1997

1995

An Integrated Compilation and Performance Analysis Environment for Data Parallel Programs.

Proceedings of Supercomputing '95, San Diego, CA, USA, December 4-8, 1995

Optimizing Fortran 90 Shift Operations on Distributed-Memory Multicomputers.

Gerald Roth

Proceedings of the Languages and Compilers for Parallel Computing, 1995

1994

Fast, contention-free combining tree barriers for shared-memory multiprocessors.

Int. J. Parallel Program., 1994

Requirements for Data-Parallel Programming Environments.

Scott K. Warren

Chau-Wen Tseng

IEEE Parallel Distributed Technol. Syst. Appl., 1994

Compilation techniques for block-cyclic distributions.

Seema Hiranandani

Ajay Sethi

Proceedings of the 8th international conference on Supercomputing, 1994

Automatic Data Layout for Distributed-Memory Machines in the D Programming Environment.

Ulrich Kremer

Alan Carle

Proceedings of the Automatic Parallelization: New Approaches to Code Generation, 1994

1993

The ParaScope parallel programming environment.

Linda Torczon

Scott K. Warren

Proc. IEEE, 1993

Compile-Time Support for Efficient Data Race Detection in Shared-Memory Parallel Programs.

Proceedings of the ACM/ONR Workshop on Parallel and Distributed Debugging, 1993

FIAT: A Framework for Interprocedural Analysis and Transformation.

Mary W. Hall

Alan Carle

René G. Rodríguez

Proceedings of the Languages and Compilers for Parallel Computing, 1993

1992

Automatic software cache coherence through vectorization.

Ervan Darnell

Proceedings of the 6th international conference on Supercomputing, 1992

1991

Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors.

ACM Trans. Comput. Syst., 1991

On-the-fly detection of data races for programs with nested fork-join parallelism.

Proceedings of Supercomputing '91, 1991

Scalable Reader-Writer Synchronization for Shared-Memory Multiprocessors.

Proceedings of the Third ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1991

Synchronization without Contention.

Proceedings of ASPLOS-IV, 1991

1990

Analyzing Parallel Program Executions Using Multiple Views.

J. Parallel Distributed Comput., 1990

Parallel program debugging with on-the-fly anomaly detection.

Robert Hood

Proceedings of Supercomputing '90, New York, NY, USA, November 12-16, 1990

1989

The Elmwood Multiprocessor Operating System.

Neal M. Gafter

Lawrence A. Crowl

Peter C. Dibble

Softw. Pract. Exp., 1989

A Software Instruction Counter.

Proceedings of ASPLOS-III, 1989

1988

An Integrated Approach to Parallel Program Debugging and Performance Analysis of Large-Scale Multiprocessors.

Proceedings of the ACM SIGPLAN and SIGOPS Workshop on Parallel and Distributed Debugging, 1988

Experience with the BBN Butterfly.

Proceedings of the COMPCON'88, Digest of Papers, Thirty-Third IEEE Computer Society International Conference, San Francisco, California, USA, February 29, 1988

1987

Debugging Parallel Programs with Instant Replay.
