Lawrence Rauchwerger

CoRR, 2023

2021

FFT blitz: the tensor cores strike back.

Sultan Durrani

Muhammad Saad Chughtai

Abdul Dakkak

Wen-Mei Hwu

Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Accelerating SARS-CoV-2 low frequency variant calling on ultra deep sequencing datasets.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

Accelerating Fourier and Number Theoretic Transforms using Tensor Cores and Warp Shuffles.

Sultan Durrani

Muhammad Saad Chughtai

Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques, 2021

2020

Introduction to the Special Issue on PPoPP 2017 (Part 2).

ACM Trans. Parallel Comput., 2020

Provably optimal parallel transport sweeps on semi-structured grids.

J. Comput. Phys., 2020

2019

Rethinking Incremental and Parallel Pointer Analysis.

Bozhen Liu

Jeff Huang

ACM Trans. Program. Lang. Syst., 2019

Introduction to the Special Issue on PPoPP 2017 (Part 1).

ACM Trans. Parallel Comput., 2019

Two Roads to Parallelism: From Serial Code to Programming with STAPL.

Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

2018

Nested Parallelism with Algorithmic Skeletons.

Proceedings of the Languages and Compilers for Parallel Computing, 2018

2016

Fast Approximate Distance Queries in Unweighted Graphs Using Bounded Asynchrony.

Adam Fidel

Francisco Coral-Sabido

Colton Riedel

Proceedings of the Languages and Compilers for Parallel Computing, 2016

2015

Finding schedule-sensitive branches.

Jeff Huang

Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, 2015

A hierarchical approach to reducing communication in parallel graph algorithms.

Harshvardhan

Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

Asynchronous Nested Parallelism for Dynamic Applications in Distributed Memory.

Proceedings of the Languages and Compilers for Parallel Computing, 2015

A Hybrid Approach to Processing Big Data Graphs on Memory-Restricted Systems.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Composing Algorithmic Skeletons to Express High-Performance Scientific Applications.

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

STAPL-RTS: An Application Driven Runtime System.

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Scalable conditional induction variables (CIV) analysis.

Cosmin E. Oancea

Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2015

An Algorithmic Approach to Communication Reduction in Parallel Graph Algorithms.

Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014

SCCMulti: an improved parallel strongly connected components algorithm.

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

The stapl Skeleton Framework.

Proceedings of the Languages and Compilers for Parallel Computing, 2014

Using Load Balancing to Scalably Parallelize Sampling-Based Motion Planning Algorithms.

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Author retrospective for adaptive reduction parallelization techniques.

Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume, 2014

KLA: a new algorithmic paradigm for parallel graph computations.

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

Processing big data graphs on memory-restricted systems.

Harshvardhan

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

From petascale to the pocket: Adaptively scaling parallel programs for mobile SoCs.

Adam Fidel

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2012

Logical inference techniques for loop parallelization.

Cosmin E. Oancea

Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2012

The STAPL Parallel Graph Library.

Proceedings of the Languages and Compilers for Parallel Computing, 2012

2011

Speculative Parallelization of Loops.

Proceedings of the Encyclopedia of Parallel Computing, 2011

The Potential of Synergistic Static, Dynamic and Speculative Loop Nest Optimizations for Automatic Parallelization

CoRR, 2011

The STAPL parallel container framework.

Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2011

A Hybrid Approach to Proving Memory Reference Monotonicity.

Cosmin E. Oancea

Proceedings of the Languages and Compilers for Parallel Computing, 2011

2010

STAPL: standard template adaptive parallel library.

Proceedings of SYSTOR 2010: The 3rd Annual Haifa Experimental Systems Conference, 2010

The STAPL pView.

Proceedings of the Languages and Compilers for Parallel Computing, 2010

2009

The STAPL pList.

Proceedings of the Languages and Compilers for Parallel Computing, 2009

Two memory allocators that use hints to improve locality.

Alin Jula

Proceedings of the 8th International Symposium on Memory Management, 2009

2008

Implementation of Sensitivity Analysis for Automatic Parallelization.

Maikel Pennings

Proceedings of the Languages and Compilers for Parallel Computing, 2008

Design for Interoperability in stapl: pMatrices and Linear Algebra Algorithms.

Proceedings of the Languages and Compilers for Parallel Computing, 2008

2007

The STAPL pArray.

Proceedings of the 2007 workshop on MEmory performance, 2007

Associative Parallel Containers in STAPL.

Gabriel Tanase

Chidambareswaran Raman

Mauro Bianco

Proceedings of the Languages and Compilers for Parallel Computing, 2007

Sensitivity analysis for automatic parallelization on multi-cores.

Maikel Pennings

Proceedings of the 21st Annual International Conference on Supercomputing, 2007

2006

An Adaptive Algorithm Selection Framework for Reduction Parallelization.

IEEE Trans. Parallel Distributed Syst., 2006

SmartApps: middle-ware for adaptive applications on reconfigurable platforms.

ACM SIGOPS Oper. Syst. Rev., 2006

Armi: a High Level Communication Library for Stapl.

Parallel Process. Lett., 2006

Custom Memory Allocation for Free.

Alin Jula

Proceedings of the Languages and Compilers for Parallel Computing, 2006

Design and Use of htalib - A Library for Hierarchically Tiled Arrays.

Proceedings of the Languages and Compilers for Parallel Computing, 2006

Region array SSA.

Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT 2006), 2006

2005

Tradeoffs in buffering speculative memory state for thread-level speculation in multiprocessors.

ACM Trans. Archit. Code Optim., 2005

Finding strongly connected components in distributed graphs.

J. Parallel Distributed Comput., 2005

An Experimental Evaluation of the HP V-Class and SGI Origin 2000 Multiprocessors using Microbenchmarks and Scientific Applications.

Int. J. Parallel Program., 2005

Parallel protein folding with STAPL.

Concurr. Comput. Pract. Exp., 2005

A framework for adaptive algorithm selection in STAPL.

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005

Scalable Array SSA and Array Data Flow Analysis.

Guobin He

Proceedings of the Languages and Compilers for Parallel Computing, 2005

2004

Automatic Parallelization Using the Value Evolution Graph.

Dongmin Zhang

Proceedings of the Languages and Compilers for High Performance Computing, 2004

An Adaptive Algorithm Selection Framework.

Dongmin Zhang

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT 2004), 29 September, 2004

The Value Evolution Graph and its Use in Memory Reference Analysis.

Dongmin Zhang

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT 2004), 29 September, 2004

2003

Hybrid Analysis: Static & Dynamic Memory Reference Analysis.

Jay P. Hoeflinger

Int. J. Parallel Program., 2003

ARMI: an adaptive, platform independent communication library.

Steven Saunders

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2003

Tradeoffs in Buffering Memory State for Thread-Level Speculation in Multiprocessors.

Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003

Using Software Logging to Support Multi-Version Buffering in Thread-Level Speculation.

Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques (PACT 2003), 27 September, 2003

2002

Parallel Reductions: An Application of Adaptive Algorithm Selection.

Francis H. Dang

Proceedings of the Languages and Compilers for Parallel Computing, 15th Workshop, 2002

The R-LRPD Test: Speculative Parallelization of Partially Parallel Loops.

Francis H. Dang

Jacques Chassin de Kergommeaux

Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

SmartApps: An Application Centric Approach to High Performance Computing: Compiler-Assisted Software and Hardware Support for Reduction Operations.

Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

2001

Finding strongly connected components in parallel in particle transport sweeps.

Proceedings of the Thirteenth Annual ACM Symposium on Parallel Algorithms and Architectures, 2001

Identifying Strongly Connected Components in Parallel.

Proceedings of the Tenth SIAM Conference on Parallel Processing for Scientific Computing, 2001

STAPL: An Adaptive, Generic Parallel C++ Library.

Proceedings of the Languages and Compilers for Parallel Computing, 2001

Removing architectural bottlenecks to the scalability of speculative parallelization.

Proceedings of the 28th Annual International Symposium on Computer Architecture, 2001

Architectural Support for Parallel Reductions in Scalable Shared-Memory Multiprocessors.

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques (PACT 2001), 2001

2000

Parallel computing for irregular applications.

Philip J. Hatcher

Parallel Comput., 2000

Speculative Parallelization of Partially Parallel Loops.

Francis H. Dang

Proceedings of the Languages, 2000

SmartApps: An Application Centric Approach to High Performance Computing.

Josep Torrellas

Proceedings of the Languages and Compilers for Parallel Computing, 2000

Adaptive reduction parallelization techniques.

Proceedings of the 14th international conference on Supercomputing, 2000

Techniques for Reducing the Overhead of Run-Time Parallelization.

Proceedings of the Compiler Construction, 9th International Conference, 2000

1999

The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization.

IEEE Trans. Parallel Distributed Syst., 1999

Parallel Transport Computations by Spatial Decomposition.

Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999

Run-Time Parallelization Optimization Techniques.

Proceedings of the Languages and Compilers for Parallel Computing, 1999

Comparing the memory system performance of the HP V-class and SGI Origin 2000 multiprocessors using microbenchmarks and scientific applications.

Proceedings of the 13th international conference on Supercomputing, 1999

Hardware for Speculative Parallelization of Partially-Parallel Loops in DSM Multiprocessors.

Ye Zhang

Josep Torrellas

Proceedings of the Fifth International Symposium on High-Performance Computer Architecture, 1999

Implementation Issues of Loop-Level Speculative Run-Time Parallelization.

Devang Patel

Proceedings of the Compiler Construction, 8th International Conference, 1999

1998

Run-Time Parallelization: Its Time Has Come.

Parallel Comput., 1998

Standard Templates Adaptive Parallel Library (STAPL).

Francisco Arzu

Koji Ouchi

Proceedings of the Languages, 1998

Principles of Speculative Run-Time Parallelization.

Devang Patel

Proceedings of the Languages and Compilers for Parallel Computing, 1998

Hardware for Speculative Run-Time Parallelization in Distributed Shared-Memory Multiprocessors.

Ye Zhang

Josep Torrellas

Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, Las Vegas, Nevada, USA, January 31, 1998

1996

Parallel Programming with Polaris.

Computer, 1996

Restructuring Programs for High-Speed Computers with Polaris.

Proceedings of the 1996 International Conference on Parallel Processing Workshop, 1996

1995

A scalable method for run-time loop parallelization.

Int. J. Parallel Program., 1995

Parallelizing while loops for multiprocessor systems.

Proceedings of IPPS '95, 1995

Run-Time Methods for Parallelizing Partially Parallel Loops.

Proceedings of the 9th international conference on Supercomputing, 1995

1994

Automatic Detection of Parallelism: A grand challenge for high performance computing.

IEEE Parallel Distributed Technol. Syst. Appl., 1994

Polaris: Improving the Effectiveness of Parallelizing Compilers.

Proceedings of the Languages and Compilers for Parallel Computing, 1994

The privatizing DOALL test: a run-time technique for DOALL loop identification and array privatization.

Proceedings of the 8th international conference on Supercomputing, 1994

1993

Measuring limits of parallelism and characterizing its vulnerability to resource constraints.

Pradeep K. Dubey

Ravi Nair

Proceedings of the 26th Annual International Symposium on Microarchitecture, 1993

1990

A multiple floating point coprocessor architecture.
