Mary W. Hall

Catherine Olschanowsky

Proc. IEEE, 2018

Autotuning in High-Performance Computing Applications.

Jeffrey K. Hollingsworth

Boyana Norris

Richard W. Vuduc

Proc. IEEE, 2018

Sparse Matrix Code Dependence Analysis Simplification at Compile Time.

Mahdi Soltan Mohammadi

Kazem Cheshmi

Ganesh Gopalakrishnan

CoRR, 2018

SIMD code generation for stencils on brick decompositions.

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

2017

Guest Editor Introduction PPoPP 2016, Special Issue 2 of 2.

ACM Trans. Parallel Comput., 2017

Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers.

Parallel Comput., 2017

Reproducing ParConnect for SC16.

Parallel Comput., 2017

Generation CS: the challenges of and responses to the enrollment surge.

Inroads, 2017

Generation CS: the mixed news on diversity and the enrollment surge.

Inroads, 2017

Generation CS: the growth of computer science.

Inroads, 2017

Polyhedral Compilation Support for C++ Features: A Case Study with CPPTRAJ.

Proceedings of the Languages and Compilers for Parallel Computing, 2017

Automating Compiler-Directed Autotuning for Phased Performance Behavior.

Tharindu Rusira

Protonu Basu

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

2016

Designing a Tunable Nested Data-Parallel Programming System.

ACM Trans. Archit. Code Optim., 2016

Compiler Transformation to Generate Hybrid Sparse Computations.

Huihui Zhang

Proceedings of the 6th Workshop on Irregular Applications: Architecture and Algorithms, 2016

Automating wavefront parallelization for sparse matrix computations.

Mahdi Soltan Mohammadi

Jongsoo Park

Hongbo Rong

Rajkishore Barik

Proceedings of the International Conference for High Performance Computing, 2016

Polyhedral Compiler Technology in Collaboration with Autotuning Important to Domain-Specific Frameworks for HPC.

Protonu Basu

Proceedings of the Languages and Compilers for Parallel Computing, 2016

Optimizing LOBPCG: Sparse Matrix Loop and Data Transformations in Action.

Khalid Ahmad

Proceedings of the Languages and Compilers for Parallel Computing, 2016

Synchronization Trade-Offs in GPU Implementations of Graph Algorithms.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Architecture-Adaptive Code Variant Tuning.

Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

2015

A collection-oriented programming model for performance portability.

Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

Loop and data transformations for sparse matrix code.

Michelle Strout

Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015

Compiler-Directed Transformation for Higher-Order Stencils.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Generating Efficient Tensor Contractions for GPUs.

Proceedings of the 44th International Conference on Parallel Processing, 2015

2014

Practices of PLDI.

ACM SIGPLAN Notices, 2014

Roofline Model Toolkit: A Practical Tool for Architectural and Program Analysis.

Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, 2014

Nitro: A Framework for Adaptive Code Variant Tuning.

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Non-affine Extensions to Polyhedral Code Generation.

Manu Shantharam

Suresh Venkatasubramanian

Proceedings of the 12th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2014

2013

A script-based autotuning compiler system to generate high-performance CUDA code.

ACM Trans. Archit. Code Optim., 2013

Towards making autotuning mainstream.

Int. J. High Perform. Comput. Appl., 2013

Rethinking Abstractions for Big Data: Why, Where, How, and What.

Jacobus E. van der Merwe

CoRR, 2013

Compiler generation and autotuning of communication-avoiding operators for geometric multigrid.

Proceedings of the 20th Annual International Conference on High Performance Computing, 2013

2012

Hierarchical parallelization and optimization of high-order stencil computations on multicore clusters.

J. Supercomput., 2012

Understanding ACM's past.

Commun. ACM, 2012

Improving High-Performance Sparse Libraries Using Compiler-Assisted Specialization: A PETSc Case Study.

Shreyas Ramalingam

Andre Vincent Pascal Grosset

Chun Chen

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

2011

Domain-Specific Optimization of Signal Recognition Targeting FPGAs.

ACM Trans. Reconfigurable Technol. Syst., 2011

Auto-tuning full applications: A case study.

Ananta Tiwari

Jeffrey K. Hollingsworth

Int. J. High Perform. Comput. Appl., 2011

Evaluating graph coloring on GPUs.

Peihong Zhu

Shusen Liu

Suresh Venkatasubramanian

Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2011

EigenCFA: accelerating flow analysis with GPUs.

Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2011

Analyzing the effects of compiler optimizations on application reliability.

Melina Demertzi

Murali Annavaram

Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011

Understanding the Behavior of Pthread Applications on Non-Uniform Cache Architectures.

Gagandeep S. Sachdev

Kshitij Sudan

Rajeev Balasubramonian

Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010

Parameterized specification, configuration and execution of data-intensive scientific workflows.

Clust. Comput., 2010

A Programming Language Interface to Describe Transformations and Code Generation.

Proceedings of the Languages and Compilers for Parallel Computing, 2010

Speeding up Nek5000 with autotuning and specialization.

Proceedings of the 24th International Conference on Supercomputing, 2010

Autotuning and Specialization: Speeding up Matrix Multiply for Small Matrices with Compiler Technology.

Proceedings of the Software Automatic Tuning, From Concepts to State-of-the-Art Results, 2010

2009

Evaluating compiler technology for control-flow optimizations for multimedia extension architectures.

Sivaramakrishnan Narayanan

Microprocess. Microsystems, 2009

HPC and Grid Computing for Integrative Biomedical Research.

Int. J. High Perform. Comput. Appl., 2009

Compiler research: the next 50 years.

David A. Padua

Keshav Pingali

Commun. ACM, 2009

Loop Transformation Recipes for Code Generation and Auto-Tuning.

Proceedings of the Languages and Compilers for Parallel Computing, 2009

A scalable auto-tuning framework for compiler optimization.

Jeffrey K. Hollingsworth

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Model-guided autotuning of high-productivity languages for petascale computing.

Proceedings of the 18th ACM International Symposium on High Performance Distributed Computing, 2009

An integrated framework for performance-based optimization of scientific workflows.

Proceedings of the 18th ACM International Symposium on High Performance Distributed Computing, 2009

Computation reuse in domain-specific optimization of signal recognition.

Proceedings of the ACM/SIGDA 17th International Symposium on Field Programmable Gate Arrays, 2009

2008

Self-Configuring Applications for Heterogeneous Systems: Program Composition and Optimization Using Cognitive Techniques.

Yolanda Gil

Robert F. Lucas

Proc. IEEE, 2008

Model-guided performance tuning of parameter values: A case study with molecular dynamics visualization.

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Designing and parameterizing a workflow for optimization: A case study in biomedical imaging.

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

The potential of computation reuse in high-level optimization of a signal recognition system.

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

2007

Model-Guided Empirical Optimization for Multimedia Extension Architectures: A Case Study.

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Intelligent Optimization of Parallel and Distributed Applications.

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

A Combined Hardware/Software Optimization Framework for Signal Representation and Recognition.

Proceedings of the Computational Science, 2007

2006

A Wiki for discussing and promoting best practices in research.

Commun. ACM, 2006

An overview of the ECO project.

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Processing-in-memory technology for knowledge discovery algorithms.

Proceedings of the Workshop on Data Management on New Hardware, 2006

2005

Interprocedural parallelization analysis in SUIF.

ACM Trans. Program. Lang. Syst., 2005

Automatic mapping of C to FPGAs with the DEFACTO compilation and synthesis system.

Microprocess. Microsystems, 2005

Empirical Optimization for a Sparse Linear Solver: A Case Study.

Int. J. Parallel Program., 2005

A Systematic Approach to Model-Guided Empirical Search for Memory Hierarchy Optimization.

Proceedings of the Languages and Compilers for Parallel Computing, 2005

Evaluating heuristics in automatically mapping multi-loop applications to FPGAs.

Proceedings of the ACM/SIGDA 13th International Symposium on Field Programmable Gate Arrays, 2005

Superword-Level Parallelism in the Presence of Control Flow.

Proceedings of the 3rd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005

Combining Models and Guided Empirical Search to Optimize for Multiple Levels of the Memory Hierarchy.

Chun Chen

Proceedings of the 3rd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005

2004

A Code Isolator: Isolating Code Fragments from Large Programs.

Yoon-Ju Lee

Proceedings of the Languages and Compilers for High Performance Computing, 2004

A Case Study Using Empirical Optimization for a Large, Engineering Application.

Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Custom Data Layout for Memory Parallelism.

Proceedings of the 2nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004

Increasing the Applicability of Scalar Replacement.

Proceedings of the Compiler Construction, 13th International Conference, 2004

2003

Exploiting Superword-Level Locality in Multimedia Extension Architectures.

J. Instr. Level Parallelism, 2003

Search Space Properties for Mapping Coarse-Grain Pipelined FPGA Applications.

Proceedings of the Languages and Compilers for Parallel Computing, 2003

ECO: An Empirical-Based Compilation and Optimization System.

Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Compiler-generated communication for pipelined FPGA applications.

Pedro C. Diniz

Proceedings of the 40th Design Automation Conference, 2003

Using estimates from behavioral synthesis tools in compiler-directed design space exploration.

Pedro C. Diniz

Proceedings of the 40th Design Automation Conference, 2003

2002

A Compiler Approach to Fast Hardware Design Space Exploration in FPGA-based Systems.

Pedro C. Diniz

Proceedings of the 2002 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2002

The architecture of the DIVA processing-in-memory chip.

Proceedings of the 16th international conference on Supercomputing, 2002

Coarse-Grain Pipelining on Multiple FPGA Architectures.

Proceedings of the 10th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2002), 2002

Compiler-Controlled Caching in Superword Register Files for Multimedia Extension Architectures.

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT 2002), 2002

2001

Bridging the Gap between Compilation and Synthesis in the DEFACTO System.

Proceedings of the Languages and Compilers for Parallel Computing, 2001

2000

Evaluating Automatic Parallelization in SUIF.

IEEE Trans. Parallel Distributed Syst., 2000

Memory Management in a PIM-Based Architecture.

Craig S. Steele

Proceedings of the Intelligent Memory Systems, Second International Workshop, 2000

1999

Combining compile-time and run-time parallelization.

Sci. Program., 1999

Mapping Irregular Applications to DIVA, a PIM-based Data-Intensive Architecture.

Proceedings of the ACM/IEEE Conference on Supercomputing, 1999

Evaluation of Predicated Array Data-Flow Analysis for Automatic Parallelization.

Proceedings of the 1999 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP'99), 1999

DEFACTO: A Design Environment for Adaptive Computing Technology.

Proceedings of the Parallel and Distributed Processing, 1999

1998

Adaptive parallelism in compiler-parallelized code.

Margaret Martonosi

Concurr. Pract. Exp., 1998

A Case for Combining Compile-Time and Run-Time Parallelization.

Proceedings of the Languages, 1998

Measuring the Effectiveness of Automatic Parallelization in SUIF.

Proceedings of the 12th international conference on Supercomputing, 1998

Predicated Array Data-flow Analysis for Run-time Parallelization.

Brian R. Murphy

Proceedings of the 12th international conference on Supercomputing, 1998

1996

Characterizing the Memory Behavior of Compiler-Parallelized Applications.

IEEE Trans. Parallel Distributed Syst., 1996

Multiprocessors from a software perspective.

Saman P. Amarasinghe

Jennifer-Ann M. Anderson

Christopher S. Wilson

IEEE Micro, 1996

Interprocedural Compilation on Fortran D.

J. Parallel Distributed Comput., 1996

Memory Referencing Behavior in Compiler-Parallelized Applications.

Int. J. Parallel Program., 1996

Maximizing Multiprocessor Performance with the SUIF Compiler.

Jennifer-Ann M. Anderson

Computer, 1996

1995

Detecting Coarse-Grain Parallelism Using an Interprocedural Parallelizing Compiler.

Proceedings of Supercomputing '95, San Diego, CA, USA, December 4-8, 1995

Interprocedural Parallelization Analysis: A Case Study.

Brian R. Murphy

Saman P. Amarasinghe

Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, 1995

Interprocedural Analysis for Parallelization.

Proceedings of the Languages and Compilers for Parallel Computing, 1995

Evaluating the impact of advanced memory systems on compiler-parallelized codes.

Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques, 1995

1994

SUIF: An Infrastructure for Research on Parallelizing and Optimizing Compilers.

Robert P. Wilson

Robert S. French

Christopher S. Wilson

Saman P. Amarasinghe

Jennifer-Ann M. Anderson

ACM SIGPLAN Notices, 1994

1993

The ParaScope parallel programming environment.

John M. Mellor-Crummey

Linda Torczon

Scott K. Warren

Proc. IEEE, 1993

A Methodology for Procedure Cloning.

Comput. Lang., 1993

Experiences Using the ParaScope Editor: an Interactive Parallel Programming Tool.

Proceedings of the Fourth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1993

FIAT: A Framework for Interprocedural Analysis and Transformation.

John M. Mellor-Crummey

Alan Carle

René G. Rodríguez

Proceedings of the Languages and Compilers for Parallel Computing, 1993

1992

Efficient Call Graph Analysis.

LOPLAS, 1992

Unexpected Side Effects of Inline Substitution: A Case Study.

Linda Torczon

LOPLAS, 1992

Interprocedural Compilation of Fortran D for MIMD Distributed-Memory Machines.

Proceedings of Supercomputing '92, 1992

Procedure cloning.

Proceedings of the ICCL'92, 1992

1991

An Experiment with Inline Substitution.

Linda Torczon

Softw. Pract. Exp., 1991

Interprocedural transformations for parallel code generation.
