Mary W. Hall

According to our database, Mary W. Hall authored at least 113 papers between 1990 and 2019.

Bibliography

2019
SWIRL: High-performance many-core CPU code generation for deep neural networks.
IJHPCA, 2019

Exploiting reuse and vectorization in blocked stencil computations on CPUs and GPUs.
Proceedings of the International Conference for High Performance Computing, 2019

Sparse computation data dependence simplification for efficient compiler-generated inspectors.
Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2019

A Framework for Enabling OpenMP Autotuning.
Proceedings of the OpenMP: Conquering the Full Hardware Spectrum, 2019

Rigel: A Framework for OpenMP Performance Tuning.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

2018
The Sparse Polyhedral Framework: Composing Compiler-Generated Inspector-Executor Code.
Proceedings of the IEEE, 2018

Autotuning in High-Performance Computing Applications.
Proceedings of the IEEE, 2018

Sparse Matrix Code Dependence Analysis Simplification at Compile Time.
CoRR, 2018

SIMD code generation for stencils on brick decompositions.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

2017
Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers.
Parallel Computing, 2017

Reproducing ParConnect for SC16.
Parallel Computing, 2017

Polyhedral Compilation Support for C++ Features: A Case Study with CPPTRAJ.
Proceedings of the Languages and Compilers for Parallel Computing, 2017

Automating Compiler-Directed Autotuning for Phased Performance Behavior.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

2016
Designing a Tunable Nested Data-Parallel Programming System.
TACO, 2016

Compiler Transformation to Generate Hybrid Sparse Computations.
Proceedings of the 6th Workshop on Irregular Applications: Architecture and Algorithms, 2016

Automating wavefront parallelization for sparse matrix computations.
Proceedings of the International Conference for High Performance Computing, 2016

Polyhedral Compiler Technology in Collaboration with Autotuning Important to Domain-Specific Frameworks for HPC.
Proceedings of the Languages and Compilers for Parallel Computing, 2016

Optimizing LOBPCG: Sparse Matrix Loop and Data Transformations in Action.
Proceedings of the Languages and Compilers for Parallel Computing, 2016

Synchronization Trade-Offs in GPU Implementations of Graph Algorithms.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Architecture-Adaptive Code Variant Tuning.
Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

2015
A collection-oriented programming model for performance portability.
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

Loop and data transformations for sparse matrix code.
Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015

Compiler-Directed Transformation for Higher-Order Stencils.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Generating Efficient Tensor Contractions for GPUs.
Proceedings of the 44th International Conference on Parallel Processing, 2015

2014
Roofline Model Toolkit: A Practical Tool for Architectural and Program Analysis.
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, 2014

Nitro: A Framework for Adaptive Code Variant Tuning.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Non-affine Extensions to Polyhedral Code Generation.
Proceedings of the 12th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2014

2013
A script-based autotuning compiler system to generate high-performance CUDA code.
TACO, 2013

Towards making autotuning mainstream.
IJHPCA, 2013

Rethinking Abstractions for Big Data: Why, Where, How, and What.
CoRR, 2013

Compiler generation and autotuning of communication-avoiding operators for geometric multigrid.
Proceedings of the 20th Annual International Conference on High Performance Computing, 2013

2012
Hierarchical parallelization and optimization of high-order stencil computations on multicore clusters.
The Journal of Supercomputing, 2012

Understanding ACM's past.
Commun. ACM, 2012

Improving High-Performance Sparse Libraries Using Compiler-Assisted Specialization: A PETSc Case Study.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

2011
Domain-Specific Optimization of Signal Recognition Targeting FPGAs.
TRETS, 2011

Auto-tuning full applications: A case study.
IJHPCA, 2011

Evaluating graph coloring on GPUs.
Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2011

EigenCFA: accelerating flow analysis with GPUs.
Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2011

Analyzing the effects of compiler optimizations on application reliability.
Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011

Understanding the Behavior of Pthread Applications on Non-Uniform Cache Architectures.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
Parameterized specification, configuration and execution of data-intensive scientific workflows.
Cluster Computing, 2010

A Programming Language Interface to Describe Transformations and Code Generation.
Proceedings of the Languages and Compilers for Parallel Computing, 2010

Speeding up Nek5000 with autotuning and specialization.
Proceedings of the 24th International Conference on Supercomputing, 2010

Autotuning and Specialization: Speeding up Matrix Multiply for Small Matrices with Compiler Technology.
Proceedings of the Software Automatic Tuning, From Concepts to State-of-the-Art Results, 2010

2009
Evaluating compiler technology for control-flow optimizations for multimedia extension architectures.
Microprocessors and Microsystems - Embedded Hardware Design, 2009

HPC and Grid Computing for Integrative Biomedical Research.
IJHPCA, 2009

Compiler research: the next 50 years.
Commun. ACM, 2009

Loop Transformation Recipes for Code Generation and Auto-Tuning.
Proceedings of the Languages and Compilers for Parallel Computing, 2009

A scalable auto-tuning framework for compiler optimization.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Model-guided autotuning of high-productivity languages for petascale computing.
Proceedings of the 18th ACM International Symposium on High Performance Distributed Computing, 2009

An integrated framework for performance-based optimization of scientific workflows.
Proceedings of the 18th ACM International Symposium on High Performance Distributed Computing, 2009

Computation reuse in domain-specific optimization of signal recognition.
Proceedings of the ACM/SIGDA 17th International Symposium on Field Programmable Gate Arrays, 2009

2008
Self-Configuring Applications for Heterogeneous Systems: Program Composition and Optimization Using Cognitive Techniques.
Proceedings of the IEEE, 2008

Model-guided performance tuning of parameter values: A case study with molecular dynamics visualization.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Designing and parameterizing a workflow for optimization: A case study in biomedical imaging.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

The potential of computation reuse in high-level optimization of a signal recognition system.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

2007
Model-Guided Empirical Optimization for Multimedia Extension Architectures: A Case Study.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Intelligent Optimization of Parallel and Distributed Applications.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

A Combined Hardware/Software Optimization Framework for Signal Representation and Recognition.
Proceedings of the Computational Science, 2007

2006
A Wiki for discussing and promoting best practices in research.
Commun. ACM, 2006

An overview of the ECO project.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Processing-in-memory technology for knowledge discovery algorithms.
Proceedings of the Workshop on Data Management on New Hardware, 2006

2005
Interprocedural parallelization analysis in SUIF.
ACM Trans. Program. Lang. Syst., 2005

Automatic mapping of C to FPGAs with the DEFACTO compilation and synthesis system.
Microprocessors and Microsystems, 2005

Empirical Optimization for a Sparse Linear Solver: A Case Study.
International Journal of Parallel Programming, 2005

A Systematic Approach to Model-Guided Empirical Search for Memory Hierarchy Optimization.
Proceedings of the Languages and Compilers for Parallel Computing, 2005

Evaluating heuristics in automatically mapping multi-loop applications to FPGAs.
Proceedings of the ACM/SIGDA 13th International Symposium on Field Programmable Gate Arrays, 2005

Superword-Level Parallelism in the Presence of Control Flow.
Proceedings of the 3rd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005

Combining Models and Guided Empirical Search to Optimize for Multiple Levels of the Memory Hierarchy.
Proceedings of the 3rd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005

2004
A Code Isolator: Isolating Code Fragments from Large Programs.
Proceedings of the Languages and Compilers for High Performance Computing, 2004

A Case Study Using Empirical Optimization for a Large, Engineering Application.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Custom Data Layout for Memory Parallelism.
Proceedings of the 2nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004

Increasing the Applicability of Scalar Replacement.
Proceedings of the Compiler Construction, 13th International Conference, 2004

2003
Exploiting Superword-Level Locality in Multimedia Extension Architectures.
J. Instruction-Level Parallelism, 2003

Search Space Properties for Mapping Coarse-Grain Pipelined FPGA Applications.
Proceedings of the Languages and Compilers for Parallel Computing, 2003

ECO: An Empirical-Based Compilation and Optimization System.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Compiler-generated communication for pipelined FPGA applications.
Proceedings of the 40th Design Automation Conference, 2003

Using estimates from behavioral synthesis tools in compiler-directed design space exploration.
Proceedings of the 40th Design Automation Conference, 2003

2002
A Compiler Approach to Fast Hardware Design Space Exploration in FPGA-based Systems.
Proceedings of the 2002 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2002

The architecture of the DIVA processing-in-memory chip.
Proceedings of the 16th international conference on Supercomputing, 2002

Coarse-Grain Pipelining on Multiple FPGA Architectures.
Proceedings of the 10th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2002), 2002

Compiler-Controlled Caching in Superword Register Files for Multimedia Extension Architectures.
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT 2002), 2002

2001
Bridging the Gap between Compilation and Synthesis in the DEFACTO System.
Proceedings of the Languages and Compilers for Parallel Computing, 2001

2000
Evaluating Automatic Parallelization in SUIF.
IEEE Trans. Parallel Distrib. Syst., 2000

Memory Management in a PIM-Based Architecture.
Proceedings of the Intelligent Memory Systems, Second International Workshop, 2000

1999
Combining compile-time and run-time parallelization.
Scientific Programming, 1999

Mapping Irregular Applications to DIVA, a PIM-based Data-Intensive Architecture.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1999

Evaluation of Predicated Array Data-Flow Analysis for Automatic Parallelization.
Proceedings of the 1999 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP'99), 1999

DEFACTO: A Design Environment for Adaptive Computing Technology.
Proceedings of the Parallel and Distributed Processing, 1999

1998
Adaptive parallelism in compiler-parallelized code.
Concurrency - Practice and Experience, 1998

A Case for Combining Compile-Time and Run-Time Parallelization.
Proceedings of the Languages, 1998

Measuring the Effectiveness of Automatic Parallelization in SUIF.
Proceedings of the 12th international conference on Supercomputing, 1998

Predicated Array Data-flow Analysis for Run-time Parallelization.
Proceedings of the 12th international conference on Supercomputing, 1998

1996
Characterizing the Memory Behavior of Compiler-Parallelized Applications.
IEEE Trans. Parallel Distrib. Syst., 1996

Multiprocessors from a software perspective.
IEEE Micro, 1996

Interprocedural Compilation on Fortran D.
J. Parallel Distrib. Comput., 1996

Memory Referencing Behavior in Compiler-Parallelized Applications.
International Journal of Parallel Programming, 1996

Maximizing Multiprocessor Performance with the SUIF Compiler.
IEEE Computer, 1996

1995
Detecting Coarse-Grain Parallelism Using an Interprocedural Parallelizing Compiler.
Proceedings of Supercomputing '95, San Diego, CA, USA, December 4-8, 1995, 1995

Interprocedural Parallelization Analysis: A Case Study.
Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, 1995

Interprocedural Analysis for Parallelization.
Proceedings of the Languages and Compilers for Parallel Computing, 1995

Evaluating the impact of advanced memory systems on compiler-parallelized codes.
Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques, 1995

1994
SUIF: An Infrastructure for Research on Parallelizing and Optimizing Compilers.
SIGPLAN Notices, 1994

1993
A Methodology for Procedure Cloning.
Comput. Lang., 1993

Experiences Using the ParaScope Editor: an Interactive Parallel Programming Tool.
Proceedings of the Fourth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1993

FIAT: A Framework for Interprocedural Analysis and Transformation.
Proceedings of the Languages and Compilers for Parallel Computing, 1993

1992
Efficient Call Graph Analysis.
LOPLAS, 1992

Unexpected Side Effects of Inline Substitution: A Case Study.
LOPLAS, 1992

Interprocedural Compilation of Fortran D for MIMD Distributed-Memory Machines.
Proceedings of Supercomputing '92, 1992

Procedure cloning.
Proceedings of the ICCL'92, 1992

1991
An Experiment with Inline Substitution.
Softw., Pract. Exper., 1991

Interprocedural transformations for parallel code generation.
Proceedings of Supercomputing '91, 1991

1990
Constructing the Procedure Call Multigraph.
IEEE Trans. Software Eng., 1990

