Mary W. Hall

Orcid: 0000-0002-3058-7573

Affiliations:
  • University of Utah, Salt Lake City, Utah, USA


According to our database, Mary W. Hall authored at least 147 papers between 1990 and 2024.

Bibliography

2024
Exploring Data Layout for Sparse Tensor Times Dense Matrix on GPUs.
ACM Trans. Archit. Code Optim., March, 2024

Integrating ytopt and libEnsemble to Autotune OpenMC.
CoRR, 2024

2023
Polyhedral Specification and Code Generation of Sparse Tensor Contraction with Co-iteration.
ACM Trans. Archit. Code Optim., March, 2023

ytopt: Autotuning Scientific Applications for Energy Efficiency at Large Scales.
CoRR, 2023

Departmental BPC Plans 2 - Finalizing your Plan: Context, Style, Formatting, and Verification on BPCnet.org.
Proceedings of the 54th ACM Technical Symposium on Computer Science Education, Volume 2, 2023

Departmental BPC Plans 1 - Getting Started: Selecting Goals and Activities for Broadening Participation in Computing.
Proceedings of the 54th ACM Technical Symposium on Computer Science Education, Volume 2, 2023

An NSF REU Site Based on Trust and Reproducibility of Intelligent Computation: Experience Report.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Performance Portability Evaluation of Blocked Stencil Computations on GPUs.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Transfer-learning-based Autotuning using Gaussian Copula.
Proceedings of the 37th International Conference on Supercomputing, 2023

Code Synthesis for Sparse Tensor Format Conversion and Optimization.
Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, 2023

(De/Re)-Compositions Expressed Systematically via MDH-Based Schedules.
Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction, 2023

Efficiently Learning Locality Optimizations by Decomposing Transformation Domains.
Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction, 2023

2022
Autotuning PolyBench benchmarks with LLVM Clang/Polly loop optimization pragmas using Bayesian optimization.
Concurr. Comput. Pract. Exp., 2022

Tensor Iterators for Flexible High-Performance Tensor Computation.
Proceedings of the Languages and Compilers for Parallel Computing, 2022

2021
Autotuning PolyBench Benchmarks with LLVM Clang/Polly Loop Optimization Pragmas Using Bayesian Optimization (extended version).
CoRR, 2021

Improving communication by optimizing on-node data movement with data layout.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Customized Monte Carlo Tree Search for LLVM/Polly's Composable Loop Optimization Transformations.
Proceedings of the 2021 International Workshop on Performance Modeling, 2021

Predictive data locality optimization for higher-order tensor computations.
Proceedings of the MAPS@PLDI 2021: Proceedings of the 5th ACM SIGPLAN International Symposium on Machine Programming, 2021

A Roadmap to Robust Science for High-throughput Applications: The Scientists' Perspective.
Proceedings of the 17th IEEE International Conference on eScience, 2021

A Roadmap to Robust Science for High-throughput Applications: The Developers' Perspective.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

Data Layout and Data Representation Optimizations to Reduce Data Movement Keynote.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

2020
Data-driven Mixed Precision Sparse Matrix Vector Multiplication for GPUs.
ACM Trans. Archit. Code Optim., 2020

Research Challenges in Compiler Technology for Sparse Tensors.
Proceedings of the 10th IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms, 2020

Expanding Opportunities for Array Privatization in Sparse Computations.
Proceedings of the Languages and Compilers for Parallel Computing, 2020

Optimized Code Generation for Deep Neural Networks.
Proceedings of the Languages and Compilers for Parallel Computing, 2020

High Performance is All about Minimizing Data Movement.
Proceedings of the HPDC '20: The 29th International Symposium on High-Performance Parallel and Distributed Computing, 2020

2019
SWIRL: High-performance many-core CPU code generation for deep neural networks.
Int. J. High Perform. Comput. Appl., 2019

Smoothing the path to computing: pondering uses for big data.
Commun. ACM, 2019

Exploiting reuse and vectorization in blocked stencil computations on CPUs and GPUs.
Proceedings of the International Conference for High Performance Computing, 2019

Sparse computation data dependence simplification for efficient compiler-generated inspectors.
Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2019

SWIRL++: Evaluating Performance Models to Guide Code Transformation in Convolutional Neural Networks.
Proceedings of the Languages and Compilers for Parallel Computing, 2019

A Framework for Enabling OpenMP Autotuning.
Proceedings of the OpenMP: Conquering the Full Hardware Spectrum, 2019

Rigel: A Framework for OpenMP Performance Tuning.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

2018
The Sparse Polyhedral Framework: Composing Compiler-Generated Inspector-Executor Code.
Proc. IEEE, 2018

Autotuning in High-Performance Computing Applications.
Proc. IEEE, 2018

Sparse Matrix Code Dependence Analysis Simplification at Compile Time.
CoRR, 2018

SIMD code generation for stencils on brick decompositions.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

2017
Guest Editor Introduction PPoPP 2016, Special Issue 2 of 2.
ACM Trans. Parallel Comput., 2017

Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers.
Parallel Comput., 2017

Reproducing ParConnect for SC16.
Parallel Comput., 2017

Generation CS: the challenges of and responses to the enrollment surge.
Inroads, 2017

Generation CS: the mixed news on diversity and the enrollment surge.
Inroads, 2017

Generation CS: the growth of computer science.
Inroads, 2017

Polyhedral Compilation Support for C++ Features: A Case Study with CPPTRAJ.
Proceedings of the Languages and Compilers for Parallel Computing, 2017

Automating Compiler-Directed Autotuning for Phased Performance Behavior.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

2016
Designing a Tunable Nested Data-Parallel Programming System.
ACM Trans. Archit. Code Optim., 2016

Compiler Transformation to Generate Hybrid Sparse Computations.
Proceedings of the 6th Workshop on Irregular Applications: Architecture and Algorithms, 2016

Automating wavefront parallelization for sparse matrix computations.
Proceedings of the International Conference for High Performance Computing, 2016

Polyhedral Compiler Technology in Collaboration with Autotuning Important to Domain-Specific Frameworks for HPC.
Proceedings of the Languages and Compilers for Parallel Computing, 2016

Optimizing LOBPCG: Sparse Matrix Loop and Data Transformations in Action.
Proceedings of the Languages and Compilers for Parallel Computing, 2016

Synchronization Trade-Offs in GPU Implementations of Graph Algorithms.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Architecture-Adaptive Code Variant Tuning.
Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

2015
A collection-oriented programming model for performance portability.
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

Loop and data transformations for sparse matrix code.
Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015

Compiler-Directed Transformation for Higher-Order Stencils.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Generating Efficient Tensor Contractions for GPUs.
Proceedings of the 44th International Conference on Parallel Processing, 2015

2014
Practices of PLDI.
ACM SIGPLAN Notices, 2014

Roofline Model Toolkit: A Practical Tool for Architectural and Program Analysis.
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, 2014

Nitro: A Framework for Adaptive Code Variant Tuning.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Non-affine Extensions to Polyhedral Code Generation.
Proceedings of the 12th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2014

2013
A script-based autotuning compiler system to generate high-performance CUDA code.
ACM Trans. Archit. Code Optim., 2013

Towards making autotuning mainstream.
Int. J. High Perform. Comput. Appl., 2013

Rethinking Abstractions for Big Data: Why, Where, How, and What.
CoRR, 2013

Compiler generation and autotuning of communication-avoiding operators for geometric multigrid.
Proceedings of the 20th Annual International Conference on High Performance Computing, 2013

2012
Hierarchical parallelization and optimization of high-order stencil computations on multicore clusters.
J. Supercomput., 2012

Understanding ACM's past.
Commun. ACM, 2012

Improving High-Performance Sparse Libraries Using Compiler-Assisted Specialization: A PETSc Case Study.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

2011
Domain-Specific Optimization of Signal Recognition Targeting FPGAs.
ACM Trans. Reconfigurable Technol. Syst., 2011

Auto-tuning full applications: A case study.
Int. J. High Perform. Comput. Appl., 2011

Evaluating graph coloring on GPUs.
Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2011

EigenCFA: accelerating flow analysis with GPUs.
Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2011

Analyzing the effects of compiler optimizations on application reliability.
Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011

Understanding the Behavior of Pthread Applications on Non-Uniform Cache Architectures.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
Parameterized specification, configuration and execution of data-intensive scientific workflows.
Clust. Comput., 2010

A Programming Language Interface to Describe Transformations and Code Generation.
Proceedings of the Languages and Compilers for Parallel Computing, 2010

Speeding up Nek5000 with autotuning and specialization.
Proceedings of the 24th International Conference on Supercomputing, 2010

Autotuning and Specialization: Speeding up Matrix Multiply for Small Matrices with Compiler Technology.
Proceedings of the Software Automatic Tuning, From Concepts to State-of-the-Art Results, 2010

2009
Evaluating compiler technology for control-flow optimizations for multimedia extension architectures.
Microprocess. Microsystems, 2009

HPC and Grid Computing for Integrative Biomedical Research.
Int. J. High Perform. Comput. Appl., 2009

Compiler research: the next 50 years.
Commun. ACM, 2009

Loop Transformation Recipes for Code Generation and Auto-Tuning.
Proceedings of the Languages and Compilers for Parallel Computing, 2009

A scalable auto-tuning framework for compiler optimization.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Model-guided autotuning of high-productivity languages for petascale computing.
Proceedings of the 18th ACM International Symposium on High Performance Distributed Computing, 2009

An integrated framework for performance-based optimization of scientific workflows.
Proceedings of the 18th ACM International Symposium on High Performance Distributed Computing, 2009

Computation reuse in domain-specific optimization of signal recognition.
Proceedings of the ACM/SIGDA 17th International Symposium on Field Programmable Gate Arrays, 2009

2008
Self-Configuring Applications for Heterogeneous Systems: Program Composition and Optimization Using Cognitive Techniques.
Proc. IEEE, 2008

Model-guided performance tuning of parameter values: A case study with molecular dynamics visualization.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Designing and parameterizing a workflow for optimization: A case study in biomedical imaging.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

The potential of computation reuse in high-level optimization of a signal recognition system.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

2007
Model-Guided Empirical Optimization for Multimedia Extension Architectures: A Case Study.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Intelligent Optimization of Parallel and Distributed Applications.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

A Combined Hardware/Software Optimization Framework for Signal Representation and Recognition.
Proceedings of the Computational Science, 2007

2006
A Wiki for discussing and promoting best practices in research.
Commun. ACM, 2006

An overview of the ECO project.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Processing-in-memory technology for knowledge discovery algorithms.
Proceedings of the Workshop on Data Management on New Hardware, 2006

2005
Interprocedural parallelization analysis in SUIF.
ACM Trans. Program. Lang. Syst., 2005

Automatic mapping of C to FPGAs with the DEFACTO compilation and synthesis system.
Microprocess. Microsystems, 2005

Empirical Optimization for a Sparse Linear Solver: A Case Study.
Int. J. Parallel Program., 2005

A Systematic Approach to Model-Guided Empirical Search for Memory Hierarchy Optimization.
Proceedings of the Languages and Compilers for Parallel Computing, 2005

Evaluating heuristics in automatically mapping multi-loop applications to FPGAs.
Proceedings of the ACM/SIGDA 13th International Symposium on Field Programmable Gate Arrays, 2005

Superword-Level Parallelism in the Presence of Control Flow.
Proceedings of the 3rd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005

Combining Models and Guided Empirical Search to Optimize for Multiple Levels of the Memory Hierarchy.
Proceedings of the 3rd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005

2004
A Code Isolator: Isolating Code Fragments from Large Programs.
Proceedings of the Languages and Compilers for High Performance Computing, 2004

A Case Study Using Empirical Optimization for a Large, Engineering Application.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Custom Data Layout for Memory Parallelism.
Proceedings of the 2nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004

Increasing the Applicability of Scalar Replacement.
Proceedings of the Compiler Construction, 13th International Conference, 2004

2003
Exploiting Superword-Level Locality in Multimedia Extension Architectures.
J. Instr. Level Parallelism, 2003

Search Space Properties for Mapping Coarse-Grain Pipelined FPGA Applications.
Proceedings of the Languages and Compilers for Parallel Computing, 2003

ECO: An Empirical-Based Compilation and Optimization System.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Compiler-generated communication for pipelined FPGA applications.
Proceedings of the 40th Design Automation Conference, 2003

Using estimates from behavioral synthesis tools in compiler-directed design space exploration.
Proceedings of the 40th Design Automation Conference, 2003

2002
A Compiler Approach to Fast Hardware Design Space Exploration in FPGA-based Systems.
Proceedings of the 2002 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2002

The architecture of the DIVA processing-in-memory chip.
Proceedings of the 16th international conference on Supercomputing, 2002

Coarse-Grain Pipelining on Multiple FPGA Architectures.
Proceedings of the 10th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2002), 2002

Compiler-Controlled Caching in Superword Register Files for Multimedia Extension Architectures.
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT 2002), 2002

2001
Bridging the Gap between Compilation and Synthesis in the DEFACTO System.
Proceedings of the Languages and Compilers for Parallel Computing, 2001

2000
Evaluating Automatic Parallelization in SUIF.
IEEE Trans. Parallel Distributed Syst., 2000

Memory Management in a PIM-Based Architecture.
Proceedings of the Intelligent Memory Systems, Second International Workshop, 2000

1999
Combining compile-time and run-time parallelization.
Sci. Program., 1999

Mapping Irregular Applications to DIVA, a PIM-based Data-Intensive Architecture.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1999

Evaluation of Predicated Array Data-Flow Analysis for Automatic Parallelization.
Proceedings of the 1999 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP'99), 1999

DEFACTO: A Design Environment for Adaptive Computing Technology.
Proceedings of the Parallel and Distributed Processing, 1999

1998
Adaptive parallelism in compiler-parallelized code.
Concurr. Pract. Exp., 1998

A Case for Combining Compile-Time and Run-Time Parallelization.
Proceedings of the Languages, 1998

Measuring the Effectiveness of Automatic Parallelization in SUIF.
Proceedings of the 12th international conference on Supercomputing, 1998

Predicated Array Data-flow Analysis for Run-time Parallelization.
Proceedings of the 12th international conference on Supercomputing, 1998

1996
Characterizing the Memory Behavior of Compiler-Parallelized Applications.
IEEE Trans. Parallel Distributed Syst., 1996

Multiprocessors from a software perspective.
IEEE Micro, 1996

Interprocedural Compilation on Fortran D.
J. Parallel Distributed Comput., 1996

Memory Referencing Behavior in Compiler-Parallelized Applications.
Int. J. Parallel Program., 1996

Maximizing Multiprocessor Performance with the SUIF Compiler.
Computer, 1996

1995
Detecting Coarse-Grain Parallelism Using an Interprocedural Parallelizing Compiler.
Proceedings of Supercomputing '95, San Diego, CA, USA, December 4-8, 1995

Interprocedural Parallelization Analysis: A Case Study.
Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, 1995

Interprocedural Analysis for Parallelization.
Proceedings of the Languages and Compilers for Parallel Computing, 1995

Evaluating the impact of advanced memory systems on compiler-parallelized codes.
Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques, 1995

1994
SUIF: An Infrastructure for Research on Parallelizing and Optimizing Compilers.
ACM SIGPLAN Notices, 1994

1993
The ParaScope parallel programming environment.
Proc. IEEE, 1993

A Methodology for Procedure Cloning.
Comput. Lang., 1993

Experiences Using the ParaScope Editor: an Interactive Parallel Programming Tool.
Proceedings of the Fourth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1993

FIAT: A Framework for Interprocedural Analysis and Transformation.
Proceedings of the Languages and Compilers for Parallel Computing, 1993

1992
Efficient Call Graph Analysis.
LOPLAS, 1992

Unexpected Side Effects of Inline Substitution: A Case Study.
LOPLAS, 1992

Interprocedural Compilation of Fortran D for MIMD Distributed-Memory Machines.
Proceedings of Supercomputing '92, 1992

Procedure cloning.
Proceedings of the ICCL'92, 1992

1991
An Experiment with Inline Substitution.
Softw. Pract. Exp., 1991

Interprocedural transformations for parallel code generation.
Proceedings of Supercomputing '91, 1991

1990
Constructing the Procedure Call Multigraph.
IEEE Trans. Software Eng., 1990

