J. Ramanujam

Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2016

Using HPX and OP2 for Improving Parallel Scaling Performance of Unstructured Grid Applications.

Zahra Khatami

Hartmut Kaiser

Proceedings of the 45th International Conference on Parallel Processing Workshops, 2016

2015

Introduction to the Special Issue on PPoPP'12.

Keshav Pingali

ACM Trans. Parallel Comput., 2015

GeauxDock: A novel approach for mixed-resolution ligand docking using a descriptor-based force field.

Yun Ding

Ye Fang

Wei Pan Feinstein

J. Comput. Chem., 2015

SDSLc: a multi-target domain-specific compiler for stencil computations.

Proceedings of the 5th International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing, 2015

Lost in heterogeneity: architectural selection based on code features.

Sameer AbuAsal

R. Tohid

Proceedings of the 2nd International Workshop on Hardware-Software Co-Design for High Performance Computing, 2015

Distributed memory code generation for mixed Irregular/Regular computations.

Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

On Characterizing the Data Access Complexity of Programs.

Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2015

Optimistic Delinearization of Parametrically Sized Arrays.

Tobias Grosser

Louis-Noël Pouchet

Sebastian Pop

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

2014

Automatic parallelization of a class of irregular loops for distributed memory systems.

ACM Trans. Parallel Comput., 2014

Cross-Loop Optimization of Arithmetic Intensity for Finite Element Local Assembly.

Fabio Luporini

Ana Lucia Varbanescu

Florian Rathgeber

Gheorghe-Teodor Bercea

David A. Ham

ACM Trans. Archit. Code Optim., 2014

On Using the Roofline Model with Lower Bounds on Data Movement.

ACM Trans. Archit. Code Optim., 2014

Introduction to the JPDC Special Issue on Domain-Specific Languages and High-Level Frameworks for High-Performance Computing.

J. Parallel Distributed Comput., 2014

Parallel tempering simulation of the three-dimensional Edwards-Anderson model with compact asynchronous multispin coding on GPU.

Mark Jarrell

Comput. Phys. Commun., 2014

COFFEE: an Optimizing Compiler for Finite Element Local Assembly.

Fabio Luporini

Ana Lucia Varbanescu

Florian Rathgeber

Gheorghe-Teodor Bercea

David A. Ham

CoRR, 2014

DA-TC: a novel application execution model in multicluster systems.

Clust. Comput., 2014

On characterizing the data movement complexity of computational DAGs for parallel execution.

Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architectures, 2014

A framework for enhancing data reuse via associative reordering.

Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2014

Generalizing Run-Time Tiling with the Loop Chain Abstraction.

Michelle Mills Strout

Fabio Luporini

Christopher D. Krieger

Carlo Bertolli

Gheorghe-Teodor Bercea

Catherine Olschanowsky

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

2013

Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential.

ACM Trans. Archit. Code Optim., 2013

Adaptive parallel tiled code generation and accelerated auto-tuning.

Sanket Tavarageri

Athanasios Konstantinidis

Int. J. High Perform. Comput. Appl., 2013

Parametric GPU Code Generation for Affine Loop Programs.

Proceedings of the Languages and Compilers for Parallel Computing, 2013

A stencil compiler for short-vector SIMD architectures.

Proceedings of the International Conference on Supercomputing, 2013

Split tiling for GPUs: automatic parallelization using trapezoidal tiles.

Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, 2013

2012

An ILP solution to address code generation for embedded applications on digital signal processors.

ACM Trans. Design Autom. Electr. Syst., 2012

Storage Optimization through Offset Assignment with Variable Coalescing.

ACM Trans. Embed. Comput. Syst., 2012

An Effective Solution to Task Scheduling and Memory Partitioning for Multiprocessor System-on-Chip.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2012

Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions.

Qingda Lu

Gerald Baumgartner

J. Parallel Distributed Comput., 2012

Code Size Reduction for Array Intensive Applications on Digital Signal Processors.

J. Circuits Syst. Comput., 2012

Code generation for parallel execution of a class of irregular loops on distributed memory systems.

Proceedings of the SC Conference on High Performance Computing Networking, 2012

Analytical Bounds for Optimal Tile Size Selection.

Proceedings of the Compiler Construction - 21st International Conference, 2012

2011

Loop transformations: convexity, pruning and optimization.

Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2011

Dynamic selection of tile sizes.

Proceedings of the 18th International Conference on High Performance Computing, 2011

Data Layout Transformation for Stencil Computations on Short-Vector SIMD Architectures.

Proceedings of the Compiler Construction - 20th International Conference, 2011

2010

Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework.

Proceedings of the Conference on High Performance Computing Networking, 2010

DynTile: Parametric tiled loop generation for parallel execution on multicore processors.

Ponnuswamy Sadayappan

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Parameterized tiling revisited.

Proceedings of the CGO 2010, 2010

Automatic C-to-CUDA Code Generation for Affine Programs.

Proceedings of the Compiler Construction, 19th International Conference, 2010

2009

Decoupling interaction hardware design using libraries of reusable electronics.

Rajesh Sankaran

Brygg Ullmer

Proceedings of the 3rd International Conference on Tangible and Embedded Interaction 2009, 2009

Compiler-assisted dynamic scheduling for effective parallelization of loop nests on multicore processors.

Nagavijayalakshmi Vydyanathan

Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Parametric multi-level tiling of imperfectly nested loops.

Cédric Bastoul

Albert Cohen

Boyana Norris

Proceedings of the 23rd international conference on Supercomputing, 2009

A Framework for Task Scheduling and Memory Partitioning for Multi-Processor System-on-Chip.

Proceedings of the High Performance Embedded Architectures and Compilers, 2009

An innovative application execution toolkit for multicluster grids.

Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009

Data Layout Transformation for Enhancing Data Locality on NUCA Chip Multiprocessors.

Proceedings of the PACT 2009, 2009

2008

Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories.

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

A practical automatic polyhedral parallelizer and locality optimizer.

Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation, 2008

Towards effective automatic parallelization for multicore systems.

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

A compiler framework for optimization of affine loop nests for GPGPUs.

Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

Scheduling DAGs for Fixed-point DSP Processors by Using Worm Partitions.

Jinpyo Hong

Proceedings of the International Conference on Embedded Software and Systems, 2008

Address Register Allocation in Digital Signal Processors.

Jinpyo Hong

Proceedings of the International Conference on Embedded Software and Systems, 2008

Storage optimization through code size reduction for digital signal processors.

Proceedings of the 6th IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia, 2008

Optimal address register allocation for arrays in DSP applications.

Proceedings of the 6th IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia, 2008

Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model.

Proceedings of the Compiler Construction, 17th International Conference, 2008

2007

Code Size Optimization for Embedded Processors using Commutative Transformations.

Proceedings of the 13th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA 2007), 2007

Automatic mapping of nested loops to FPGAs.

Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

Effective automatic parallelization of stencil computations.

Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, 2007

Memory Offset Assignment for DSPs.

Jinpyo Hong

Proceedings of the Embedded Software and Systems, [Third] International Conference, 2007

2006

Estimating and reducing the memory requirements of signal processing codes for embedded systems.

IEEE Trans. Signal Process., 2006

Improving the energy behavior of block buffering using compiler optimizations.

Ugur Sezer

ACM Trans. Design Autom. Electr. Syst., 2006

Reducing code size through address register assignment.

ACM Trans. Embed. Comput. Syst., 2006

An Effective Heuristic for Simple Offset Assignment with Variable Coalescing.

Proceedings of the Languages and Compilers for Parallel Computing, 2006

Memory minimization for tensor contractions using integer linear programming.

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Identifying Cost-Effective Common Subexpressions to Reduce Operation Count in Tensor Contraction Evaluations.

Qingda Lu

Proceedings of the Computational Science, 2006

2005

Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models.

Proc. IEEE, 2005

Performance modeling and optimization of parallel out-of-core tensor contractions.

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005

Efficient Search-Space Pruning for Integrated Fusion and Tiling Transformations.

Proceedings of the Languages and Compilers for Parallel Computing, 2005

Automated Operation Minimization of Tensor Contraction Expressions in Electronic Structure Calculations.

Proceedings of the Computational Science, 2005

2004

A compiler-based approach for dynamically managing scratch-pad memories in embedded systems.

Mary Jane Irwin

Narayanan Vijaykrishnan

Ismail Kadayif

Amisha Parikh

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2004

Empirical Performance-Model Driven Data Layout Optimization.

Qingda Lu

Gerald Baumgartner

Proceedings of the Languages and Compilers for High Performance Computing, 2004

Efficient Synthesis of Out-of-Core Algorithms Using a Nonlinear Optimization Solver.

Sandhya Krishnan

Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

2003

Reducing False Sharing and Improving Spatial Locality in a Unified Compilation Framework.

IEEE Trans. Parallel Distributed Syst., 2003

Memory-Constrained Data Locality Optimization for Tensor Contractions.

Proceedings of the Languages and Compilers for Parallel Computing, 2003

Global Communication Optimization for Tensor Contraction Expressions under Memory Constraints.

Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Data Locality Optimization for Synthesis of Efficient Out-of-Core Algorithms.

Sandhya Krishnan

Proceedings of the High Performance Computing - HiPC 2003, 10th International Conference, 2003

Address Register Assignment for Reducing Code Size.

Proceedings of the Compiler Construction, 12th International Conference, 2003

2002

An I/O-Conscious Tiling Strategy for Disk-Resident Data Sets.

J. Supercomput., 2002

Address Code and Arithmetic Optimizations for Embedded Systems.

Proceedings of the 7th Asia and South Pacific Design Automation Conference (ASP-DAC 2002), 2002

A Heuristic for Clock Selection in High-Level Synthesis.

Proceedings of the 7th Asia and South Pacific Design Automation Conference (ASP-DAC 2002), 2002

Strategies for Improving Data Locality in Embedded Applications.

Proceedings of the 7th Asia and South Pacific Design Automation Conference (ASP-DAC 2002), 2002

A high-level approach to synthesis of high-performance codes for quantum chemistry.

Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

Space-Time Trade-Off Optimization for a Class of Electronic Structure Calculations.

Proceedings of the 2002 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2002

Memory-Constrained Communication Minimization for a Class of Array Computations.

Proceedings of the Languages and Compilers for Parallel Computing, 15th Workshop, 2002

A Performance Optimization Framework for Compilation of Tensor Contraction Expressions into Parallel Programs.

Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

Exploiting shared scratch pad memory space in embedded multiprocessor systems.

Proceedings of the 39th Design Automation Conference, 2002

Automatic Data Distribution.

Proceedings of the Compiler Design Handbook: Optimizations and Machine Code Generation, 2002

2001

Static and Dynamic Locality Optimizations Using Integer Linear Programming.

IEEE Trans. Parallel Distributed Syst., 2001

A fast approach to computing exact solutions to the resource-constrained scheduling problem.

M. Narasimhan

ACM Trans. Design Autom. Electr. Syst., 2001

Compact and efficient code generation through program restructuring on limited memory embedded DSPs.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2001

A Layout-Conscious Iteration Space Transformation Technique.

IEEE Trans. Computers, 2001

Morphable Cache Architectures: Potential Benefits.

Ismail Kadayif

Narayanan Vijaykrishnan

Mary Jane Irwin

Proceedings of the 2001 ACM SIGPLAN Workshop on Optimization of Middleware and Distributed Systems, 2001

Compiler support for block buffering.

Ugur Sezer

Proceedings of the 2001 International Symposium on Low Power Electronics and Design, 2001

Loop optimization for a class of memory-constrained computations.

Proceedings of the 15th international conference on Supercomputing, 2001

Towards Automatic Synthesis of High-Performance Codes for Electronic Structure Calculations: Data Locality Optimization.

Proceedings of the High Performance Computing - HiPC 2001, 8th International Conference, 2001

Reducing Memory Requirements of Nested Loops for Embedded Systems.

Proceedings of the 38th Design Automation Conference, 2001

Dynamic Management of Scratch-Pad Memory Space.

Mary Jane Irwin

Narayanan Vijaykrishnan

Ismail Kadayif

Amisha Parikh

Proceedings of the 38th Design Automation Conference, 2001

Integer Lattice Based Methods for Local Address Generation for Block-Cyclic Distributions.

Proceedings of the Compiler Optimizations for Scalable Parallel Systems Languages, 2001

2000

A Unified Framework for Optimizing Locality, Parallelism, and Communication in Out-of-Core Computations.

Meenakshi A. Kandaswamy

IEEE Trans. Parallel Distributed Syst., 2000

Improving Offset Assignment for Embedded Processors.

Sunil Atri

Proceedings of the Languages and Compilers for Parallel Computing, 2000

Improving Offset Assignment on Embedded Processors Using Transformations.

Sunil Atri

Proceedings of the High Performance Computing, 2000

On lower bounds for scheduling problems in high-level synthesis.

M. Narasimhan

Proceedings of the 37th Conference on Design Automation, 2000

Data Relation Vectors: A New Abstraction for Data Optimizations.

Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques (PACT'00), 2000

1999

A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts.

IEEE Trans. Parallel Distributed Syst., 1999

A global communication optimization technique based on data-flow analysis and linear algebra.

ACM Trans. Program. Lang. Syst., 1999

Improving Cache Locality by a Combination of Loop and Data Transformation.

IEEE Trans. Computers, 1999

A Matrix-Based Approach to Global Locality Optimization.

J. Parallel Distributed Comput., 1999

Improving Locality Using a Graph-Based Technique for Detecting Memory Layouts of Arrays.

Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999

Code Restructuring for Improving Real Time Response through Code Speed, Size Trade-offs on Limited Memory Embedded DSPs.

Proceedings of the Languages and Compilers for Parallel Computing, 1999

A Graph Based Framework to Detect Optimal Memory Layouts for Improving Data Locality.

Proceedings of the 13th International Parallel Processing Symposium / 10th Symposium on Parallel and Distributed Processing (IPPS / SPDP '99), 1999

An integer linear programming approach for optimizing cache locality.

Proceedings of the 13th international conference on Supercomputing, 1999

A Framework for Interprocedural Locality Optimization Using Both Loop and Data Layout Transformations.

Proceedings of the International Conference on Parallel Processing 1999, 1999

Compiler Optimizations for I/O-Intensive Computations.

Proceedings of the International Conference on Parallel Processing 1999, 1999

Restructuring I/O-Intensive Computations for Locality.

Proceedings of the High-Performance Computing and Networking, 7th International Conference, 1999

I/O-Conscious Tiling for Disk-Resident Data Sets.

Proceedings of the Euro-Par '99 Parallel Processing, 5th International Euro-Par Conference, Toulouse, France, August 31, 1999

On Reducing False Sharing while Improving Locality on Shared Memory Multiprocessors.

Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, 1999

1998

Compilation Techniques for Out-of-Core Parallel Computations.

Parallel Comput., 1998

Locality Optimization Algorithms for Compilation of Out-of-Core Codes.

Meenakshi A. Kandaswamy

J. Inf. Sci. Eng., 1998

Partitioning Graphs on Message-Passing Machines by Pairwise Mincut.

Fikret Erçal

Inf. Sci., 1998

Improving Locality Using Loop and Data Transformations in an Integrated Framework.

Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998

Improving Locality in Out-of-Core Computations Using Data Layout Transformations.

Proceedings of the Languages, 1998

A Loop Transformation Algorithm Based on Explicit Data Layout Representation for Optimizing Locality.

Proceedings of the Languages and Compilers for Parallel Computing, 1998

A Generalized Framework for Global Communication Optimization.

Proceedings of the 12th International Parallel Processing Symposium / 9th Symposium on Parallel and Distributed Processing (IPPS/SPDP '98), March 30, 1998

A Hyperplane Based Approach for Optimizing Spatial Locality in Loop Nests.

Proceedings of the 12th international conference on Supercomputing, 1998

Minimizing Data and Synchronization Costs in One-Way Communication.

Proceedings of the 1998 International Conference on Parallel Processing (ICPP '98), 1998

Improving the computational performance of ILP-based problems.

M. Narasimhan

Proceedings of the 1998 IEEE/ACM International Conference on Computer-Aided Design, 1998

Efficient address sequence generation for two-level mappings in High Performance Fortran.

Swaroop Dutta

Proceedings of the 5th International Conference On High Performance Computing, 1998

Enhancing Spatial Locality via Data Layout Optimizations.

Proceedings of the Euro-Par '98 Parallel Processing, 1998

A Matrix-Based Approach to the Global Locality Optimization Problem.

Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, 1998

1997

Communication Generation for Block-Cyclic Distributions.

Parallel Process. Lett., 1997

Code Generation for Complex Subscripts in Data-Parallel Programs.

Swaroop Dutta

Proceedings of the Languages and Compilers for Parallel Computing, 1997

A Unified Compiler Algorithm for Optimizing Locality, Parallelism and Communication in Out-of-core Computations.

Meenakshi A. Kandaswamy

Proceedings of the Fifth Workshop on I/O in Parallel and Distributed Systems, 1997

A Compiler Algorithm for Optimizing Locality in Loop Nests.

Proceedings of the 11th international conference on Supercomputing, 1997

Improving the Performance of Out-of-Core Computations.

Proceedings of the 1997 International Conference on Parallel Processing (ICPP '97), 1997

Optimization of Out-of-Core Computations Using Chain Vectors.

Proceedings of the Euro-Par '97 Parallel Processing, 1997

Compiler Algorithms for Optimizing Locality and Parallelism on Shared and Distributed Memory Machines.

Proceedings of the 1997 Conference on Parallel Architectures and Compilation Techniques (PACT '97), 1997

1996

A neural architecture for a class of abduction problems.

Ashok K. Goel

IEEE Trans. Syst. Man Cybern. Part B, 1996

Efficient Algorithms for Array Redistribution.

Rajeev Thakur

IEEE Trans. Parallel Distributed Syst., 1996

Efficient Computation of Address Sequences in Data Parallel Programs Using Closed Forms for Basis Vectors.

J. Parallel Distributed Comput., 1996

Compilation and Communication Strategies for Out-of-Core Programs on Distributed Memory Machines.

Rajesh Bordawekar

J. Parallel Distributed Comput., 1996

Generalized Overlap Regions for Communication Optimization in Data-Parallel Programs.

Proceedings of the Languages and Compilers for Parallel Computing, 1996

Automatic Optimization of Communication in Compiling Out-of-Core Stencil Codes.

Rajesh Bordawekar

Proceedings of the 10th international conference on Supercomputing, 1996

A Framework for Integrated Communication and I/O Placement.

Rajesh Bordawekar

Proceedings of the Euro-Par '96 Parallel Processing, 1996

1995

Beyond unimodular transformations.

J. Supercomput., 1995

Mapping combinatorial optimization problems onto neural networks.

Inf. Sci., 1995

Integrating Data Distribution and Loop Transformations.

Amit Narayan

Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, 1995

Communication Generation and Optimization for HPF.

Proceedings of the Languages, 1995

Fast Address Sequence Generation for Data-Parallel Programs Using Integer Lattices.

Proceedings of the Languages and Compilers for Parallel Computing, 1995

Statement-level independent partitioning of uniform recurrences.

S. Vasanthakumar

Proceedings of IPPS '95, 1995

Multi-phase array redistribution: modeling and evaluation.

Proceedings of IPPS '95, 1995

1994

Analysis of Event Synchronization in Parallel Programs.

Ashvin Mathew

Proceedings of the Languages and Compilers for Parallel Computing, 1994

Optimal Software Pipelining of Nested Loops.

Proceedings of the 8th International Symposium on Parallel Processing, 1994

1992

Tiling Multidimensional Iteration Spaces for Multicomputers.

J. Parallel Distributed Comput., 1992

Non-Unimodular Transformations of Nested Loops.

Proceedings of Supercomputing '92, 1992

1991

Compile-Time Techniques for Data Distribution in Distributed Memory Machines.

IEEE Trans. Parallel Distributed Syst., 1991

Tiling multidimensional iteration spaces for nonshared memory machines.

Proceedings of Supercomputing '91, 1991

A Linear Algebraic View of Loop Transformations and Their Interaction.

Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, 1991

1990

Cluster partitioning approaches to mapping parallel programs onto a hypercube.

Fikret Erçal

Parallel Comput., 1990

Tiling of Iteration Spaces for Multicomputers.

Proceedings of the 1990 International Conference on Parallel Processing, 1990

1989

A methodology for parallelizing programs for multicomputers and complex memory multiprocessors.

Proceedings of Supercomputing '89, Reno, NV, USA, November 12-17, 1989

1988

Optimization by neural networks.

Proceedings of International Conference on Neural Networks (ICNN'88), 1988

Towards a 'neural' architecture for abductive reasoning.

Ashok K. Goel

Proceedings of International Conference on Neural Networks (ICNN'88), 1988

Task allocation onto a hypercube by recursive mincut bipartitioning.

Fikret Erçal