Uday Bondhugula

CoRR, 2023

HIR: An MLIR-based Intermediate Representation for Hardware Accelerator Description.

Kingshuk Majumder

Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022

Treebeard: An Optimizing Compiler for Decision Tree Based ML Inference.

Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

MLIR-based code generation for GPU tensor cores.

Navdeep Katel

Vivek Khandelwal

Proceedings of the CC '22: 31st ACM SIGPLAN International Conference on Compiler Construction, Seoul, South Korea, April 2, 2022

2021

High Performance GPU Code Generation for Matrix-Matrix Multiplication using MLIR: Some Early Results.

Navdeep Katel

Vivek Khandelwal

CoRR, 2021

A practical tile size selection model for affine loop nests.

Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

MLIR: Scaling Compiler Infrastructure for Domain Specific Computation.

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

2020

An Effective Fusion and Tile Size Model for PolyMage.

Abhinav Jangda

ACM Trans. Program. Lang. Syst., 2020

Optimizing the Linear Fascicle Evaluation Algorithm for Multi-core and Many-core Systems.

Karan Aggarwal

ACM Trans. Parallel Comput., 2020

Effective Loop Fusion in Polyhedral Compilation Using Fusion Conflict Graphs.

ACM Trans. Archit. Code Optim., 2020

High Performance Code Generation in MLIR: An Early Case Study with GEMM.

CoRR, 2020

MLIR: A Compiler Infrastructure for the End of Moore's Law.

CoRR, 2020

Bitwidth customization in image processing pipelines using interval analysis and SMT solvers.

Proceedings of the CC '20: 29th International Conference on Compiler Construction, 2020

2019

A flexible FPGA accelerator for convolutional neural networks.

Kingshuk Majumder

CoRR, 2019

Optimizing the Linear Fascicle Evaluation Algorithm for Multi-Core and Many-Core Systems.

CoRR, 2019

Optimizing the linear fascicle evaluation algorithm for many-core systems.

Karan Aggarwal

Proceedings of the ACM International Conference on Supercomputing, 2019

2018

An Approach for Finding Permutations Quickly: Fusion and Dimension matching.

CoRR, 2018

Synthesizing Power and Area Efficient Image Processing Pipelines on FPGAs using Customized Bit-widths.

CoRR, 2018

An effective fusion and tile size model for optimizing image processing pipelines.

Abhinav Jangda

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Polyhedral auto-transformation with no integer linear programming.

Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018

2017

Diamond Tiling: Tiling Techniques to Maximize Parallelism for Stencil Computations.

Vinayaka Bandishti

Irshad Pananilath

IEEE Trans. Parallel Distributed Syst., 2017

Optimizing geometric multigrid method computation using a DSL approach.

Proceedings of the International Conference for High Performance Computing, 2017

2016

The Pluto+ Algorithm: A Practical Approach for Parallelization and Locality Optimization of Affine Loop Nests.

Somashekaracharya G. Bhaskaracharya

ACM Trans. Program. Lang. Syst., 2016

Automatic Storage Optimization for Arrays.

ACM Trans. Program. Lang. Syst., 2016

Compiling Affine Loop Nests for a Dynamic Scheduling Runtime on Shared and Distributed Memory.

Roshan Dathathri

Ravi Teja Mullapudi

Somashekaracharya G. Bhaskaracharya

ACM Trans. Parallel Comput., 2016

SMO: an integrated approach to intra-array and inter-array storage optimization.

Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2016

A DSL Compiler for Accelerating Image Processing Pipelines on FPGAs.

Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015

An Optimizing Code Generator for a Class of Lattice-Boltzmann Computations.

ACM Trans. Archit. Code Optim., 2015

PLUTO+: near-complete modeling of affine transformations for parallelism and locality.

Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

PolyMage: Automatic Optimization for Image Processing Pipelines.

Ravi Teja Mullapudi

Vinay Vasista

Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

2014

Effective automatic computation placement and data allocation for parallelization of regular programs.

Chandan Reddy

Proceedings of the 2014 International Conference on Supercomputing, 2014

Tiling and optimizing time-iterated computations on periodic domains.

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013

Automatic data allocation and buffer management for multi-GPU machines.

Thejas Ramashekar

ACM Trans. Archit. Code Optim., 2013

Compiling affine loop nests for distributed-memory parallel architectures.

Somashekaracharya G. Bhaskaracharya

Proceedings of the International Conference for High Performance Computing, 2013

PolyGLoT: A Polyhedral Loop Transformation Framework for a Graphical Dataflow Language.

Proceedings of the Compiler Construction - 22nd International Conference, 2013

Generating efficient data movement code for heterogeneous architectures with distributed-memory.

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012

Tiling stencil computations to maximize parallelism.

Vinayaka Bandishti

Irshad Pananilath

Proceedings of the SC Conference on High Performance Computing Networking, 2012

2011

Loop transformations: convexity, pruning and optimization.

Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2011

2010

Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework.

Proceedings of the Conference on High Performance Computing Networking, 2010

Believe it or not!: multi-core CPUs can match GPU performance for a FLOP-intensive application!

Rajesh Bordawekar

Ravi Rao

Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

A model for fusion and code motion in an automatic parallelizing compiler.

Lakshminarayanan Renganarayanan

Oktay Günlük

Sanjeeb Dash

Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009

Compact multi-dimensional kernel extraction for register tiling.

Lakshminarayanan Renganarayanan

Alexandre E. Eichenberger

Salem Derisavi

Kevin O'Brien

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Compiler-assisted dynamic scheduling for effective parallelization of loop nests on multicore processors.

Nagavijayalakshmi Vydyanathan

Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Data Layout Transformation for Enhancing Data Locality on NUCA Chip Multiprocessors.

Proceedings of the PACT 2009, 2009

2008

Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories.

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

A practical automatic polyhedral parallelizer and locality optimizer.

Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation, 2008

Towards effective automatic parallelization for multicore systems.

Albert Hartono

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

A compiler framework for optimization of affine loop nests for GPGPUs.

Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model.

Proceedings of the Compiler Construction, 17th International Conference, 2008

2007

Automatic mapping of nested loops to FPGAs.

Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

Effective automatic parallelization of stencil computations.
