Uday Bondhugula

Orcid: 0000-0002-8297-6159

According to our database, Uday Bondhugula authored at least 54 papers between 2005 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.


Bibliography

2023
Automatic multi-dimensional pipelining for high-level synthesis of dataflow accelerators.
CoRR, 2023

HIR: An MLIR-based Intermediate Representation for Hardware Accelerator Description.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
Treebeard: An Optimizing Compiler for Decision Tree Based ML Inference.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

MLIR-based code generation for GPU tensor cores.
Proceedings of the CC '22: 31st ACM SIGPLAN International Conference on Compiler Construction, Seoul, South Korea, April 2, 2022

2021
High Performance GPU Code Generation for Matrix-Matrix Multiplication using MLIR: Some Early Results.
CoRR, 2021

A practical tile size selection model for affine loop nests.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

MLIR: Scaling Compiler Infrastructure for Domain Specific Computation.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

2020
An Effective Fusion and Tile Size Model for PolyMage.
ACM Trans. Program. Lang. Syst., 2020

Optimizing the Linear Fascicle Evaluation Algorithm for Multi-core and Many-core Systems.
ACM Trans. Parallel Comput., 2020

Effective Loop Fusion in Polyhedral Compilation Using Fusion Conflict Graphs.
ACM Trans. Archit. Code Optim., 2020

High Performance Code Generation in MLIR: An Early Case Study with GEMM.
CoRR, 2020

MLIR: A Compiler Infrastructure for the End of Moore's Law.
CoRR, 2020

Bitwidth customization in image processing pipelines using interval analysis and SMT solvers.
Proceedings of the CC '20: 29th International Conference on Compiler Construction, 2020

2019
A flexible FPGA accelerator for convolutional neural networks.
CoRR, 2019

Optimizing the Linear Fascicle Evaluation Algorithm for Multi-Core and Many-Core Systems.
CoRR, 2019

Optimizing the linear fascicle evaluation algorithm for many-core systems.
Proceedings of the ACM International Conference on Supercomputing, 2019

2018
An Approach for Finding Permutations Quickly: Fusion and Dimension matching.
CoRR, 2018

Synthesizing Power and Area Efficient Image Processing Pipelines on FPGAs using Customized Bit-widths.
CoRR, 2018

An effective fusion and tile size model for optimizing image processing pipelines.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Polyhedral auto-transformation with no integer linear programming.
Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018

2017
Diamond Tiling: Tiling Techniques to Maximize Parallelism for Stencil Computations.
IEEE Trans. Parallel Distributed Syst., 2017

Optimizing geometric multigrid method computation using a DSL approach.
Proceedings of the International Conference for High Performance Computing, 2017

2016
The Pluto+ Algorithm: A Practical Approach for Parallelization and Locality Optimization of Affine Loop Nests.
ACM Trans. Program. Lang. Syst., 2016

Automatic Storage Optimization for Arrays.
ACM Trans. Program. Lang. Syst., 2016

Compiling Affine Loop Nests for a Dynamic Scheduling Runtime on Shared and Distributed Memory.
ACM Trans. Parallel Comput., 2016

SMO: an integrated approach to intra-array and inter-array storage optimization.
Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2016

A DSL Compiler for Accelerating Image Processing Pipelines on FPGAs.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
An Optimizing Code Generator for a Class of Lattice-Boltzmann Computations.
ACM Trans. Archit. Code Optim., 2015

PLUTO+: near-complete modeling of affine transformations for parallelism and locality.
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

PolyMage: Automatic Optimization for Image Processing Pipelines.
Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

2014
Effective automatic computation placement and data allocation for parallelization of regular programs.
Proceedings of the 2014 International Conference on Supercomputing, 2014

Tiling and optimizing time-iterated computations on periodic domains.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
Automatic data allocation and buffer management for multi-GPU machines.
ACM Trans. Archit. Code Optim., 2013

Compiling affine loop nests for distributed-memory parallel architectures.
Proceedings of the International Conference for High Performance Computing, 2013

PolyGLoT: A Polyhedral Loop Transformation Framework for a Graphical Dataflow Language.
Proceedings of the Compiler Construction - 22nd International Conference, 2013

Generating efficient data movement code for heterogeneous architectures with distributed-memory.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012
Tiling stencil computations to maximize parallelism.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

2011
Loop transformations: convexity, pruning and optimization.
Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2011

2010
Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework.
Proceedings of the Conference on High Performance Computing Networking, 2010

Believe it or not!: multi-core CPUs can match GPU performance for a FLOP-intensive application!
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

A model for fusion and code motion in an automatic parallelizing compiler.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009
Compact multi-dimensional kernel extraction for register tiling.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Compiler-assisted dynamic scheduling for effective parallelization of loop nests on multicore processors.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Data Layout Transformation for Enhancing Data Locality on NUCA Chip Multiprocessors.
Proceedings of the PACT 2009, 2009

2008
Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories.
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

A practical automatic polyhedral parallelizer and locality optimizer.
Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation, 2008

Towards effective automatic parallelization for multicore systems.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

A compiler framework for optimization of affine loop nests for GPGPUs.
Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model.
Proceedings of the Compiler Construction, 17th International Conference, 2008

2007
Automatic mapping of nested loops to FPGAs.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

Effective automatic parallelization of stencil computations.
Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, 2007

2006
Parallel FPGA-based all-pairs shortest-paths in a directed graph.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Hardware/Software Integration for FPGA-based All-Pairs Shortest-Paths.
Proceedings of the 14th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2006), 2006

2005
High Performance RDMA Based All-to-All Broadcast for InfiniBand Clusters.
Proceedings of the High Performance Computing, 2005

