Amit Sabne

Phitchaya Mangpo Phothilimthana

Charith Mendis

Proc. ACM Program. Lang., 2025

2023

Overlap Communication with Dependent Computation via Decomposition in Large Deep Learning Models.

Karthik Srinivasa Murthy

Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2021

A Learned Performance Model for Tensor Processing Units.

Samuel J. Kaufman

Proceedings of the Fourth Conference on Machine Learning and Systems, 2021

A Flexible Approach to Autotuning Multi-Pass Machine Learning Compilers.

Phitchaya Mangpo Phothilimthana

Nikhil Sarda

Karthik Srinivasa Murthy

Yanqi Zhou

Christof Angermueller

Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques, 2021

2020

Logic Synthesis of Approximate Circuits.

Swagath Venkataramani

Vivek Joy Kozhikkottu

Kaushik Roy

Anand Raghunathan

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Fast Distributed Bandits for Online Recommendation Systems.

CoRR, 2020

Fast distributed bandits for online recommendation systems.

Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

2019

Pagoda: A GPU Runtime System for Narrow Tasks.

ACM Trans. Parallel Comput., 2019

RegDem: Increasing GPU Performance via Shared Memory Register Spilling.

CoRR, 2019

Comparative analysis of coprocessors.

Concurr. Comput. Pract. Exp., 2019

Optimizing GPU programs by register demotion: poster.

Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

2017

Massively parallel 3D image reconstruction.

Proceedings of the International Conference for High Performance Computing, 2017

Pagoda: Fine-Grained GPU Resource Virtualization for Narrow Tasks.

Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Model-based Iterative CT Image Reconstruction on GPUs.

Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

2016

Programming models, compilers, and runtime systems for accelerator computing.

PhD thesis, 2016

High performance model based image reconstruction.

Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

Formalizing Structured Control Flow Graphs.

Proceedings of the Languages and Compilers for Parallel Computing, 2016

POSTER: Pagoda: A Runtime System to Maximize GPU Utilization in Data Parallel Tasks with Limited Parallelism.

Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015

Understanding Portability of a High-Level Programming Model on Contemporary Heterogeneous Architectures.

IEEE Micro, 2015

HYDRA: Extending Shared Address Programming for Accelerator Clusters.

Proceedings of the Languages and Compilers for Parallel Computing, 2015

HeteroDoop: A MapReduce Programming System for Accelerator Clusters.

Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015

2014

Evaluating Performance Portability of OpenACC.

Proceedings of the Languages and Compilers for Parallel Computing, 2014

2013

Scaling large-data computations on multi-GPU accelerators.

Proceedings of the International Conference on Supercomputing, 2013

2012

Effects of Compiler Optimizations in OpenMP to CUDA Translation.

Proceedings of the OpenMP in a Heterogeneous World - 8th International Workshop on OpenMP, 2012

SALSA: systematic logic synthesis of approximate circuits.

Swagath Venkataramani