Amit Sabne

According to our database, Amit Sabne authored at least 24 papers between 2010 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
Overlap Communication with Dependent Computation via Decomposition in Large Deep Learning Models.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2021
A Learned Performance Model for Tensor Processing Units.
Proceedings of Machine Learning and Systems 2021, 2021

A Flexible Approach to Autotuning Multi-Pass Machine Learning Compilers.
Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques, 2021

2020
Logic Synthesis of Approximate Circuits.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Fast Distributed Bandits for Online Recommendation Systems.
CoRR, 2020

Fast distributed bandits for online recommendation systems.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

2019
Pagoda: A GPU Runtime System for Narrow Tasks.
ACM Trans. Parallel Comput., 2019

RegDem: Increasing GPU Performance via Shared Memory Register Spilling.
CoRR, 2019

Comparative analysis of coprocessors.
Concurr. Comput. Pract. Exp., 2019

Optimizing GPU programs by register demotion: poster.
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

2017
Massively parallel 3D image reconstruction.
Proceedings of the International Conference for High Performance Computing, 2017

Pagoda: Fine-Grained GPU Resource Virtualization for Narrow Tasks.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Model-based Iterative CT Image Reconstruction on GPUs.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

2016
High performance model based image reconstruction.
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

Formalizing Structured Control Flow Graphs.
Proceedings of the Languages and Compilers for Parallel Computing, 2016

POSTER: Pagoda: A Runtime System to Maximize GPU Utilization in Data Parallel Tasks with Limited Parallelism.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
Understanding Portability of a High-Level Programming Model on Contemporary Heterogeneous Architectures.
IEEE Micro, 2015

HYDRA: Extending Shared Address Programming for Accelerator Clusters.
Proceedings of the Languages and Compilers for Parallel Computing, 2015

HeteroDoop: A MapReduce Programming System for Accelerator Clusters.
Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015

2014
Evaluating Performance Portability of OpenACC.
Proceedings of the Languages and Compilers for Parallel Computing, 2014

2013
Scaling large-data computations on multi-GPU accelerators.
Proceedings of the International Conference on Supercomputing, 2013

2012
Effects of Compiler Optimizations in OpenMP to CUDA Translation.
Proceedings of the OpenMP in a Heterogeneous World - 8th International Workshop on OpenMP, 2012

SALSA: systematic logic synthesis of approximate circuits.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

2010
A generic low power scan chain wrapper for designs using scan compression.
Proceedings of the 28th IEEE VLSI Test Symposium, 2010

