Saman P. Amarasinghe

Orcid: 0000-0002-7231-7643

Affiliations:
  • MIT, Cambridge, USA


According to our database, Saman P. Amarasinghe authored at least 163 papers between 1993 and 2024.

Awards

ACM Fellow

ACM Fellow 2019, "For contributions to high performance computing on modern hardware platforms, domain-specific languages, and compilation techniques".

Bibliography

2024
AskIt: Unified Programming Interface for Programming with Large Language Models.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2024

2023
Compiler Support for Structured Data.
Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2023

D2X: An eXtensible conteXtual Debugger for Modern DSLs.
Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, 2023

Looplets: A Language for Structured Coiteration.
Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, 2023

Codon: A Compiler for High-Performance Pythonic Applications and DSLs.
Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction, 2023

A Deep Learning Model for Loop Interchange.
Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction, 2023

WACO: Learning Workload-Aware Co-optimization of the Format and Schedule of a Sparse Tensor Program.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
Compilation of dynamic sparse tensor algebra.
Proc. ACM Program. Lang., 2022

All you need is superword-level parallelism: systematic control-flow vectorization with SLP.
Proceedings of the PLDI '22: 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, San Diego, CA, USA, June 13, 2022

Autoscheduling for sparse tensor algebra with an asymptotic cost model.
Proceedings of the PLDI '22: 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, San Diego, CA, USA, June 13, 2022

Unified Compilation for Lossless Compression and Sparse Computing.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2022

GraphIt to CUDA Compiler in 2021 LOC: A Case for High-Performance DSL Implementation via Staging with BuilDSL.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2022

2021
Compilation of sparse array programming models.
Proc. ACM Program. Lang., 2021

Dynamic Sparse Tensor Algebra Compilation.
CoRR, 2021

An Asymptotic Cost Model for Autoscheduling Sparse Tensor Programs.
CoRR, 2021

An Attempt to Generate Code for Symmetric Tensor Computations.
CoRR, 2021

A Deep Learning Based Cost Model for Automatic Code Optimization.
Proceedings of Machine Learning and Systems 2021, 2021

Taming the Zoo: The Unified GraphIt Compiler Framework for Novel Architectures.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

A Deep Dive Into Understanding The Random Walk-Based Temporal Graph Learning.
Proceedings of the IEEE International Symposium on Workload Characterization, 2021

Domain-Specific Language Abstractions for Compression.
Proceedings of the 31st Data Compression Conference, 2021

Compiling Graph Applications for GPUs with GraphIt.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

BuildIt: A Type-Based Multi-stage Programming Framework for Code Generation in C++.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

VeGen: a vectorizer generator for SIMD and beyond.
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

2020
A sparse iteration space transformation framework for sparse tensor algebra.
Proc. ACM Program. Lang., 2020

Compilation Techniques for Graphs Algorithms on GPUs.
CoRR, 2020

TIRAMISU: A Polyhedral Compiler for Dense and Sparse Deep Learning.
CoRR, 2020

A Unified Iteration Space Transformation Framework for Sparse and Dense Tensor Algebra.
CoRR, 2020

Sparse Tensor Transpositions.
Proceedings of the SPAA '20: 32nd ACM Symposium on Parallelism in Algorithms and Architectures, 2020

Automatic generation of efficient sparse tensor format conversion routines.
Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2020

Compiler 2.0: Using Machine Learning to Modernize Compiler Technology.
Proceedings of the 21st ACM SIGPLAN/SIGBED International Conference on Languages, 2020

SALSA: A Domain Specific Architecture for Sequence Alignment.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

GrAPL 2020 Keynote Speaker: The GraphIt Universal Graph Framework: Achieving High Performance across Algorithms, Graph Types, and Architectures.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Optimizing ordered graph algorithms with GraphIt.
Proceedings of the CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020

2019
Seq: a high-performance language for bioinformatics.
Proc. ACM Program. Lang., 2019

PriorityGraph: A Unified Programming Model for Optimizing Ordered Graph Algorithms.
CoRR, 2019

Compiler Auto-Vectorization with Imitation Learning.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

BHive: A Benchmark Suite and Measurement Framework for Validating x86-64 Basic Block Performance Models.
Proceedings of the IEEE International Symposium on Workload Characterization, 2019

Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks.
Proceedings of the 36th International Conference on Machine Learning, 2019

Accelerated CNN Training through Gradient Approximation.
Proceedings of the 2nd Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications, 2019

Tensor Algebra Compilation with Workspaces.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2019

Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2019

Revec: program rejuvenation through revectorization.
Proceedings of the 28th International Conference on Compiler Construction, 2019

The sparse tensor algebra compiler (keynote).
Proceedings of the 28th International Conference on Compiler Construction, 2019

2018
Evaluating End-to-End Optimization for Data Analytics Applications in Weld.
Proc. VLDB Endow., 2018

GraphIt: a high-performance graph DSL.
Proc. ACM Program. Lang., 2018

goSLP: globally optimized superword level parallelism framework.
Proc. ACM Program. Lang., 2018

Format abstraction for sparse tensor algebra compilers.
Proc. ACM Program. Lang., 2018

DAWG: A Defense Against Cache Timing Attacks in Speculative Execution Processors.
IACR Cryptol. ePrint Arch., 2018

Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks.
CoRR, 2018

Cimple: Instruction and Memory Level Parallelism.
CoRR, 2018

GraphIt - A High-Performance DSL for Graph Analytics.
CoRR, 2018

Unified Sparse Formats for Tensor Algebra Compilers.
CoRR, 2018

The Three Pillars of Machine-Based Programming.
CoRR, 2018

Automatic Generation of Sparse Tensor Kernels with Workspaces.
CoRR, 2018

Halide: decoupling algorithms from schedules for high-performance image processing.
Commun. ACM, 2018

The three pillars of machine programming.
Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, 2018

Gloss: Seamless Live Reconfiguration and Reoptimization of Stream Programs.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

A Unified Backend for Targeting FPGAs from DSLs.
Proceedings of the 29th IEEE International Conference on Application-specific Systems, 2018

Cimple: instruction and memory level parallelism: a DSL for uncovering ILP and MLP.
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017
The tensor algebra compiler.
Proc. ACM Program. Lang., 2017

Weld: Rethinking the Interface Between Data-Intensive Applications.
CoRR, 2017

taco: a tool to generate tensor algebra kernels.
Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, 2017

A Common Backend for Hardware Acceleration on FPGA.
Proceedings of the 2017 IEEE International Conference on Computer Design, 2017

A Common Runtime for High Performance Data Analysis.
Proceedings of the 8th Biennial Conference on Innovative Data Systems Research, 2017

Making caches work for graph analytics.
Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017

2016
Simit: A Language for Physical Simulation.
ACM Trans. Graph., 2016

Optimizing Cache Performance for Graph Analytics.
CoRR, 2016

Distributed Halide.
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

Optimizing Indirect Memory References with milk.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
Helium: lifting high-performance stencil kernels from stripped x86 binaries to halide DSL code.
Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015

Autotuning algorithmic choice for input sensitivity.
Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015

2014
WOSC 2014: second workshop on optimizing stencil computations.
Proceedings of the SPLASH'14, 2014

StreamJIT: a commensal compiler for high-performance stream programming.
Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, 2014

OpenTuner: an extensible framework for program autotuning.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
Detection of false sharing using machine learning.
Proceedings of the International Conference for High Performance Computing, 2013

Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2013

Dynamic expressivity with static optimization for streaming languages.
Proceedings of the 7th ACM International Conference on Distributed Event-Based Systems, 2013

Portable performance on heterogeneous architectures.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2013

2012
Decoupling algorithms from schedules for easy optimization of image processing pipelines.
ACM Trans. Graph., 2012

Transparent dynamic instrumentation.
Proceedings of the 8th International Conference on Virtual Execution Environments, 2012

Hyperparameter Tuning in Bandit-Based Adaptive Operator Selection.
Proceedings of the Applications of Evolutionary Computation, 2012

Siblingrivalry: online autotuning through local competitions.
Proceedings of the 15th International Conference on Compilers, 2012

Aikido: accelerating shared data dynamic analyses.
Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, 2012

2011
Dynamic cache contention detection in multi-threaded applications.
Proceedings of the 7th International Conference on Virtual Execution Environments, 2011

Multicore Performance Optimization Using Partner Cores.
Proceedings of the 3rd USENIX Workshop on Hot Topics in Parallelism, 2011

PetaBricks: a language and compiler based on autotuning.
Proceedings of the High Performance Embedded Architectures and Compilers, 2011

An efficient evolutionary algorithm for solving incrementally structured problems.
Proceedings of the 13th Annual Genetic and Evolutionary Computation Conference, 2011

Language and compiler support for auto-tuning variable-accuracy algorithms.
Proceedings of the CGO 2011, 2011

2010
Efficient memory shadowing for 64-bit architectures.
Proceedings of the 9th International Symposium on Memory Management, 2010

Evaluation of IVR data collection UIs for untrained rural users.
Proceedings of the First ACM Annual Symposium on Computing for Development, 2010

Umbra: efficient and scalable memory shadowing.
Proceedings of the CGO 2010, 2010

An empirical characterization of stream programs and its implications for language and compiler design.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009
Automatically patching errors in deployed software.
Proceedings of the 22nd ACM Symposium on Operating Systems Principles 2009, 2009

Autotuning multigrid with PetaBricks.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

PetaBricks: a language and compiler for algorithmic choice.
Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2009

Manipulating lossless video in the compressed domain.
Proceedings of the 17th International Conference on Multimedia 2009, 2009

Computer-aided design for microfluidic chips based on multilayer soft lithography.
Proceedings of the 27th International Conference on Computer Design, 2009

Kendo: efficient deterministic multithreading in software.
Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, 2009

2008
A lightweight streaming layer for multicore execution.
SIGARCH Comput. Archit. News, 2008

Abstraction layers for scalable microfluidic biocomputing.
Nat. Comput., 2008

How to Do a Million Watchpoints: Efficient Debugging Using Dynamic Instrumentation.
Proceedings of the Compiler Construction, 17th International Conference, 2008

(How) can programmers conquer the multicore menace?
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

2007
A step towards unifying schedule and storage optimization.
ACM Trans. Program. Lang. Syst., 2007

A Practical Approach to Exploiting Coarse-Grained Pipeline Parallelism in C Programs.
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

Ubiquitous Memory Introspection.
Proceedings of the Fifth International Symposium on Code Generation and Optimization (CGO 2007), 2007

2006
MPEG-2 decoding in a stream programming language.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Abstraction Layers for Scalable Microfluidic Biocomputers.
Proceedings of the DNA Computing, 12th International Meeting on DNA Computing, 2006

Exploiting coarse-grained task, data, and pipeline parallelism in stream programs.
Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006

2005
Scalar Operand Networks.
IEEE Trans. Parallel Distributed Syst., 2005

Interprocedural parallelization analysis in SUIF.
ACM Trans. Program. Lang. Syst., 2005

Language and Compiler Design for Streaming Applications.
Int. J. Parallel Program., 2005

Teleport messaging for distributed stream programs.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005

Exploiting Vector Parallelism in Software Pipelined Loops.
Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-38 2005), 2005

Cache aware optimization of stream programs.
Proceedings of the 2005 ACM SIGPLAN/SIGBED Conference on Languages, 2005

Predicting Unroll Factors Using Supervised Classification.
Proceedings of the 3rd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005

Maintaining Consistency and Bounding Capacity of Software Code Caches.
Proceedings of the 3rd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005

Multicores from the Compiler's Perspective: A Blessing or a Curse?
Proceedings of the 3rd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005

Optimizing stream programs using linear state space analysis.
Proceedings of the 2005 International Conference on Compilers, 2005

2004
Convergent Scheduling.
J. Instr. Level Parallelism, 2004

Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams.
Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

2003
Meta optimization: improving compiler heuristics with machine learning.
Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation 2003, 2003

Linear analysis and optimization of stream programs.
Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation 2003, 2003

Phased scheduling of stream programs.
Proceedings of the 2003 Conference on Languages, 2003

Adapting Convergent Scheduling Using Machine-Learning.
Proceedings of the Languages and Compilers for Parallel Computing, 2003

Dynamic native optimization of interpreters.
Proceedings of the 2003 Workshop on Interpreters, Virtual Machines and Emulators, 2003

High-Bandwidth Packet Switching on the Raw General-Purpose Architecture.
Proceedings of the 32nd International Conference on Parallel Processing (ICPP 2003), 2003

Scalar Operand Networks: On-Chip Interconnect for ILP in Partitioned Architecture.
Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003

Genetic Programming Applied to Compiler Heuristic Optimization.
Proceedings of the Genetic Programming, 6th European Conference, EuroGP 2003, 2003

An Infrastructure for Adaptive Dynamic Optimization.
Proceedings of the 1st IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2003), 2003

2002
A common machine language for grid-based architectures.
SIGARCH Comput. Archit. News, 2002

The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs.
IEEE Micro, 2002

Secure Execution via Program Shepherding.
Proceedings of the 11th USENIX Security Symposium, 2002

Defying the speed of light: a spatially-aware compiler for wire-exposed architectures.
Proceedings of the ACM SIGPLAN ASIA-PEPM 2002, 2002

Convergent scheduling.
Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

Providing Web search capability for low-connectivity communities.
Proceedings of the 2002 International Symposium on Technology and Society, 2002

Efficient Pipelining of Nested Loops: Unroll-and-Squash.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

StreamIt: A Language for Streaming Applications.
Proceedings of the Compiler Construction, 11th International Conference, 2002

A stream compiler for communication-exposed architectures.
Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), 2002

Increasing and Detecting Memory Address Congruence.
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT 2002), 2002

2001
Compiler Support for Scalable and Efficient Memory Systems.
IEEE Trans. Computers, 2001

A Unified Framework for Schedule and Storage Optimization.
Proceedings of the 2001 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2001

Strength Reduction of Integer Division and Modulo Operations.
Proceedings of the Languages and Compilers for Parallel Computing, 2001

2000
Bitwidth analysis with application to silicon compilation.
Proceedings of the 2000 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2000

Exploiting superword level parallelism with multimedia instruction sets.
Proceedings of the 2000 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2000

FlexCache: A Framework for Flexible Compiler Generated Data Caching.
Proceedings of the Intelligent Memory Systems, Second International Workshop, 2000

1999
Maps: A Compiler-Managed Memory System for Raw Machines.
Proceedings of the 26th Annual International Symposium on Computer Architecture, 1999

Parallelizing Applications into Silicon.
Proceedings of the 7th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '99), 1999

1998
Memory bank disambiguation using modulo unrolling for Raw machines.
Proceedings of the 5th International Conference On High Performance Computing, 1998

Space-Time Scheduling of Instruction-Level Parallelism on a Raw Machine.
Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), 1998

1997
Baring It All to Software: Raw Machines.
Computer, 1997

1996
Multiprocessors from a software perspective.
IEEE Micro, 1996

Maximizing Multiprocessor Performance with the SUIF Compiler.
Computer, 1996

1995
Detecting Coarse-Grain Parallelism Using an Interprocedural Parallelizing Compiler.
Proceedings of Supercomputing '95, San Diego, CA, USA, December 4-8, 1995

Interprocedural Parallelization Analysis: A Case Study.
Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, 1995

An Overview of the SUIF Compiler for Scalable Parallel Machines.
Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, 1995

Data and Computation Transformations for Multiprocessors.
Proceedings of the Fifth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1995

Interprocedural Analysis for Parallelization.
Proceedings of the Languages and Compilers for Parallel Computing, 1995

Unified Compilation Techniques for Shared and Distributed Address Space Machines.
Proceedings of the 9th international conference on Supercomputing, 1995

1994
SUIF: An Infrastructure for Research on Parallelizing and Optimizing Compilers.
ACM SIGPLAN Notices, 1994

1993
Array Data-Flow Analysis and its Use in Array Privatization.
Proceedings of the Conference Record of the Twentieth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 1993

Communication Optimization and Code Generation for Distributed Memory Machines.
Proceedings of the ACM SIGPLAN'93 Conference on Programming Language Design and Implementation (PLDI), 1993

An Overview of a Compiler for Scalable Parallel Machines.
Proceedings of the Languages and Compilers for Parallel Computing, 1993

