Saman P. Amarasinghe

According to our database, Saman P. Amarasinghe authored at least 118 papers between 1993 and 2019.

Bibliography

2019
Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks.
Proceedings of the 36th International Conference on Machine Learning, 2019

Tensor Algebra Compilation with Workspaces.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2019

Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2019

Revec: program rejuvenation through revectorization.
Proceedings of the 28th International Conference on Compiler Construction, 2019

The sparse tensor algebra compiler (keynote).
Proceedings of the 28th International Conference on Compiler Construction, 2019

2018
Evaluating End-to-End Optimization for Data Analytics Applications in Weld.
PVLDB, 2018

GraphIt: a high-performance graph DSL.
PACMPL, 2018

goSLP: globally optimized superword level parallelism framework.
PACMPL, 2018

Format abstraction for sparse tensor algebra compilers.
PACMPL, 2018

Halide: decoupling algorithms from schedules for high-performance image processing.
Commun. ACM, 2018

The three pillars of machine programming.
Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, 2018

DAWG: A Defense Against Cache Timing Attacks in Speculative Execution Processors.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Gloss: Seamless Live Reconfiguration and Reoptimization of Stream Programs.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

A Unified Backend for Targeting FPGAs from DSLs.
Proceedings of the 29th IEEE International Conference on Application-specific Systems, 2018

Cimple: instruction and memory level parallelism: a DSL for uncovering ILP and MLP.
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017
The tensor algebra compiler.
PACMPL, 2017

taco: a tool to generate tensor algebra kernels.
Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, 2017

A Common Backend for Hardware Acceleration on FPGA.
Proceedings of the 2017 IEEE International Conference on Computer Design, 2017

A Common Runtime for High Performance Data Analysis.
Proceedings of the CIDR 2017, 2017

Making caches work for graph analytics.
Proceedings of the 2017 IEEE International Conference on Big Data, BigData 2017, 2017

2016
Simit: A Language for Physical Simulation.
ACM Trans. Graph., 2016

Distributed Halide.
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

Optimizing Indirect Memory References with milk.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
Helium: lifting high-performance stencil kernels from stripped x86 binaries to Halide DSL code.
Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015

Autotuning algorithmic choice for input sensitivity.
Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015

2014
WOSC 2014: second workshop on optimizing stencil computations.
Proceedings of the Conference on Systems, 2014

StreamJIT: a commensal compiler for high-performance stream programming.
Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, 2014

OpenTuner: an extensible framework for program autotuning.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
Detection of false sharing using machine learning.
Proceedings of the International Conference for High Performance Computing, 2013

Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2013

Dynamic expressivity with static optimization for streaming languages.
Proceedings of the 7th ACM International Conference on Distributed Event-Based Systems, 2013

Portable performance on heterogeneous architectures.
Proceedings of Architectural Support for Programming Languages and Operating Systems, 2013

2012
Decoupling algorithms from schedules for easy optimization of image processing pipelines.
ACM Trans. Graph., 2012

Transparent dynamic instrumentation.
Proceedings of the 8th International Conference on Virtual Execution Environments, 2012

Hyperparameter Tuning in Bandit-Based Adaptive Operator Selection.
Proceedings of Applications of Evolutionary Computation, 2012

SiblingRivalry: online autotuning through local competitions.
Proceedings of the 15th International Conference on Compilers, 2012

Aikido: accelerating shared data dynamic analyses.
Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, 2012

2011
Dynamic cache contention detection in multi-threaded applications.
Proceedings of the 7th International Conference on Virtual Execution Environments, 2011

Multicore Performance Optimization Using Partner Cores.
Proceedings of the 3rd USENIX Workshop on Hot Topics in Parallelism, 2011

PetaBricks: a language and compiler based on autotuning.
Proceedings of High Performance Embedded Architectures and Compilers, 2011

An efficient evolutionary algorithm for solving incrementally structured problems.
Proceedings of the 13th Annual Genetic and Evolutionary Computation Conference, 2011

Language and compiler support for auto-tuning variable-accuracy algorithms.
Proceedings of the CGO 2011, 2011

2010
Efficient memory shadowing for 64-bit architectures.
Proceedings of the 9th International Symposium on Memory Management, 2010

Evaluation of IVR data collection UIs for untrained rural users.
Proceedings of the First ACM Annual Symposium on Computing for Development, 2010

Umbra: efficient and scalable memory shadowing.
Proceedings of the CGO 2010, 2010

An empirical characterization of stream programs and its implications for language and compiler design.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009
Automatically patching errors in deployed software.
Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP 2009), 2009

Autotuning multigrid with PetaBricks.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

PetaBricks: a language and compiler for algorithmic choice.
Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2009

Manipulating lossless video in the compressed domain.
Proceedings of the 17th International Conference on Multimedia, 2009

Computer-aided design for microfluidic chips based on multilayer soft lithography.
Proceedings of the 27th International Conference on Computer Design, 2009

Kendo: efficient deterministic multithreading in software.
Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, 2009

2008
A lightweight streaming layer for multicore execution.
SIGARCH Computer Architecture News, 2008

Abstraction layers for scalable microfluidic biocomputing.
Natural Computing, 2008

How to Do a Million Watchpoints: Efficient Debugging Using Dynamic Instrumentation.
Proceedings of Compiler Construction, 17th International Conference, 2008

(How) can programmers conquer the multicore menace?
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

2007
A step towards unifying schedule and storage optimization.
ACM Trans. Program. Lang. Syst., 2007

A Practical Approach to Exploiting Coarse-Grained Pipeline Parallelism in C Programs.
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

Ubiquitous Memory Introspection.
Proceedings of the Fifth International Symposium on Code Generation and Optimization (CGO 2007), 2007

2006
MPEG-2 decoding in a stream programming language.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Abstraction Layers for Scalable Microfluidic Biocomputers.
Proceedings of DNA Computing, 12th International Meeting on DNA Computing, 2006

Exploiting coarse-grained task, data, and pipeline parallelism in stream programs.
Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006

2005
Scalar Operand Networks.
IEEE Trans. Parallel Distrib. Syst., 2005

Interprocedural parallelization analysis in SUIF.
ACM Trans. Program. Lang. Syst., 2005

Language and Compiler Design for Streaming Applications.
International Journal of Parallel Programming, 2005

Teleport messaging for distributed stream programs.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005

Exploiting Vector Parallelism in Software Pipelined Loops.
Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-38 2005), 2005

Cache aware optimization of stream programs.
Proceedings of the 2005 ACM SIGPLAN/SIGBED Conference on Languages, 2005

Predicting Unroll Factors Using Supervised Classification.
Proceedings of the 3rd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005

Maintaining Consistency and Bounding Capacity of Software Code Caches.
Proceedings of the 3rd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005

Multicores from the Compiler's Perspective: A Blessing or a Curse?
Proceedings of the 3rd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005

Optimizing stream programs using linear state space analysis.
Proceedings of the 2005 International Conference on Compilers, 2005

2004
Convergent Scheduling.
J. Instruction-Level Parallelism, 2004

Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams.
Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

Language and Compiler Design for Streaming Applications.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

2003
Meta optimization: improving compiler heuristics with machine learning.
Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation (PLDI), 2003

Linear analysis and optimization of stream programs.
Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation (PLDI), 2003

Phased scheduling of stream programs.
Proceedings of the 2003 Conference on Languages, 2003

Adapting Convergent Scheduling Using Machine-Learning.
Proceedings of Languages and Compilers for Parallel Computing, 2003

High-Bandwidth Packet Switching on the Raw General-Purpose Architecture.
Proceedings of the 32nd International Conference on Parallel Processing (ICPP 2003), 2003

Scalar Operand Networks: On-Chip Interconnect for ILP in Partitioned Architecture.
Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003

Genetic Programming Applied to Compiler Heuristic Optimization.
Proceedings of Genetic Programming, 6th European Conference, EuroGP 2003, 2003

An Infrastructure for Adaptive Dynamic Optimization.
Proceedings of the 1st IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2003), 2003

2002
A common machine language for grid-based architectures.
SIGARCH Computer Architecture News, 2002

The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs.
IEEE Micro, 2002

Secure Execution via Program Shepherding.
Proceedings of the 11th USENIX Security Symposium, 2002

Defying the speed of light: a spatially-aware compiler for wire-exposed architectures.
Proceedings of the ACM SIGPLAN ASIA-PEPM 2002, 2002

Convergent scheduling.
Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

Providing Web search capability for low-connectivity communities.
Proceedings of the 2002 International Symposium on Technology and Society, 2002

Efficient Pipelining of Nested Loops: Unroll-and-Squash.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

StreamIt: A Language for Streaming Applications.
Proceedings of Compiler Construction, 11th International Conference, 2002

A stream compiler for communication-exposed architectures.
Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), 2002

Increasing and Detecting Memory Address Congruence.
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT 2002), 2002

2001
Compiler Support for Scalable and Efficient Memory Systems.
IEEE Trans. Computers, 2001

A Unified Framework for Schedule and Storage Optimization.
Proceedings of the 2001 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2001

Strength Reduction of Integer Division and Modulo Operations.
Proceedings of Languages and Compilers for Parallel Computing, 2001

2000
Bitwidth analysis with application to silicon compilation.
Proceedings of the 2000 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2000

Exploiting superword level parallelism with multimedia instruction sets.
Proceedings of the 2000 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2000

FlexCache: A Framework for Flexible Compiler Generated Data Caching.
Proceedings of Intelligent Memory Systems, Second International Workshop, 2000

1999
Maps: A Compiler-Managed Memory System for Raw Machines.
Proceedings of the 26th Annual International Symposium on Computer Architecture, 1999

Parallelizing Applications into Silicon.
Proceedings of the 7th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '99), 1999

1998
Maximizing Multiprocessor Performance with the SUIF Compiler.
Digital Technical Journal, 1998

Memory bank disambiguation using modulo unrolling for Raw machines.
Proceedings of the 5th International Conference on High Performance Computing, 1998

Space-Time Scheduling of Instruction-Level Parallelism on a Raw Machine.
Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), 1998

1997
Baring It All to Software: Raw Machines.
IEEE Computer, 1997

1996
Multiprocessors from a software perspective.
IEEE Micro, 1996

Maximizing Multiprocessor Performance with the SUIF Compiler.
IEEE Computer, 1996

1995
Detecting Coarse-Grain Parallelism Using an Interprocedural Parallelizing Compiler.
Proceedings of Supercomputing '95 (San Diego, CA, USA, December 4-8), 1995

Interprocedural Parallelization Analysis: A Case Study.
Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, 1995

An Overview of the SUIF Compiler for Scalable Parallel Machines.
Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, 1995

Data and Computation Transformations for Multiprocessors.
Proceedings of the Fifth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1995

Interprocedural Analysis for Parallelization.
Proceedings of Languages and Compilers for Parallel Computing, 1995

Unified Compilation Techniques for Shared and Distributed Address Space Machines.
Proceedings of the 9th International Conference on Supercomputing, 1995

1994
SUIF: An Infrastructure for Research on Parallelizing and Optimizing Compilers.
SIGPLAN Notices, 1994

1993
Array Data-Flow Analysis and its Use in Array Privatization.
Proceedings of the Twentieth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 1993

Communication Optimization and Code Generation for Distributed Memory Machines.
Proceedings of the ACM SIGPLAN'93 Conference on Programming Language Design and Implementation (PLDI), 1993

An Overview of a Compiler for Scalable Parallel Machines.
Proceedings of Languages and Compilers for Parallel Computing, 1993

