Vinod Grover

Somashekaracharya G. Bhaskaracharya

Proceedings of the 35th ACM SIGPLAN International Conference on Compiler Construction, 2026

Nsight Python: A Python-First Profiling Toolkit for Seamless GPU Kernel Analysis (Tool).

Proceedings of the 35th ACM SIGPLAN International Conference on Compiler Construction, 2026

2025

Modeling Layout Abstractions Using Integer Set Relations.

Aravind Acharya

CoRR, November, 2025

A Performance Model for Warp Specialization Kernels.

Zhengyang Liu

CoRR, June, 2025

Scaling Deep Learning Training with MPMD Pipeline Parallelism.

Proceedings of the Eighth Conference on Machine Learning and Systems, 2025

FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving.

Proceedings of the Eighth Conference on Machine Learning and Systems, 2025

Pattern Matching in AI Compilers and Its Formalization.

Proceedings of the 23rd ACM/IEEE International Symposium on Code Generation and Optimization, 2025

2024

Pattern Matching in AI Compilers and its Formalization (Extended Version).

CoRR, 2024

2023

Graphene: An IR for Optimized Tensor Computations on GPUs.

Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022

Axon: A Language for Dynamic Shapes in Deep Learning Graphs.

Alexander Collins

CoRR, 2022

2020

Probabilistic Programming with CuPPL.

Alexander Collins

Somashekaracharya G. Bhaskaracharya

CoRR, 2020

Automatic Kernel Generation for Volta Tensor Cores.

Julien Demouth

CoRR, 2020

Fireiron: A Scheduling Language for High-Performance Linear Algebra on GPUs.

Archibald Samuel Elliott

Henrik Barthels

Rastislav Bodík

CoRR, 2020

Fireiron: A Data-Movement-Aware Scheduling Language for GPUs.

Archibald Samuel Elliott

Henrik Barthels

Rastislav Bodík

Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019

Automatic acceleration of Numpy applications on GPUs and multicore CPUs.

Phitchaya Mangpo Phothilimthana

CoRR, 2019

Swizzle Inventor: Data Movement Synthesis for GPU Kernels.

Archibald Samuel Elliott

Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018

Domain-Specific Optimization and Generation of High-Performance GPU Code for Stencil Computations.

Prashant Singh Rawat

Miheer Vaidya

Aravind Sukumaran-Rajam

Proc. IEEE, 2018

CURD: a dynamic CUDA race detector.

Yuanfeng Peng

Joseph Devietti

Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018

Diesel: DSL for linear algebra and neural net computations on GPUs.

Venmugil Elango

Norm Rubin

Hariharan Sandanagobalane

Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, 2018

2016

Effective resource management for enhancing performance of 2D and 3D stencils on GPUs.

Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit, 2016

Resource Conscious Reuse-Driven Tiling for GPUs.

Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015

Forma: a DSL for image processing applications to target GPUs and multi-core CPUs.

Justin Holewinski

Proceedings of the 8th Workshop on General Purpose Processing using GPUs, 2015

Fusing convolution kernels through tiling.

Paulius Micikevicius

Proceedings of the 2nd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, 2015

Type-safe runtime code generation: accelerate to LLVM.

Trevor L. McDonell

Manuel M. T. Chakravarty

Ryan R. Newton

Proceedings of the 8th ACM SIGPLAN Symposium on Haskell, 2015

2014

NOVA: A Functional Language for Data Parallelism.

Proceedings of the 2014 ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming (ARRAY'14), 2014

Exploring the Design Space of SPMD Divergence Management on Data-Parallel Architectures.

Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

LambdaJIT: a dynamic compiler for heterogeneous optimizations of STL algorithms.

Thibaut Lutz

Proceedings of the 3rd ACM SIGPLAN workshop on Functional high-performance computing, 2014

2013

Separate Compilation in a Language-Integrated Heterogeneous Environment.

Proceedings of the Languages and Compilers for Parallel Computing, 2013

Towards shared memory consistency models for GPUs.

Tyler Sorensen

Ganesh Gopalakrishnan

Proceedings of the International Conference on Supercomputing, 2013

Convergence and scalarization for data-parallel architectures.

Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

2012

CUDA: Compiling and optimizing for a GPU platform.

Proceedings of the International Conference on Computational Science, 2012

JaBEE: framework for object-oriented Java bytecode compilation and execution on graphics processor units.

Wojciech Zaremba

Yuan Lin

Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units, 2012

Scalable Manycore Computing with CUDA.

Michael Garland

Kevin Skadron

Fundamentals of Multicore Software Development, 2012

2011

Accelerating Haskell array codes with multicore GPUs.

Manuel M. T. Chakravarty

Proceedings of the POPL 2011 Workshop on Declarative Aspects of Multicore Programming, 2011

2010

Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs.

Proceedings of the CGO 2010, 2010

2008

Samurai: protecting critical data in unsafe languages.

Karthik Pattabiraman