Vinod Grover

Orcid: 0000-0003-0115-3896

According to our database, Vinod Grover authored at least 41 papers between 2008 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Event Tensor: A Unified Abstraction for Compiling Dynamic Megakernel.
CoRR, April, 2026

AVO: Agentic Variation Operators for Autonomous Evolutionary Search.
CoRR, March, 2026

SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits.
CoRR, March, 2026

VibeTensor: System Software for Deep Learning, Fully Generated by AI Agents.
CoRR, January, 2026

Tawa: Automatic Warp Specialization for Modern GPUs with Asynchronous References.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2026

It's about Time: Temporal Abstractions for Asynchronous GPU Tensor Computations.
Proceedings of the 35th ACM SIGPLAN International Conference on Compiler Construction, 2026

Nsight Python: A Python-First Profiling Toolkit for Seamless GPU Kernel Analysis (Tool).
Proceedings of the 35th ACM SIGPLAN International Conference on Compiler Construction, 2026

2025
Modeling Layout Abstractions Using Integer Set Relations.
CoRR, November, 2025

A Performance Model for Warp Specialization Kernels.
CoRR, June, 2025

Scaling Deep Learning Training with MPMD Pipeline Parallelism.
Proceedings of the Eighth Conference on Machine Learning and Systems, 2025

FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving.
Proceedings of the Eighth Conference on Machine Learning and Systems, 2025

Pattern Matching in AI Compilers and Its Formalization.
Proceedings of the 23rd ACM/IEEE International Symposium on Code Generation and Optimization, 2025

2024
Pattern Matching in AI Compilers and its Formalization (Extended Version).
CoRR, 2024

2023
Graphene: An IR for Optimized Tensor Computations on GPUs.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
Axon: A Language for Dynamic Shapes in Deep Learning Graphs.
CoRR, 2022

2020
Probabilistic Programming with CuPPL.
CoRR, 2020

Automatic Kernel Generation for Volta Tensor Cores.
CoRR, 2020

Fireiron: A Scheduling Language for High-Performance Linear Algebra on GPUs.
CoRR, 2020

Fireiron: A Data-Movement-Aware Scheduling Language for GPUs.
Proceedings of PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
Automatic acceleration of Numpy applications on GPUs and multicore CPUs.
CoRR, 2019

Swizzle Inventor: Data Movement Synthesis for GPU Kernels.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018
Domain-Specific Optimization and Generation of High-Performance GPU Code for Stencil Computations.
Proc. IEEE, 2018

CURD: a dynamic CUDA race detector.
Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018

Diesel: DSL for linear algebra and neural net computations on GPUs.
Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, 2018

2016
Effective resource management for enhancing performance of 2D and 3D stencils on GPUs.
Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit, 2016

Resource Conscious Reuse-Driven Tiling for GPUs.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
Forma: a DSL for image processing applications to target GPUs and multi-core CPUs.
Proceedings of the 8th Workshop on General Purpose Processing using GPUs, 2015

Fusing convolution kernels through tiling.
Proceedings of the 2nd ACM SIGPLAN International Workshop on Libraries, 2015

Type-safe runtime code generation: accelerate to LLVM.
Proceedings of the 8th ACM SIGPLAN Symposium on Haskell, 2015

2014
NOVA: A Functional Language for Data Parallelism.
ARRAY'14: Proceedings of the 2014 ACM SIGPLAN International Workshop on Libraries, 2014

Exploring the Design Space of SPMD Divergence Management on Data-Parallel Architectures.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

LambdaJIT: a dynamic compiler for heterogeneous optimizations of STL algorithms.
Proceedings of the 3rd ACM SIGPLAN workshop on Functional high-performance computing, 2014

2013
Separate Compilation in a Language-Integrated Heterogeneous Environment.
Proceedings of the Languages and Compilers for Parallel Computing, 2013

Towards shared memory consistency models for GPUs.
Proceedings of the International Conference on Supercomputing, 2013

Convergence and scalarization for data-parallel architectures.
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

2012
CUDA: Compiling and optimizing for a GPU platform.
Proceedings of the International Conference on Computational Science, 2012

JaBEE: framework for object-oriented Java bytecode compilation and execution on graphics processor units.
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units, 2012

Scalable Manycore Computing with CUDA.
Fundamentals of Multicore Software Development, 2012

2011
Accelerating Haskell array codes with multicore GPUs.
Proceedings of the POPL 2011 Workshop on Declarative Aspects of Multicore Programming, 2011

2010
Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs.
Proceedings of the CGO 2010, 2010

2008
Samurai: protecting critical data in unsafe languages.
Proceedings of the 2008 EuroSys Conference, Glasgow, Scotland, UK, April 1-4, 2008

