Michael Garland

Scott A. Mahlke

Nektarios Georgios Tsoutsos

Christos Kozyrakis

CoRR, August, 2025

On the Duality of Task and Actor Programming Models.

CoRR, August, 2025

Hardware-Accelerated Encrypted Execution of General-Purpose Applications.

Proc. Priv. Enhancing Technol., 2025

Task-Based Tensor Computations on Modern GPUs.

Proc. ACM Program. Lang., 2025

Composing Distributed Computations Through Task and Kernel Fusion.

Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2025

Automatic Tracing in Task-Based Runtime Systems.

Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2025

2024

Exploring Data Layout for Sparse Tensor Times Dense Matrix on GPUs.

ACM Trans. Archit. Code Optim., March, 2024

CUDASTF: Bridging the Gap Between CUDA and Task Parallelism.

Proceedings of the International Conference for High Performance Computing, 2024

2023

Accelerated Encrypted Execution of General-Purpose Applications.

Nektarios Georgios Tsoutsos

IACR Cryptol. ePrint Arch., 2023

CODAG: Characterizing and Optimizing Decompression Algorithms for GPUs.

CoRR, 2023

ArctyrEX: Accelerated Encrypted Execution of General-Purpose Applications.

Nektarios Georgios Tsoutsos

CoRR, 2023

Understanding the Effect of the Long Tail on Neural Network Compression.

Harvey Dam

Vinu Joseph

Aditya Bhaskara

Saurav Muralidharan

CoRR, 2023

Legate Sparse: Distributed Sparse Computing in Python.

Proceedings of the International Conference for High Performance Computing, 2023

Stream-K: Work-Centric Parallel Decomposition for Dense Matrix-Matrix Multiplication on the GPU.

Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

Visibility Algorithms for Dynamic Dependence Analysis and Distributed Coherence.

Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture.

Zaid Qureshi

Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

Graphene: An IR for Optimized Tensor Computations on GPUs.

Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022

GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture.

Zaid Qureshi

Dataset, October, 2022

Efficient Sparsely Activated Transformers.

Salar Latifi

Saurav Muralidharan

CoRR, 2022

BaM: A Case for Enabling Fine-grain High Throughput GPU-Orchestrated Access to Storage.

Zaid Qureshi

CoRR, 2022

2021

Supercomputing in Python With Legate.

Comput. Sci. Eng., 2021

Scaling implicit parallelism via dynamic control replication.

Proceedings of PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

2020

A Programmable Approach to Neural Network Compression.

Vinu Joseph

Saurav Muralidharan

Animesh Garg

IEEE Micro, 2020

Reliable Model Compression via Label-Preservation-Aware Loss Functions.

Vinu Joseph

Shoaib Ahmed Siddiqui

Aditya Bhaskara

CoRR, 2020

2019

A Programmable Approach to Model Compression.

CoRR, 2019

GPU-Accelerated Atari Emulation for Reinforcement Learning.

Steven Dalton

Iuri Frosio

CoRR, 2019

Legate NumPy: accelerated and distributed array computing.

Michael Bauer

Proceedings of the International Conference for High Performance Computing, 2019

Throughput-oriented GPU memory allocation.

Isaac Gelado

Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

2018

A Block-Oriented, Parallel and Collective Approach to Sparse Indefinite Preconditioning on GPUs.

Proceedings of the 8th IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms, 2018

Dynamic tracing: memoization of task graphs for dynamic task-based runtimes.

Proceedings of the International Conference for High Performance Computing, 2018

2017

AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks.

Aditya Devarakonda

Maxim Naumov

CoRR, 2017

Parallel Depth-First Search for Directed Acyclic Graphs.

Maxim Naumov

Alysson Vrielink

Proceedings of the Seventh Workshop on Irregular Applications: Architectures and Algorithms, 2017

2016

Designing a Tunable Nested Data-Parallel Programming System.

ACM Trans. Archit. Code Optim., 2016

Merge-based parallel sparse matrix-vector multiplication.

Proceedings of the International Conference for High Performance Computing, 2016

Merge-based sparse matrix-vector multiplication (SpMV) using the CSR storage format.

Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

Architecture-Adaptive Code Variant Tuning.

Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

2015

High-Performance and Scalable GPU Graph Traversal.

Andrew S. Grimshaw

ACM Trans. Parallel Comput., 2015

A collection-oriented programming model for performance portability.

Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

Optimizing Sparse Matrix Operations on GPUs Using Merge Path.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Parallel Methods for Verifying the Consistency of Weakly-Ordered Architectures.

Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014

A decomposition for in-place matrix transposition.

Bryan Catanzaro

Alexander Keller

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Nitro: A Framework for Adaptive Code Variant Tuning.

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Work-Efficient Parallel GPU Methods for Single-Source Shortest Paths.

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Red Fox: An Execution Environment for Relational Query Processing on GPUs.

Sudhakar Yalamanchili

Proceedings of the 12th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2014

2013

Guest Editors' Introduction: Special Section on the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (I3D 2012).

Rui Wang

IEEE Trans. Vis. Comput. Graph., 2013

2012

Designing a unified programming model for heterogeneous machines.

Manjunath Kudlur

Yili Zheng

Proceedings of the SC Conference on High Performance Computing Networking, 2012

Scalable GPU graph traversal.

Andrew S. Grimshaw

Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

Scalable Manycore Computing with CUDA.

Vinod Grover

Kevin Skadron

Fundamentals of Multicore Software Development, 2012

2011

NVIDIA GPU.

Encyclopedia of Parallel Computing, 2011

GPUs and the Future of Parallel Computing.

IEEE Micro, 2011

Social Network Clustering and Visualization using Hierarchical Edge Bundles.

Yuntao Jia

John C. Hart

Comput. Graph. Forum, 2011

Copperhead: compiling an embedded data parallel language.

Bryan Catanzaro

Kurt Keutzer

Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2011

2010

Understanding throughput-oriented architectures.

David Blair Kirk

Commun. ACM, 2010

Parallel computing with CUDA.

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Sparse Matrix-Vector Multiplication on Multicore and Accelerators.

Scientific Computing with Multicore and Accelerators, 2010

Efficient Parallel Scan Algorithms for Manycore GPUs.

Scientific Computing with Multicore and Accelerators, 2010

2009

Solving Computational Problems with GPU Computing.

Jonathan Cohen

Comput. Sci. Eng., 2009

Fast BVH Construction on GPUs.

Comput. Graph. Forum, 2009

MLS-based scalar fields over triangle meshes and their application in mesh processing.

Jingyi Jin

Edgar A. Ramos

Proceedings of the 2009 Symposium on Interactive 3D Graphics, 2009

Implementing sparse matrix-vector multiplication on throughput-oriented processors.

Nathan Bell

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Designing efficient sorting algorithms for manycore GPUs.

Nadathur Satish

Mark J. Harris

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

2008

On the Visualization of Social and other Scale-Free Networks.

IEEE Trans. Vis. Comput. Graph., 2008

Free-form motion processing.

ACM Trans. Graph., 2008

Parallel Computing Experiences with CUDA.

IEEE Micro, 2008

Scalable parallel programming with CUDA.

Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2008

Rapid Multipole Graph Drawing on the GPU.

Proceedings of Graph Drawing, 16th International Symposium, 2008

Sparse matrix computations on manycore GPU's.

Proceedings of the 45th Design Automation Conference, 2008

2007

Iterative Methods for Improving Mesh Parameterizations.

Shen Dong

Proceedings of the 2007 International Conference on Shape Modeling and Applications (SMI 2007), 2007

Sketching mesh deformations.

Youngihn Kho

Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2007

2006

Interactive Point-Based Rendering of Higher-Order Tetrahedral Data.

Yuan Zhou

IEEE Trans. Vis. Comput. Graph., 2006

Visual Exploration of Complex Time-Varying Graphs.

Gautam Kumar

IEEE Trans. Vis. Comput. Graph., 2006

Editing arbitrarily deforming surface animations.

ACM Trans. Graph., 2006

Spectral surface quadrangulation.

ACM Trans. Graph., 2006

Interactive Multiresolution Editing and Display of Large Terrains.

Samuel Atlan

Comput. Graph. Forum, 2006

2005

A Multiresolution Representation for Massive Meshes.

Eric Shaffer

IEEE Trans. Vis. Comput. Graph., 2005

Quadric-based simplification in any dimension.

Yuan Zhou

ACM Trans. Graph., 2005

Harmonic functions for quadrilateral remeshing of arbitrary manifolds.

Shen Dong

Comput. Aided Geom. Des., 2005

Curvature Maps for Local Shape Comparison.

Proceedings of the 2005 International Conference on Shape Modeling and Applications (SMI 2005), 2005

Surfacing by numbers.

Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2005

Progressive multiresolution meshes for deforming surfaces.

Proceedings of the 2005 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 2005

Interactive material replacement in photographs.

Proceedings of the Graphics Interface 2005 Conference, 2005

2004

Jump map-based interactive texture synthesis.

ACM Trans. Graph., 2004

Fair morse functions for extracting the topological structure of a surface mesh.

Xinlai Ni

John C. Hart

ACM Trans. Graph., 2004

Pixel-Exact Rendering of Spacetime Finite Element Solutions.

Yuan Zhou

Robert B. Haber

Proceedings of the 15th IEEE Visualization Conference, 2004

Similarity-based surface modelling using geodesic fans.

Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2004

Mesh Modelling with Curve Analogies.

Proceedings of the 12th Pacific Conference on Computer Graphics and Applications, 2004

Mining scale-free networks using geodesic clustering.

Andrew Y. Wu

Jiawei Han

Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004

Spacetime meshing with adaptive refinement and coarsening.

Proceedings of the 20th ACM Symposium on Computational Geometry, 2004

2003

User-guided simplification.

Youngihn Kho

Proceedings of the 2003 Symposium on Interactive 3D Graphics, 2003

Interactive Texture Synthesis on Surfaces using Jump Maps.

Proceedings of the 14th Eurographics Workshop on Rendering Techniques, 2003

2002

Permission grids: practical, error-bounded simplification.

ACM Trans. Graph., 2002

A Multiphase Approach to Efficient Surface Simplification.

Eric Shaffer

Proceedings of the 13th IEEE Visualization Conference, 2002

Towards Real-Time Texture Synthesis with the Jump Map.

Proceedings of the 13th Eurographics Workshop on Rendering Techniques, 2002

2001

Efficient Adaptive Simplification of Massive Meshes.

Eric Shaffer

Proceedings of the 12th IEEE Visualization Conference, 2001

Hierarchical face clustering on polygonal surfaces.

Andrew J. Willmott

Proceedings of the 2001 Symposium on Interactive 3D Graphics, 2001

1999

Quadric-Based Polygonal Surface Simplification.

PhD thesis, 1999

Optimal triangulation and quadric-based surface simplification.

Comput. Geom., 1999

Face Cluster Radiosity.

Andrew J. Willmott

Proceedings of Rendering Techniques '99, 1999

Multiresolution Modeling: Survey and Future Opportunities.

Proceedings of the 20th Annual Conference of the European Association for Computer Graphics, 1999

1998

Simplifying surfaces with color and texture using quadric error metrics.

Proceedings of the 9th IEEE Visualization Conference, 1998

1997

Surface simplification using quadric error metrics.

Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques, 1997

1996

Fast and flexible polygonization of height fields.
