We stand with Ukraine

John D. Owens

ORCID: 0000-0001-6582-8237

Affiliations:

University of California, Davis, US

According to our database, John D. Owens authored at least 160 papers between 1998 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

Mambalaya: Einsum-Based Fusion Optimizations on State-Space Models.

Toluwanimi O. Odemuyiwa, …, Michael Pellauer

CoRR, April, 2026

Fast Sparse Matrix Permutation for Mesh-Based Direct Solvers.

Behrooz Zarebavami, Ahmed H. Mahmoud, …, Changcheng Yuan, Serban D. Porumbescu, …, Maryam Mehri Dehnavi, …

CoRR, February, 2026

2025

BLASST: Dynamic BLocked Attention Sparsity via Softmax Thresholding.

…, George Klimiashvili, …

CoRR, December, 2025

MLPerf Automotive.

Radoyeh Shojaei, Predrag Djurdjevic, Mostafa El-Khamy, …, Kasper Mecklenburg, …, Pinar Muyan-Özçelik, …

CoRR, October, 2025

Dynamic Mesh Processing on the GPU.

Ahmed H. Mahmoud, Serban D. Porumbescu, …

ACM Trans. Graph., August, 2025

Towards Universal Performance Modeling for Machine Learning Training on Multi-GPU Platforms.

…, Pallab Bhattacharya, …

IEEE Trans. Parallel Distributed Syst., February, 2025

Decoupled Fallback: A Portable Single-Pass GPU Scan.

Proceedings of the 37th ACM Symposium on Parallelism in Algorithms and Architectures, 2025

Dynamic Mesh Processing on the GPU (Abstract).

Ahmed H. Mahmoud, Serban D. Porumbescu, …

Proceedings of the 3rd Highlights of Parallel Computing Workshop, 2025

2024

The EDGE Language: Extended General Einsums for Graph Algorithms.

Toluwanimi O. Odemuyiwa, …

CoRR, 2024

Accelerating Multi-GPU Embedding Retrieval with PGAS-Style Communication for Deep Learning Recommendation Systems.

…, Katherine A. Yelick, …

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

Helping Faculty Teach Software Performance Engineering.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

2023

The Sparsity Roofline: Understanding the Hardware Limits of Sparse Neural Networks.

…, Collin McCarthy, Saurav Muralidharan, …

CoRR, 2023

BOBA: A Parallel Lightweight Graph Reordering Algorithm with Heavyweight Implications.

Matthew Drescher, Muhammad A. Awad, Serban D. Porumbescu, …

CoRR, 2023

Harmonic CUDA: Asynchronous Programming on GPUs.

Jonathan D. Wapman, …, Serban D. Porumbescu, …

Proceedings of the 14th International Workshop on Programming Models and Applications for Multicores and Manycores, 2023

A Programming Model for GPU Load Balancing.

…, Serban D. Porumbescu, …

Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

Stream-K: Work-Centric Parallel Decomposition for Dense Matrix-Matrix Multiplication on the GPU.

…, Michael Garland, …

Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

Maximum Clique Enumeration on the GPU.

…, Serban D. Porumbescu, …

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Accelerating Sparse Data Orchestration via Dynamic Reflexive Tiling (Extended Abstract).

Toluwanimi O. Odemuyiwa, Hadi Asghari Moghaddam, Michael Pellauer, …, Neal Clayton Crago, …, Edgar Solomonik, …, Christopher W. Fletcher

Proceedings of the 2023 ACM Workshop on Highlights of Parallel Computing, 2023

Accelerating Sparse Data Orchestration via Dynamic Reflexive Tiling.

Toluwanimi O. Odemuyiwa, Hadi Asghari Moghaddam, Michael Pellauer, …, Neal Clayton Crago, …, Edgar Solomonik, …, Christopher W. Fletcher

Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

Analyzing and Implementing GPU Hash Tables.

Muhammad A. Awad, …, Serban D. Porumbescu, Martin Farach-Colton, …

Proceedings of the 2023 Symposium on Algorithmic Principles of Computer Systems, 2023

2022

Loops: A Programming Model for GPU Load Balancing.

…, Serban D. Porumbescu, …

Dataset, December, 2022

A Programming Model for GPU Load Balancing.

…, Serban D. Porumbescu, …

Dataset, December, 2022

GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU.

ACM Trans. Math. Softw., 2022

Supporting Unified Shader Specialization by Co-opting C++ Features.

…, Serban D. Porumbescu, …

Proc. ACM Comput. Graph. Interact. Tech., 2022

Scalable Irregular Parallelism with GPUs: Getting CPUs Out of the Way.

…, Serban D. Porumbescu, …, Katherine A. Yelick, …

Proceedings of the SC22: International Conference for High Performance Computing, 2022

Essentials of Parallel Graph Analytics.

…, Serban D. Porumbescu, …

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

Atos: A Task-Parallel GPU Scheduler for Graph Analytics.

…, Serban D. Porumbescu, …, Katherine A. Yelick, …

Proceedings of the 51st International Conference on Parallel Processing, 2022

Building a Performance Model for Deep Learning Recommendation Model Training on GPUs.

…, Ehsan K. Ardestani, …

Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022

A GPU Multiversion B-Tree.

Muhammad A. Awad, Serban D. Porumbescu, …

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

Introduction to GraphBLAS.

…, Franz Franchetti, …, Shana Hutchison, …, Andrew Lumsdaine, Henning Meyerhenke, …, José E. Moreira, …, Marcin Zalewski, Timothy G. Mattson

Proceedings of the Massive Graph Analytics, 2022

2021

RXMesh: a GPU mesh data structure.

Ahmed H. Mahmoud, Serban D. Porumbescu, …

ACM Trans. Graph., 2021

Atos: A Task-Parallel GPU Dynamic Scheduling Framework for Dynamic Irregular Computations.

…, Serban D. Porumbescu, …, Katherine A. Yelick, …

CoRR, 2021

Unified Shader Programming in C++.

Kerry A. Seitz Jr., …, Serban D. Porumbescu, …

CoRR, 2021

Better GPU Hash Tables.

Muhammad A. Awad, …, Serban D. Porumbescu, Martin Farach-Colton, …

CoRR, 2021

Towards Flexible and Compiler-Friendly Layer Fusion for CNNs on Multicore CPUs.

…, Evangelos Georganas, …

Proceedings of the Euro-Par 2021: Parallel Processing, 2021

2020

VoroCrust: Voronoi Meshing Without Clipping.

Ahmed Abdelkader, Chandrajit L. Bajaj, Mohamed S. Ebeida, Ahmed H. Mahmoud, Scott A. Mitchell, …, Ahmad A. Rushdi

ACM Trans. Graph., 2020

Fast Gunrock Subgraph Matching (GSM) on GPUs.

CoRR, 2020

Energy-based Out-of-distribution Detection.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Dynamic Graphs on the GPU.

Muhammad A. Awad, …, Serban D. Porumbescu, …

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

2019

Staged metaprogramming for shader system development.

Kerry A. Seitz Jr., …, Serban D. Porumbescu, …

ACM Trans. Graph., 2019

Benchmarking Deep Learning Frameworks and Investigating FPGA Deployment for Traffic Sign Classification and Detection.

…, Pinar Muyan-Özçelik

IEEE Trans. Intell. Veh., 2019

Unsupervised Object Segmentation with Explicit Localization Module.

…, James Sharpnack, …

CoRR, 2019

RDMA vs. RPC for Implementing Distributed Data Structures.

Benjamin A. Brock, …, Katherine A. Yelick

Proceedings of the 9th IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms, 2019

Engineering a high-performance GPU B-Tree.

Muhammad A. Awad, …, Martin Farach-Colton, …

Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

Graph Coloring on the GPU.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

Fast BFS-Based Triangle Counting on GPUs.

Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019

Accelerating DNN Inference with GraphBLAS and the GPU.

Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019

2018

Object Localization and Motion Transfer learning with Capsules.

CoRR, 2018

Technical perspective: Graphs, betweenness centrality, and the GPU.

Commun. ACM, 2018

Benchmarking Deep Learning Frameworks with FPGA-suitable Models on a Traffic Sign Dataset.

…, Pinar Muyan-Özçelik

Proceedings of the 2018 IEEE Intelligent Vehicles Symposium, 2018

FPGA versus GPU for Speed-Limit-Sign Recognition.

…, Pinar Muyan-Özçelik

Proceedings of the 21st International Conference on Intelligent Transportation Systems, 2018

Scalable Breadth-First Search on a GPU Cluster.

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Quotient Filters: Approximate Membership Queries on the GPU.

…, Martin Farach-Colton, …

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

GPU LSM: A Dynamic Dictionary Data Structure for the GPU.

…, Martin Farach-Colton, …

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

A Dynamic Hash Table for the GPU.

…, Martin Farach-Colton, …

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Implementing Push-Pull Efficiently in GraphBLAS.

Proceedings of the 47th International Conference on Parallel Processing, 2018

Design Principles for Sparse Matrix Multiplication on the GPU.

Proceedings of the Euro-Par 2018: Parallel Processing, 2018

VoroCrust Illustrated: Theory and Challenges (Multimedia Exposition).

Ahmed Abdelkader, Chandrajit L. Bajaj, Mohamed S. Ebeida, Ahmed H. Mahmoud, Scott A. Mitchell, …, Ahmad A. Rushdi

Proceedings of the 34th International Symposium on Computational Geometry, 2018

Sampling Conditions for Conforming Voronoi Meshing by the VoroCrust Algorithm.

Ahmed Abdelkader, Chandrajit L. Bajaj, Mohamed S. Ebeida, Ahmed H. Mahmoud, Scott A. Mitchell, …, Ahmad A. Rushdi

Proceedings of the 34th International Symposium on Computational Geometry, 2018

2017

Gunrock: GPU Graph Analytics.

…, Andrew A. Davidson, …

ACM Trans. Parallel Comput., 2017

GPU Multisplit: An Extended Study of a Parallel Algorithm.

…, Andrew A. Davidson, …

ACM Trans. Parallel Comput., 2017

Methods for multitasking among real-time embedded compute tasks running on the GPU.

Pinar Muyan-Özçelik, …

Concurr. Comput. Pract. Exp., 2017

A Constrained Resampling Strategy for Mesh Improvement.

Ahmed Abdelkader, Ahmed H. Mahmoud, Ahmad A. Rushdi, Scott A. Mitchell, …, Mohamed S. Ebeida

Comput. Graph. Forum, 2017

Mini-Gunrock: A Lightweight Graph Analytics Framework on the GPU.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Multi-GPU Graph Analytics.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

2016

Multidisciplinary simulation acceleration using multiple shared memory graphical processing units.

Jonathan Y. Kemal, …

Int. J. High Perform. Comput. Appl., 2016

Fast parallel skew and prefix-doubling suffix array construction on the GPU.

Concurr. Comput. Pract. Exp., 2016

Disk Density Tuning of a Maximal Random Packing.

Mohamed S. Ebeida, Ahmad A. Rushdi, Muhammad A. Awad, Ahmed H. Mahmoud, …, Shawn A. English, …, Chandrajit L. Bajaj, Scott A. Mitchell

Comput. Graph. Forum, 2016

Parallel Approaches to the String Matching Problem on the GPU.

Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, 2016

Multitasking Real-time Embedded GPU Computing Tasks.

Pinar Muyan-Özçelik, …

Proceedings of the 7th International Workshop on Programming Models and Applications for Multicores and Manycores, 2016

GPU multisplit.

…, Andrew A. Davidson, …

Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

Mathematical foundations of the GraphBLAS.

…, Franz Franchetti, John R. Gilbert, Dylan Hutchison, …, Andrew Lumsdaine, Henning Meyerhenke, …, Marcin Zalewski, Timothy G. Mattson, José E. Moreira

Proceedings of the 2016 IEEE High Performance Extreme Computing Conference, 2016

A Comparative Study on Exact Triangle Counting Algorithms on the GPU.

Proceedings of the ACM Workshop on High Performance Graph Processing, 2016

Real-time GPU-based timing channel detection using entropy.

Proceedings of the 2016 IEEE Conference on Communications and Network Security, 2016

2015

Piko: a framework for authoring programmable graphics pipelines.

…, Kerry A. Seitz Jr., …

ACM Trans. Graph., 2015

Parallel Reyes-style adaptive subdivision with bounded memory usage.

Proceedings of the 19th Symposium on Interactive 3D Graphics and Games, San Francisco, CA, USA, February 27, 2015

Gunrock: a high-performance graph processing library on the GPU.

…, Andrew A. Davidson, …

Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

Fast Sparse Matrix and Sparse Vector Multiplication Algorithm on the GPU.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Performance Characterization of High-Level Programming Models for GPU Graph Analytics.

Proceedings of the 2015 IEEE International Symposium on Workload Characterization, 2015

Fast Parallel Suffix Array on the GPU.

Proceedings of the Euro-Par 2015: Parallel Processing, 2015

Efficient dense reconstruction using geometry and image consistency constraints.

Mikhail M. Shashkov, …, Connie S. Nguyen, …

Proceedings of the 2015 IEEE Applied Imagery Pattern Recognition Workshop, 2015

Exercises in High-Dimensional Sampling: Maximal Poisson-Disk Sampling and k-d Darts.

Mohamed S. Ebeida, Scott A. Mitchell, …, Andrew A. Davidson, …, Muhammad A. Awad, Ahmed H. Mahmoud, …

Proceedings of the Green in Software Engineering, 2015

2014

k-d Darts: Sampling by k-dimensional flat searches.

Mohamed S. Ebeida, …, Scott A. Mitchell, Keith R. Dalbey, Andrew A. Davidson, …

ACM Trans. Graph., 2014

Piko: A Design Framework for Programmable Graphics Pipelines.

…, Kerry A. Seitz Jr., …

CoRR, 2014

GPU-accelerated and efficient multi-view triangulation for scene reconstruction.

…, Mauricio Hess-Flores, …

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2014

Work-Efficient Parallel GPU Methods for Single-Source Shortest Paths.

Andrew A. Davidson, …, Michael Garland, …

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

WTF, GPU! computing twitter's who-to-follow on the GPU.

Proceedings of the second ACM conference on Online social networks, 2014

A Comparative Study of GPU-Accelerated Multi-view Sequential Reconstruction Triangulation Methods for Large-Scale Scenes.

…, Mauricio Hess-Flores, …

Proceedings of the Computer Vision - ACCV 2014 Workshops, 2014

2013

k-d Darts: Sampling by k-Dimensional Flat Searches.

Mohamed S. Ebeida, …, Scott A. Mitchell, Keith R. Dalbey, Andrew A. Davidson, …

CoRR, 2013

A GPU Implementation for Two-Dimensional Shallow Water Modeling.

Kerry A. Seitz Jr., …, Bassam A. Younis, …

CoRR, 2013

Sifted Disks.

Mohamed S. Ebeida, Ahmed H. Mahmoud, Muhammad A. Awad, Mohammed A. Mohammed, Scott A. Mitchell, …

Comput. Graph. Forum, 2013

2012

Finding Convex Hulls Using Quickhull on the GPU.

CoRR, 2012

A GPU Task-Parallel Model with Dependency Resolution.

Computer, 2012

A Simple Algorithm for Maximal Poisson-Disk Sampling in High Dimensions.

Mohamed S. Ebeida, Scott A. Mitchell, …, Andrew A. Davidson, …

Comput. Graph. Forum, 2012

Plane-dependent error diffusion on a GPU.

…, Robert Ulichney, …

Proceedings of the Image Processing: Algorithms and Systems X; and Parallel Processing for Imaging Applications II, 2012

High-Quality Parallel Depth-of-Field Using Line Samples.

…, Andrew A. Davidson, Mohamed S. Ebeida, Scott A. Mitchell, …

Proceedings of the EUROGRAPHICS Conference on High Performance Graphics 2012, 2012

kANN on the GPU with Shifted Sorting.

…, Jagadeesh Bhaskar Pakaravoor, Fatemeh Abbasinejad, …

Proceedings of the EUROGRAPHICS Conference on High Performance Graphics 2012, 2012

2011

Efficient maximal Poisson-disk sampling.

Mohamed S. Ebeida, Andrew A. Davidson, …, Patrick M. Knupp, Scott A. Mitchell, …

ACM Trans. Graph., 2011

Acceleration of 2-D Compressible Flow Solvers with Graphics Processing Unit Clusters.

Everett H. Phillips, …

J. Aerosp. Comput. Inf. Commun., 2011

Efficient Synchronization Primitives for GPUs.

CoRR, 2011

Efficient and good Delaunay meshes from random points.

Mohamed S. Ebeida, Scott A. Mitchell, Andrew A. Davidson, …, Patrick M. Knupp, …

Comput. Aided Des., 2011

Efficient adaptive tiling for programmable rendering.

Proceedings of the Symposium on Interactive 3D Graphics and Games, 2011

A parallel error diffusion implementation on a GPU.

…, Robert Ulichney, Giordano B. Beretta, …

Proceedings of the Conference on Parallel Processing for Imaging Applications 2011, 2011

Feature-based speed limit sign detection using a graphics processing unit.

Vladimir Glavtchev, Pinar Muyan-Özçelik, …

Proceedings of the IEEE Intelligent Vehicles Symposium (IV), 2011

Multi-GPU MapReduce on GPU Clusters.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

An Auto-tuned Method for Solving Large Tridiagonal Systems on the GPU.

Andrew A. Davidson, …

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

A quantitative performance analysis model for GPU architectures.

Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Compute & memory optimizations for high-quality speech recognition on low-end GPU processors.

Proceedings of the 18th International Conference on High Performance Computing, 2011

Lessons Learned from Exploring the Backtracking Paradigm on the GPU.

…, Alok N. Choudhary, Nagiza F. Samatova

Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Register packing for cyclic reduction: a case study.

Andrew A. Davidson, …

Proceedings of the 4th Workshop on General Purpose Processing on Graphics Processing Units, 2011

2010

Fragment-Parallel Composite and Filter.

Comput. Graph. Forum, 2010

Fast tridiagonal solvers on the GPU.

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Toward Techniques for Auto-tuning GPU Algorithms.

Andrew A. Davidson, …

Proceedings of the Applied Parallel and Scientific Computing, 2010

Multi-GPU volume rendering using MapReduce.

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010

GPU-to-CPU Callbacks.

Proceedings of the Euro-Par 2010 Parallel Processing Workshops, 2010

Task management for irregular-parallel workloads on the GPU.

Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on High Performance Graphics 2010, 2010

A Template-Based Approach for Real-Time Speed-Limit-Sign Recognition on an Embedded System Using GPU Computing.

Pinar Muyan-Özçelik, Vladimir Glavtchev, …

Proceedings of the Pattern Recognition, 2010

Efficient Parallel Scan Algorithms for Manycore GPUs.

Shubhabrata Sengupta, …, Michael Garland, …

Proceedings of the Scientific Computing with Multicore and Accelerators, 2010

2009

Real-time parallel hashing on the GPU.

Dan A. Alcantara, …, Fatemeh Abbasinejad, Shubhabrata Sengupta, Michael Mitzenmacher, …

ACM Trans. Graph., 2009

Out-of-core Data Management for Path Tracing on Hybrid Resources.

…, Shubhabrata Sengupta, …

Comput. Graph. Forum, 2009

Data Parallel Bin-Based Indexing for Answering Queries on Multi-core Architectures.

Proceedings of the Scientific and Statistical Database Management, 2009

Message passing on data-parallel architectures.

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

HCW 2009 keynote talk: GPU computing: Heterogeneous computing for future systems.

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Parallel view-dependent tessellation of Catmull-Clark subdivision surfaces.

…, Mohamed S. Ebeida, …

Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on High Performance Graphics 2009, 2009

Three-layer optimizations for fast GMM computations on GPU-like parallel processors.

Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008

Real-time Reyes-style adaptive surface subdivision.

ACM Trans. Graph., 2008

GPU Computing.

…, James C. Phillips

Proc. IEEE, 2008

Parallel programming models overview.

Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2008

Beyond programmable shading: fundamentals.

Aaron E. Lefohn, …, Kayvon Fatahalian, …

Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2008

Efficient computation of sum-products on GPUs through software-managed cache.

Mark Silberstein, …

Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

Fast Deformable Registration on the GPU: A CUDA Implementation of Demons.

Pinar Muyan-Özçelik, …, Sanjiv S. Samant

Proceedings of the Selected Papers of the Sixth International Conference on Computational Sciences and Its Applications, 2008

2007

Resolution-matched shadow maps.

Aaron E. Lefohn, Shubhabrata Sengupta, …

ACM Trans. Graph., 2007

Research Challenges for On-Chip Interconnection Networks.

…, William J. Dally, …, Doddaballapur Narasimha-Murthy Jayasimha, Stephen W. Keckler, …

IEEE Micro, 2007

Data-parallel algorithms and data structures.

Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2007

GPU architecture overview.

Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2007

Scan primitives for GPU computing.

Shubhabrata Sengupta, …

Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware 2007, 2007

2006

Discrete Sibson Interpolation.

IEEE Trans. Vis. Comput. Graph., 2006

Glift: Generic, efficient, random-access GPU data structures.

Aaron E. Lefohn, Shubhabrata Sengupta, …, Robert Strzodka, …

ACM Trans. Graph., 2006

S07 - GPGPU: general-purpose computation on graphics hardware.

David P. Luebke, …, Naga K. Govindaraju, Aaron E. Lefohn, …, Matthew Papakipos, …

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Distributed Texture Memory in a Multi-GPU Environment.

Adam Moerschell, …

Proceedings of the 21st ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware, 2006

The Virtual Pheromone Communication Primitive.

Proceedings of the Distributed Computing in Sensor Systems, 2006

2005

General Purpose Computation on Graphics Hardware.

Aaron E. Lefohn, …, Patrick S. McCormick, …, Timothy J. Purcell, Robert Strzodka

Proceedings of the 16th IEEE Visualization Conference, 2005

Streaming architectures and technology trends.

Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2005

Dynamic adaptive shadow maps on graphics hardware.

Aaron E. Lefohn, Shubhabrata Sengupta, …, Robert Strzodka, …

Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2005

Octree textures on graphics hardware.

…, Aaron E. Lefohn, Robert Strzodka, Shubhabrata Sengupta, …

Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2005

A Survey of General-Purpose Computation on Graphics Hardware.

…, Naga K. Govindaraju, …, Jens H. Krüger, Aaron E. Lefohn, Timothy J. Purcell

Proceedings of the 26th Annual Conference of the European Association for Computer Graphics, 2005

2004

Mio: fast multipass partitioning via priority-based instruction scheduling.

…, Aaron E. Lefohn, …

Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware 2004, 2004

2003

Programmable Stream Processors.

Ujval J. Kapasi, …, William J. Dally, Brucek Khailany, …, Peter R. Mattson, …

Computer, 2003

Exploring the VLSI Scalability of Stream Processors.

Brucek Khailany, William J. Dally, …, Ujval J. Kapasi, …

Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003

2002

A Stream Processor Development Platform.

…, Stephen P. Crago, Ujval J. Kapasi, Peter R. Mattson, Jinyung Namkoong, …, William J. Dally

Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002

Media Processing Applications on the Imagine Stream Processor.

…, Ujval J. Kapasi, Peter R. Mattson, …, William J. Dally

Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002

The Imagine Stream Processor.

Ujval J. Kapasi, William J. Dally, …, Brucek Khailany

Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002

Comparing Reyes and OpenGL on a Stream Architecture.

…, Brucek Khailany, …, William J. Dally

Proceedings of the 2002 ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, 2002

2001

Imagine: Media Processing with Streams.

Brucek Khailany, William J. Dally, Ujval J. Kapasi, Peter R. Mattson, Jinyung Namkoong, …

IEEE Micro, 2001

2000

Efficient conditional operations for data-parallel architectures.

Ujval J. Kapasi, William J. Dally, …, Peter R. Mattson, …, Brucek Khailany

Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000

Memory access scheduling.

…, William J. Dally, Ujval J. Kapasi, Peter R. Mattson, …

Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000

Register Organization for Media Processing.

…, William J. Dally, Brucek Khailany, Peter R. Mattson, Ujval J. Kapasi, …

Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000

Polygon Rendering on a Stream Architecture.

…, William J. Dally, Ujval J. Kapasi, …, Peter R. Mattson, …

Proceedings of the 2000 ACM SIGGRAPH/EUROGRAPHICS Workshop on Graphics Hardware, 2000

Communication Scheduling.

Peter R. Mattson, William J. Dally, …, Ujval J. Kapasi, …

Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX), 2000

1998

A Bandwidth-efficient Architecture for Media Processing.

…, William J. Dally, Ujval J. Kapasi, Brucek Khailany, Abelardo López-Lagunas, Peter R. Mattson, …

Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998
