Richard W. Vuduc

Orcid: 0000-0003-2178-138X

Affiliations:

Georgia Institute of Technology, Atlanta GA, USA

According to our database¹, Richard W. Vuduc authored at least 141 papers between 2000 and 2025.

Collaborative distances:

Dijkstra number² of three.
Erdős number³ of three.

Bibliography

2025

Back to Bits: Extending Shannon's communication performance framework to computing.

[DOI]

,

Richard W. Vuduc

CoRR, August, 2025

Brief Announcement: Optimality Conditions for Parallel Communication-Avoiding Matrix Multiplication with Overlapped Communication.

[DOI]

,

,

Richard W. Vuduc

Proceedings of the 37th ACM Symposium on Parallelism in Algorithms and Architectures, 2025

An Asynchronous Distributed-Memory Parallel Algorithm for $k$-Mer Counting.

[DOI]

,

Akihiro Hayashi

,

Richard W. Vuduc

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2025

Cases When Communicating More is Faster.

[DOI]

,

Richard W. Vuduc

Proceedings of the 3rd Highlights of Parallel Computing Workshop, 2025

Fast Active-Set Thresholding Method for Nonnegative Least Squares.

[DOI]

,

Ramakrishnan Kannan

,

Konstantin Pieper

,

,

,

,

Richard W. Vuduc

,

Proceedings of the IEEE International Conference on Big Data, 2025

2024

FAIR Sharing of Data in Autotuning Research (Vision Paper).

[DOI]

,

Jacob O. Tørring

,

Ben van Werkhoven

,

,

Richard W. Vuduc

Proceedings of the Companion of the 15th ACM/SPEC International Conference on Performance Engineering, 2024

Asynchronous Distributed-Memory Parallel Algorithms for Influence Maximization.

[DOI]

Shubhendra Pal Singhal

,

,

,

,

Akihiro Hayashi

,

Richard W. Vuduc

Proceedings of the International Conference for High Performance Computing, 2024

A Workflow for the Synthesis of Irregular Memory Access Microbenchmarks.

[DOI]

,

Jered Dominguez-Trujillo

,

Galen M. Shipman

,

,

Christopher Scott

,

Agustin Vaca Valverde

,

Richard W. Vuduc

,

Proceedings of the International Symposium on Memory Systems, 2024

On Rank Selection for Nonnegative Matrix Factorization.

[DOI]

,

,

,

Ramakrishnan Kannan

,

,

Richard W. Vuduc

,

Proceedings of the IEEE International Conference on Big Data, 2024

Clustering and Topic Discovery of Multiway Data via Joint-NCMTF.

[DOI]

,

Ricardo Velasquez

,

Richard W. Vuduc

,

Proceedings of the IEEE International Conference on Big Data, 2024

2023

AminerMag X Dataset.

[DOI]

,

,

,

Ramakrishnan Kannan

,

,

,

Dataset, June, 2023

AminerMag S Dataset.

[DOI]

,

,

,

Ramakrishnan Kannan

,

,

,

Dataset, June, 2023

Calculon: a methodology and tool for high-level co-design of systems and large language models.

[DOI]

,

,

,

Richard W. Vuduc

Proceedings of the International Conference for High Performance Computing, 2023

Multifidelity Memory System Simulation in SST.

[DOI]

,

,

Richard W. Vuduc

Proceedings of the International Symposium on Memory Systems, 2023

Distributed-Memory Parallel JointNMF.

[DOI]

,

,

,

Ramakrishnan Kannan

,

,

Richard W. Vuduc

,

Proceedings of the 37th International Conference on Supercomputing, 2023

2022

Critique of "MemXCT: Memory-Centric X-Ray CT Reconstruction With Massive Parallelization" by SCC Team From Georgia Tech.

[DOI]

,

,

,

,

,

Sudhanshu Agarwal

,

Richard W. Vuduc

,

IEEE Trans. Parallel Distributed Syst., 2022

Jack, The Autotuner.

[DOI]

Richard W. Vuduc

Comput. Sci. Eng., 2022

Exaflops Biomedical Knowledge Graph Analytics.

[DOI]

Ramakrishnan Kannan

,

,

,

,

,

,

,

,

,

,

Robert M. Patton

,

Sergio E. Baranzini

,

Richard W. Vuduc

,

Thomas E. Potok

Proceedings of the SC22: International Conference for High Performance Computing, 2022

Nimble GNN Embedding with Tensor-Train Decomposition.

[DOI]

,

,

,

Christos Faloutsos

,

,

Richard W. Vuduc

Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022

"Smarter" NICs for faster molecular dynamics: a case study.

[DOI]

,

,

K. Scott Hemmert

,

,

,

,

Thomas M. Conte

,

,

Richard W. Vuduc

Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

ParaGraph: An application-simulator interface and toolkit for hardware-software co-design.

[DOI]

,

,

,

Richard W. Vuduc

Proceedings of the 51st International Conference on Parallel Processing, 2022

2021

ORCA: Outlier detection and Robust Clustering for Attributed graphs.

[DOI]

,

Ramakrishnan Kannan

,

Richard W. Vuduc

,

J. Glob. Optim., 2021

Communication-avoiding kernel ridge regression on parallel and distributed systems.

[DOI]

,

,

,

Richard W. Vuduc

,

CCF Trans. High Perform. Comput., 2021

Is it Nemo or Dory? Fast and accurate object detection for IoT and edge devices.

[DOI]

Sudhanshu Agarwal

,

Richard W. Vuduc

Proceedings of the IoT '21: 11th International Conference on the Internet of Things, St. Gallen, Switzerland, November 8, 2021

An interface for multidimensional arrays in Arkouda.

[DOI]

,

Richard W. Vuduc

Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021

Scalable All-pairs Shortest Paths for Huge Graphs on Multi-GPU Clusters.

[DOI]

,

,

Ramakrishnan Kannan

,

,

Richard W. Vuduc

,

Thomas E. Potok

Proceedings of the HPDC '21: The 30th International Symposium on High-Performance Parallel and Distributed Computing, 2021

Online model swapping for architectural simulation.

[DOI]

,

,

Richard W. Vuduc

,

Proceedings of the CF '21: Computing Frontiers Conference, 2021

CUP: Cluster Pruning for Compressing Deep Neural Networks.

[DOI]

,

,

Richard W. Vuduc

,

Duen Horng Chau

,

Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), 2021

2020

Automatic Generation of High-Performance FFT Kernels on Arm and X86 CPUs.

[DOI]

,

,

,

,

,

Richard W. Vuduc

IEEE Trans. Parallel Distributed Syst., 2020

Programming Strategies for Irregular Algorithms on the Emu Chick.

[DOI]

,

,

Abdurrahman Yasar

,

,

Jeffrey S. Young

,

Thomas M. Conte

,

Ümit V. Çatalyürek

,

Richard W. Vuduc

,

,

ACM Trans. Parallel Comput., 2020

Scalable knowledge graph analytics at 136 petaflop/s.

[DOI]

Ramakrishnan Kannan

,

,

,

Drahomira Herrmannova

,

,

Robert M. Patton

,

Richard W. Vuduc

,

Thomas E. Potok

Proceedings of the International Conference for High Performance Computing, 2020

Distributed-memory parallel symmetric nonnegative matrix factorization.

[DOI]

,

,

,

Ramakrishnan Kannan

,

Richard W. Vuduc

,

Proceedings of the International Conference for High Performance Computing, 2020

A supernodal all-pairs shortest path algorithm.

[DOI]

,

Ramakrishnan Kannan

,

,

Richard W. Vuduc

Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

Intrepydd: performance, productivity, and portability for data science application kernels.

[DOI]

,

,

,

Sriseshan Srikanth

,

Thomas M. Conte

,

Richard W. Vuduc

,

Proceedings of the 2020 ACM SIGPLAN International Symposium on New Ideas, 2020

Evaluating Gather and Scatter Performance on CPUs and GPUs.

[DOI]

,

,

Richard W. Vuduc

,

,

,

Proceedings of the MEMSYS 2020: The International Symposium on Memory Systems, 2020

Max orientation coverage: efficient path planning to avoid collisions in the CNC milling of 3D objects.

[DOI]

,

Thomas M. Tucker

,

Thomas R. Kurfess

,

Richard W. Vuduc

,

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2020

2019

A microbenchmark characterization of the Emu chick.

[DOI]

Jeffrey S. Young

,

,

,

,

,

,

Richard W. Vuduc

,

Parallel Comput., 2019

A communication-avoiding 3D algorithm for sparse LU factorization on heterogeneous systems.

[DOI]

,

,

Richard W. Vuduc

J. Parallel Distributed Comput., 2019

Optimizing sparse tensor times matrix on GPUs.

[DOI]

,

,

,

,

,

Richard W. Vuduc

J. Parallel Distributed Comput., 2019

Temporal phenotyping of medically complex children via PARAFAC2 tensor factorization.

[DOI]

,

Evangelos E. Papalexakis

,

Richard W. Vuduc

,

Elizabeth Searles

,

J. Biomed. Informatics, 2019

CUP: Cluster Pruning for Compressing Deep Neural Networks.

[DOI]

,

,

Richard W. Vuduc

,

CoRR, 2019

Self-stabilizing Connected Components.

[DOI]

,

Christian Engelmann

,

,

,

Richard W. Vuduc

Proceedings of the 9th IEEE/ACM Workshop on Fault Tolerance for HPC at eXtreme Scale, 2019

Adaptive Deep Path: Efficient Coverage of a Known Environment under Various Configurations.

[DOI]

,

Thomas M. Tucker

,

Thomas R. Kurfess

,

Richard W. Vuduc

Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2019

Load-Balanced Sparse MTTKRP on GPUs.

[DOI]

,

,

Aravind Sukumaran-Rajam

,

Richard W. Vuduc

,

Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

A communication-avoiding 3D sparse triangular solver.

[DOI]

,

Ramakrishnan Kannan

,

Xiaoye Sherry Li

,

Richard W. Vuduc

Proceedings of the ACM International Conference on Supercomputing, 2019

Efficient and effective sparse tensor reordering.

[DOI]

,

,

Ümit V. Çatalyürek

,

,

Kevin J. Barker

,

Richard W. Vuduc

Proceedings of the ACM International Conference on Supercomputing, 2019

Faster parallel collision detection at high resolution for CNC milling applications.

[DOI]

,

Dmytro Konobrytskyi

,

Thomas M. Tucker

,

Thomas R. Kurfess

,

Richard W. Vuduc

Proceedings of the 48th International Conference on Parallel Processing, 2019

2018

Autotuning in High-Performance Computing Applications.

[DOI]

Prasanna Balaprakash

,

Jack J. Dongarra

,

,

,

Jeffrey K. Hollingsworth

,

,

Richard W. Vuduc

Proc. IEEE, 2018

Spatter: A Benchmark Suite for Evaluating Sparse Access Patterns.

[DOI]

,

,

,

Jeffrey S. Young

CoRR, 2018

A Simple Methodology for Computing Families of Algorithms.

[DOI]

Devangi N. Parikh

,

Margaret E. Myers

,

Richard W. Vuduc

,

Robert A. van de Geijn

CoRR, 2018

HiCOO: hierarchical storage of sparse tensors.

[DOI]

,

,

Richard W. Vuduc

Proceedings of the International Conference for High Performance Computing, 2018

SUSTain: Scalable Unsupervised Scoring for Tensors and its Application to Phenotyping.

[DOI]

,

Evangelos E. Papalexakis

,

,

Richard W. Vuduc

,

,

Christopher deFilippi

,

Walter F. Stewart

,

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018

A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices.

[DOI]

,

Xiaoye Sherry Li

,

Richard W. Vuduc

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

An Energy-Efficient Single-Source Shortest Path Algorithm.

[DOI]

,

Jeffrey S. Young

,

Richard W. Vuduc

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

An Initial Characterization of the Emu Chick.

[DOI]

,

,

,

,

,

,

Richard W. Vuduc

,

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Accurate, Fast and Scalable Kernel Ridge Regression on Parallel and Distributed Systems.

[DOI]

,

,

,

Richard W. Vuduc

Proceedings of the 32nd International Conference on Supercomputing, 2018

2017

Design and Implementation of a Communication-Optimal Classifier for Distributed Kernel Support Vector Machines.

[DOI]

,

,

Kent Czechowski

,

,

IEEE Trans. Parallel Distributed Syst., 2017

Modeling the Power Variability of Core Speed Scaling on Homogeneous Multicore Systems.

[DOI]

,

,

,

Richard W. Vuduc

,

,

Sci. Program., 2017

Polyadic Regression and its Application to Chemogenomics.

[DOI]

,

,

,

Peter B. Walker

,

Richard W. Vuduc

,

Jyotishman Pathak

,

Proceedings of the 2017 SIAM International Conference on Data Mining, 2017

Efficient Communications in Training Large Scale Neural Networks.

[DOI]

,

,

,

,

Richard W. Vuduc

,

,

,

Proceedings of the Thematic Workshops of ACM Multimedia 2017, Mountain View, CA, USA, October 23, 2017

SPARTan: Scalable PARAFAC2 for Large & Sparse Data.

[DOI]

,

Evangelos E. Papalexakis

,

,

Richard W. Vuduc

,

Elizabeth Searles

,

Michael Thompson

,

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, August 13, 2017

HPPAC Workshop Introduction.

[DOI]

Shuaiwen Leon Song

,

Richard W. Vuduc

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Model-Driven Sparse CP Decomposition for Higher-Order Tensors.

[DOI]

,

,

,

,

Richard W. Vuduc

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

2016

Wanted: Floating-Point Add Round-off Error instruction.

[DOI]

,

Richard W. Vuduc

,

CoRR, 2016

Optimizing Sparse Tensor Times Matrix on Multi-core and Many-Core Architectures.

[DOI]

,

,

,

Richard W. Vuduc

Proceedings of the 6th Workshop on Irregular Applications: Architecture and Algorithms, 2016

Hybrid Dynamic Trees for Extreme-Resolution 3D Sparse Data Modeling.

[DOI]

Mohammad M. Hossain

,

Thomas M. Tucker

,

Thomas R. Kurfess

,

Richard W. Vuduc

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Analyzing the Energy Efficiency of the Fast Multipole Method Using a DVFS-Aware Energy Model.

[DOI]

,

Richard W. Vuduc

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

A Self-Correcting Connected Components Algorithm.

[DOI]

,

,

,

Richard W. Vuduc

Proceedings of the ACM Workshop on Fault-Tolerance for HPC at Extreme Scale, 2016

2015

UNICORN: a unified approach for localizing non-deadlock concurrency bugs.

[DOI]

,

Richard W. Vuduc

,

Mary Jean Harrold

Softw. Test. Verification Reliab., 2015

Branch-Avoiding Graph Algorithms.

[DOI]

,

,

Richard W. Vuduc

Proceedings of the 27th ACM Symposium on Parallelism in Algorithms and Architectures, 2015

An input-adaptive and in-place approach to dense tensor-times-matrix multiply.

[DOI]

,

Casey Battaglino

,

,

,

Richard W. Vuduc

Proceedings of the International Conference for High Performance Computing, 2015

A GPU-parallel construction of volumetric tree.

[DOI]

Mohammad M. Hossain

,

Thomas M. Tucker

,

Thomas R. Kurfess

,

Richard W. Vuduc

Proceedings of the 5th Workshop on Irregular Applications - Architectures and Algorithms, 2015

CA-SVM: Communication-Avoiding Support Vector Machines on Distributed Systems.

[DOI]

,

,

Kenneth Czechowski

,

,

Richard W. Vuduc

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

A Sparse Direct Solver for Distributed Memory Xeon Phi-Accelerated Systems.

[DOI]

,

,

Richard W. Vuduc

,

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Sparse Hierarchical Tucker Factorization and Its Application to Healthcare.

[DOI]

,

,

Richard W. Vuduc

,

Proceedings of the 2015 IEEE International Conference on Data Mining, 2015

2014

A distributed kernel summation framework for general-dimension machine learning.

[DOI]

,

,

Richard W. Vuduc

,

Alexander G. Gray

Stat. Anal. Data Min., 2014

Improving the energy efficiency of Big Cores.

[DOI]

Kenneth Czechowski

,

,

,

,

,

Richard W. Vuduc

,

Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Algorithmic Time, Energy, and Power on Candidate HPC Compute Building Blocks.

[DOI]

,

,

,

Richard W. Vuduc

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

A Distributed CPU-GPU Sparse Direct Solver.

[DOI]

,

Richard W. Vuduc

,

Xiaoye Sherry Li

Proceedings of the Euro-Par 2014 Parallel Processing, 2014

A CPU-GPU Hybrid Implementation and Model-Driven Scheduling of the Fast Multipole Method.

[DOI]

,

Aparna Chandramowlishwaran

,

,

Richard W. Vuduc

Proceedings of the Seventh Workshop on General Purpose Processing Using GPUs, 2014

2013

Introduction for Special Issue on Autotuning.

[DOI]

,

Richard W. Vuduc

Int. J. High Perform. Comput. Appl., 2013

How much (execution) time and energy does my algorithm cost?

[DOI]

,

Richard W. Vuduc

XRDS, 2013

Sustainable Software Development for Next-Gen Sequencing (NGS) Bioinformatics on Emerging Platforms.

[DOI]

,

,

Viktor K. Prasanna

,

Manish Parashar

,

,

,

Richard W. Vuduc

CoRR, 2013

Self-stabilizing iterative solvers.

[DOI]

,

Richard W. Vuduc

Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2013

Methods for High-Throughput Computation of Elementary Functions.

[DOI]

,

Richard W. Vuduc

Proceedings of the Parallel Processing and Applied Mathematics, 2013

Griffin: grouping suspicious memory-access patterns to improve understanding of concurrency bugs.

[DOI]

,

Mary Jean Harrold

,

Richard W. Vuduc

Proceedings of the International Symposium on Software Testing and Analysis, 2013

A Theoretical Framework for Algorithm-Architecture Co-design.

[DOI]

Kenneth Czechowski

,

Richard W. Vuduc

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

A Roofline Model of Energy.

[DOI]

,

,

Robert J. Fowler

,

Richard W. Vuduc

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

2012

Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU).

[DOI]

,

Richard W. Vuduc

,

Sara S. Baghsorkhi

,

,

Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01737-7, 2012

When Prefetching Works, When It Doesn't, and Why.

[DOI]

,

,

Richard W. Vuduc

ACM Trans. Archit. Code Optim., 2012

Toward a Theory of Algorithm-Architecture Co-design.

[DOI]

Richard W. Vuduc

,

Kenneth Czechowski

Proceedings of the High Performance Computing for Computational Science, 2012

Brief announcement: towards a communication optimal fast multipole method and its implications at exascale.

[DOI]

Aparna Chandramowlishwaran

,

,

,

Richard W. Vuduc

Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, 2012

A Distributed Kernel Summation Framework for General-Dimension Machine Learning.

[DOI]

,

Richard W. Vuduc

,

Alexander G. Gray

Proceedings of the Twelfth SIAM International Conference on Data Mining, 2012

Optimizing the computation of n-point correlations on large-scale astronomical data.

[DOI]

William B. March

,

Kenneth Czechowski

,

,

,

,

Andrew J. Connolly

,

Richard W. Vuduc

,

,

Alexander G. Gray

Proceedings of the SC Conference on High Performance Computing Networking, 2012

Synthesizing Loops for Program Inversion.

[DOI]

,

Daniel J. Quinlan

,

David R. Jefferson

,

Richard Fujimoto

,

Richard W. Vuduc

Proceedings of the Reversible Computation, 4th International Workshop, 2012

A performance analysis framework for identifying potential benefits in GPGPU applications.

[DOI]

,

Aniruddha Dasgupta

,

,

Richard W. Vuduc

Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

A type theory for probability density functions.

[DOI]

,

,

Richard W. Vuduc

,

Alexander G. Gray

Proceedings of the 39th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2012

Courses in High-performance Computing for Scientists and Engineers.

[DOI]

Richard W. Vuduc

,

Kenneth Czechowski

,

Aparna Chandramowlishwaran

,

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Modeling and Analysis for Performance and Power.

[DOI]

,

Richard W. Vuduc

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Communication-Optimal Parallel N-body Solvers.

[DOI]

Aparna Chandramowlishwaran

,

Richard W. Vuduc

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

A Unified Approach for Localizing Non-deadlock Concurrency Bugs.

[DOI]

,

Richard W. Vuduc

,

Mary Jean Harrold

Proceedings of the Fifth IEEE International Conference on Software Testing, 2012

On the communication complexity of 3D FFTs and its implications for Exascale.

[DOI]

Kenneth Czechowski

,

Casey Battaglino

,

Chris McClanahan

,

,

,

Richard W. Vuduc

Proceedings of the International Conference on Supercomputing, 2012

A New Method for Program Inversion.

[DOI]

,

,

Daniel J. Quinlan

,

David R. Jefferson

,

Richard Fujimoto

,

Richard W. Vuduc

Proceedings of the Compiler Construction - 21st International Conference, 2012

2011

Autotuning.

[DOI]

Richard W. Vuduc

Proceedings of the Encyclopedia of Parallel Computing, 2011

The Sixth International Workshop on Automatic Performance Tuning (iWAPT2011).

[DOI]

Takahiro Katagiri

,

Richard W. Vuduc

Proceedings of the International Conference on Computational Science, 2011

What GPU Computing Means for High-End Systems.

[DOI]

Richard W. Vuduc

,

Kent Czechowski

IEEE Micro, 2011

The Backstroke framework for source level reverse computation applied to parallel discrete event simulation.

[DOI]

,

,

Richard W. Vuduc

,

Richard Fujimoto

,

Daniel J. Quinlan

,

David R. Jefferson

Proceedings of the Winter Simulation Conference 2011, 2011

Balance Principles for Algorithm-Architecture Co-Design.

[DOI]

Kent Czechowski

,

Casey Battaglino

,

Chris McClanahan

,

Aparna Chandramowlishwaran

,

Richard W. Vuduc

Proceedings of the 3rd USENIX Workshop on Hot Topics in Parallelism, 2011

2010

Toward interactive statistical modeling.

[DOI]

,

,

Alexander G. Gray

,

Richard W. Vuduc

Proceedings of the International Conference on Computational Science, 2010

Petascale Direct Numerical Simulation of Blood Flow on 200K Cores and Heterogeneous Architectures.

[DOI]

,

,

Shravan K. Veerapaneni

,

Aparna Chandramowlishwaran

,

Dhairya Malhotra

,

,

Rahul S. Sampath

,

Aashay Shringarpure

,

Jeffrey S. Vetter

,

Richard W. Vuduc

,

,

Proceedings of the Conference on High Performance Computing Networking, 2010

Diagnosis, Tuning, and Redesign for Multicore Performance: A Case Study of the Fast Multipole Method.

[DOI]

Aparna Chandramowlishwaran

,

,

Richard W. Vuduc

Proceedings of the Conference on High Performance Computing Networking, 2010

Model-driven autotuning of sparse matrix-vector multiply on GPUs.

[DOI]

,

,

Richard W. Vuduc

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Applying the concurrent collections programming model to asynchronous parallel dense linear algebra.

[DOI]

Aparna Chandramowlishwaran

,

,

Richard W. Vuduc

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Many-Thread Aware Prefetching Mechanisms for GPGPU Applications.

[DOI]

,

Nagesh B. Lakshminarayana

,

,

Richard W. Vuduc

Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Unconventional wisdom in multicore computing.

[DOI]

Richard W. Vuduc

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures.

[DOI]

Aparna Chandramowlishwaran

,

Samuel Williams

,

,

,

,

Richard W. Vuduc

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Performance evaluation of concurrent collections on high-performance multicore computing systems.

[DOI]

Aparna Chandramowlishwaran

,

,

Richard W. Vuduc

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Falcon: fault localization in concurrent programs.

[DOI]

,

Richard W. Vuduc

,

Mary Jean Harrold

Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering, 2010

2009

A massively parallel adaptive fast-multipole method on heterogeneous architectures.

[DOI]

,

Aparna Chandramowlishwaran

,

Harper Langston

,

Tuan-Anh Nguyen

,

Rahul S. Sampath

,

Aashay Shringarpure

,

Richard W. Vuduc

,

,

,

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Effective Source-to-Source Outlining to Support Whole Program Empirical Optimization.

[DOI]

,

Daniel J. Quinlan

,

Richard W. Vuduc

,

Proceedings of the Languages and Compilers for Parallel Computing, 2009

Understanding the design trade-offs among current multicore systems for numerical computations.

[DOI]

,

,

Richard W. Vuduc

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems.

[DOI]

Sundaresan Venkatasubramanian

,

Richard W. Vuduc

Proceedings of the 23rd international conference on Supercomputing, 2009

Direct N-body Kernels for Multicore Platforms.

[DOI]

,

Aashay Shringarpure

,

Richard W. Vuduc

Proceedings of the ICPP 2009, 2009

2007

When cache blocking of sparse matrix vector multiply works and why.

[DOI]

Rajesh Nishtala

,

Richard W. Vuduc

,

,

Katherine A. Yelick

Appl. Algebra Eng. Commun. Comput., 2007

Optimization of sparse matrix-vector multiplication on emerging multicore platforms.

[DOI]

Samuel Williams

,

,

Richard W. Vuduc

,

,

Katherine A. Yelick

,

Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

Techniques for specifying bug patterns.

[DOI]

Daniel J. Quinlan

,

Richard W. Vuduc

,

Ghassan Misherghi

Proceedings of the 5th Workshop on Parallel and Distributed Systems: Testing, 2007

POET: Parameterized Optimizations for Empirical Tuning.

[DOI]

,

,

,

Richard W. Vuduc

,

Daniel J. Quinlan

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Communicating Software Architecture using a Unified Single-View Visualization.

[DOI]

,

,

Daniel J. Quinlan

,

Andreas Sæbjørnsen

,

Richard W. Vuduc

Proceedings of the 12th International Conference on Engineering of Complex Computer Systems (ICECCS 2007), 2007

2006

Improving distributed memory applications testing by message perturbation.

[DOI]

Richard W. Vuduc

,

,

Daniel J. Quinlan

,

Bronis R. de Supinski

,

Andreas Sæbjørnsen

Proceedings of the 4th Workshop on Parallel and Distributed Systems: Testing, 2006

Annotating user-defined abstractions for optimization.

[DOI]

Daniel J. Quinlan

,

Markus Schordan

,

Richard W. Vuduc

,

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

2005

Self-Adapting Linear Algebra Algorithms and Software.

[DOI]

James Demmel

,

Jack J. Dongarra

,

Victor Eijkhout

,

,

Antoine Petitet

,

Richard W. Vuduc

,

R. Clint Whaley

,

Katherine A. Yelick

Proc. IEEE, 2005

An Extensible Open-Source Compiler Infrastructure for Testing.

[DOI]

Daniel J. Quinlan

,

,

Richard W. Vuduc

Proceedings of the Hardware and Software Verification and Testing, 2005

Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure.

[DOI]

Richard W. Vuduc

,

Proceedings of the High Performance Computing and Communications, 2005

2004

Statistical Models for Empirical Search-Based Performance Tuning.

[DOI]

Richard W. Vuduc

,

,

Int. J. High Perform. Comput. Appl., 2004

Sparsity: Optimization Framework for Sparse Matrix Kernels.

[DOI]

,

Katherine A. Yelick

,

Richard W. Vuduc

Int. J. High Perform. Comput. Appl., 2004

Performance Models for Evaluation and Automatic Tuning of Symmetric Sparse Matrix-Vector Multiply.

[DOI]

Benjamin C. Lee

,

Richard W. Vuduc

,

,

Katherine A. Yelick

Proceedings of the 33rd International Conference on Parallel Processing (ICPP 2004), 2004

2003

Memory Hierarchy Optimizations and Performance Bounds for Sparse $A^TAx$.

[DOI]

,

Attila Gyulassy

,

,

Katherine A. Yelick

Proceedings of the Computational Science - ICCS 2003, 2003

2002

Performance optimizations and bounds for sparse matrix-vector multiply.

[DOI]

,

,

Katherine A. Yelick

,

,

Rajesh Nishtala

,

Benjamin C. Lee

Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

2001

Statistical Models for Automatic Performance Tuning.

[DOI]

,

,

Proceedings of the Computational Science - ICCS 2001, 2001

2000

SWAMI: a framework for collaborative filtering algorithm development and evaluation.

[DOI]

,

,

,

,

,

Proceedings of the SIGIR 2000: Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2000

Code Generators for Automatic Tuning of Numerical Kernels: Experiences with FFTW.

[DOI]

,

Proceedings of the Semantics, 2000
