Richard W. Vuduc

According to our database, Richard W. Vuduc authored at least 104 papers between 2000 and 2019.

Bibliography

2019
A microbenchmark characterization of the Emu chick.
Parallel Computing, 2019

A communication-avoiding 3D algorithm for sparse LU factorization on heterogeneous systems.
J. Parallel Distrib. Comput., 2019

Optimizing sparse tensor times matrix on GPUs.
J. Parallel Distrib. Comput., 2019

Temporal phenotyping of medically complex children via PARAFAC2 tensor factorization.
Journal of Biomedical Informatics, 2019

Load-Balanced Sparse MTTKRP on GPUs.
CoRR, 2019

Programming Strategies for Irregular Algorithms on the Emu Chick.
CoRR, 2019

A communication-avoiding 3D sparse triangular solver.
Proceedings of the ACM International Conference on Supercomputing, 2019

Efficient and effective sparse tensor reordering.
Proceedings of the ACM International Conference on Supercomputing, 2019

Faster parallel collision detection at high resolution for CNC milling applications.
Proceedings of the 48th International Conference on Parallel Processing, 2019

2018
Autotuning in High-Performance Computing Applications.
Proceedings of the IEEE, 2018

Spatter: A Benchmark Suite for Evaluating Sparse Access Patterns.
CoRR, 2018

A Microbenchmark Characterization of the Emu Chick.
CoRR, 2018

A Simple Methodology for Computing Families of Algorithms.
CoRR, 2018

Accurate, Fast and Scalable Kernel Ridge Regression on Parallel and Distributed Systems.
CoRR, 2018

SUSTain: Scalable Unsupervised Scoring for Tensors and its Application to Phenotyping.
CoRR, 2018

HiCOO: hierarchical storage of sparse tensors.
Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, 2018

A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

An Energy-Efficient Single-Source Shortest Path Algorithm.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

An Initial Characterization of the Emu Chick.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

2017
Design and Implementation of a Communication-Optimal Classifier for Distributed Kernel Support Vector Machines.
IEEE Trans. Parallel Distrib. Syst., 2017

Modeling the Power Variability of Core Speed Scaling on Homogeneous Multicore Systems.
Scientific Programming, 2017

SPARTan: Scalable PARAFAC2 for Large & Sparse Data.
CoRR, 2017

Polyadic Regression and its Application to Chemogenomics.
Proceedings of the 2017 SIAM International Conference on Data Mining, 2017

Efficient Communications in Training Large Scale Neural Networks.
Proceedings of the Thematic Workshops of ACM Multimedia 2017, Mountain View, CA, USA, October 23, 2017

HPPAC Workshop Introduction.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Model-Driven Sparse CP Decomposition for Higher-Order Tensors.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

2016
Sparse Hierarchical Tucker Factorization and its Application to Healthcare.
CoRR, 2016

Wanted: Floating-Point Add Round-off Error instruction.
CoRR, 2016

Optimizing Sparse Tensor Times Matrix on Multi-core and Many-Core Architectures.
Proceedings of the 6th Workshop on Irregular Applications: Architecture and Algorithms, 2016

Hybrid Dynamic Trees for Extreme-Resolution 3D Sparse Data Modeling.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Analyzing the Energy Efficiency of the Fast Multipole Method Using a DVFS-Aware Energy Model.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

A Self-Correcting Connected Components Algorithm.
Proceedings of the ACM Workshop on Fault-Tolerance for HPC at Extreme Scale, 2016

2015
UNICORN: a unified approach for localizing non-deadlock concurrency bugs.
Softw. Test., Verif. Reliab., 2015

An input-adaptive and in-place approach to dense tensor-times-matrix multiply.
Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, 2015

A GPU-parallel construction of volumetric tree.
Proceedings of the 5th Workshop on Irregular Applications - Architectures and Algorithms, 2015

CA-SVM: Communication-Avoiding Support Vector Machines on Distributed Systems.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

A Sparse Direct Solver for Distributed Memory Xeon Phi-Accelerated Systems.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

2014
A distributed kernel summation framework for general-dimension machine learning.
Statistical Analysis and Data Mining, 2014

Branch-Avoiding Graph Algorithms.
CoRR, 2014

Improving the energy efficiency of Big Cores.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Algorithmic Time, Energy, and Power on Candidate HPC Compute Building Blocks.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

A Distributed CPU-GPU Sparse Direct Solver.
Proceedings of Euro-Par 2014: Parallel Processing, 2014

A CPU:GPU Hybrid Implementation and Model-Driven Scheduling of the Fast Multipole Method.
Proceedings of the Seventh Workshop on General Purpose Processing Using GPUs, 2014

2013
Introduction for Special Issue on Autotuning.
IJHPCA, 2013

How much (execution) time and energy does my algorithm cost?
ACM Crossroads, 2013

Sustainable Software Development for Next-Gen Sequencing (NGS) Bioinformatics on Emerging Platforms.
CoRR, 2013

Self-stabilizing iterative solvers.
Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2013

Methods for High-Throughput Computation of Elementary Functions.
Proceedings of Parallel Processing and Applied Mathematics, 2013

Griffin: grouping suspicious memory-access patterns to improve understanding of concurrency bugs.
Proceedings of the International Symposium on Software Testing and Analysis, 2013

A Theoretical Framework for Algorithm-Architecture Co-design.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

A Roofline Model of Energy.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

2012
Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU).
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, 2012

When Prefetching Works, When It Doesn't, and Why.
TACO, 2012

A massively parallel adaptive fast multipole method on heterogeneous architectures.
Commun. ACM, 2012

Toward a Theory of Algorithm-Architecture Co-design.
Proceedings of High Performance Computing for Computational Science, 2012

Brief announcement: towards a communication optimal fast multipole method and its implications at exascale.
Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, 2012

A Distributed Kernel Summation Framework for General-Dimension Machine Learning.
Proceedings of the Twelfth SIAM International Conference on Data Mining, 2012

Optimizing the computation of n-point correlations on large-scale astronomical data.
Proceedings of the SC Conference on High Performance Computing, Networking, Storage and Analysis, 2012

Synthesizing Loops for Program Inversion.
Proceedings of Reversible Computation, 4th International Workshop, 2012

A performance analysis framework for identifying potential benefits in GPGPU applications.
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

A type theory for probability density functions.
Proceedings of the 39th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2012

Courses in High-performance Computing for Scientists and Engineers.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Modeling and Analysis for Performance and Power.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Communication-Optimal Parallel N-body Solvers.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

A Unified Approach for Localizing Non-deadlock Concurrency Bugs.
Proceedings of the Fifth IEEE International Conference on Software Testing, Verification and Validation, 2012

On the communication complexity of 3D FFTs and its implications for Exascale.
Proceedings of the International Conference on Supercomputing, 2012

A New Method for Program Inversion.
Proceedings of Compiler Construction, 21st International Conference, 2012

2011
Autotuning.
Encyclopedia of Parallel Computing, 2011

The Sixth International Workshop on Automatic Performance Tuning (iWAPT2011).
Proceedings of the International Conference on Computational Science, 2011

What GPU Computing Means for High-End Systems.
IEEE Micro, 2011

The Backstroke framework for source level reverse computation applied to parallel discrete event simulation.
Proceedings of the Winter Simulation Conference 2011, 2011

Balance Principles for Algorithm-Architecture Co-Design.
Proceedings of the 3rd USENIX Workshop on Hot Topics in Parallelism, 2011

2010
Toward interactive statistical modeling.
Proceedings of the International Conference on Computational Science, 2010

Petascale Direct Numerical Simulation of Blood Flow on 200K Cores and Heterogeneous Architectures.
Proceedings of the Conference on High Performance Computing, Networking, Storage and Analysis, 2010

Diagnosis, Tuning, and Redesign for Multicore Performance: A Case Study of the Fast Multipole Method.
Proceedings of the Conference on High Performance Computing, Networking, Storage and Analysis, 2010

Model-driven autotuning of sparse matrix-vector multiply on GPUs.
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Applying the concurrent collections programming model to asynchronous parallel dense linear algebra.
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Many-Thread Aware Prefetching Mechanisms for GPGPU Applications.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Unconventional wisdom in multicore computing.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Performance evaluation of concurrent collections on high-performance multicore computing systems.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Falcon: fault localization in concurrent programs.
Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering, 2010

2009
Optimization of sparse matrix-vector multiplication on emerging multicore platforms.
Parallel Computing, 2009

Effective Source-to-Source Outlining to Support Whole Program Empirical Optimization.
Proceedings of Languages and Compilers for Parallel Computing, 2009

Understanding the design trade-offs among current multicore systems for numerical computations.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems.
Proceedings of the 23rd international conference on Supercomputing, 2009

Direct N-body Kernels for Multicore Platforms.
Proceedings of the 38th International Conference on Parallel Processing (ICPP 2009), 2009

2007
When cache blocking of sparse matrix vector multiply works and why.
Appl. Algebra Eng. Commun. Comput., 2007

Techniques for specifying bug patterns.
Proceedings of the 5th Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging, 2007

POET: Parameterized Optimizations for Empirical Tuning.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Communicating Software Architecture using a Unified Single-View Visualization.
Proceedings of the 12th International Conference on Engineering of Complex Computer Systems (ICECCS 2007), 2007

2006
Improving distributed memory applications testing by message perturbation.
Proceedings of the 4th Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging, 2006

Annotating user-defined abstractions for optimization.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

2005
Self-Adapting Linear Algebra Algorithms and Software.
Proceedings of the IEEE, 2005

An Extensible Open-Source Compiler Infrastructure for Testing.
Proceedings of Hardware and Software Verification and Testing, 2005

Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure.
Proceedings of High Performance Computing and Communications, 2005

2004
Statistical Models for Empirical Search-Based Performance Tuning.
IJHPCA, 2004

Sparsity: Optimization Framework for Sparse Matrix Kernels.
IJHPCA, 2004

Performance Models for Evaluation and Automatic Tuning of Symmetric Sparse Matrix-Vector Multiply.
Proceedings of the 33rd International Conference on Parallel Processing (ICPP 2004), 2004

2003
Memory Hierarchy Optimizations and Performance Bounds for Sparse A^TAx.
Proceedings of Computational Science - ICCS 2003, 2003

2002
Performance optimizations and bounds for sparse matrix-vector multiply.
Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

2001
Statistical Models for Automatic Performance Tuning.
Proceedings of Computational Science - ICCS 2001, 2001

2000
SWAMI: a framework for collaborative filtering algorithm development and evaluation.
Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2000), 2000

Code Generators for Automatic Tuning of Numerical Kernels: Experiences with FFTW.
Proceedings of Semantics, Applications, and Implementation of Program Generation (SAIG 2000), 2000
