David E. Keyes

According to our database, David E. Keyes authored at least 132 papers between 1985 and 2019.

Bibliography

2019
Massively Parallel Polar Decomposition on Distributed-memory Systems.
TOPC, 2019

A QDWH-based SVD Software Framework on Distributed-memory Manycore Systems.
ACM Trans. Math. Softw., 2019

Batched Triangular Dense Linear Algebra Kernels for Very Small Matrix Sizes on GPUs.
ACM Trans. Math. Softw., 2019

Hierarchical Matrix Operations on GPUs: Matrix-Vector Multiplication and Compression.
ACM Trans. Math. Softw., 2019

Randomized GPU Algorithms for the Construction of Hierarchical Matrices from Matrix-Vector Operations.
SIAM J. Scientific Computing, 2019

Extreme Scale FMM-Accelerated Boundary Integral Equation Solver for Wave Scattering.
SIAM J. Scientific Computing, 2019

Hierarchical-block conditioning approximations for high-dimensional multivariate normal probabilities.
Statistics and Computing, 2019

Fast parallel multidimensional FFT using advanced MPI.
J. Parallel Distrib. Comput., 2019

mpi4py-fft: Parallel Fast Fourier Transforms with MPI for Python.
J. Open Source Software, 2019

Combining finite element and finite difference methods for isotropic elastic wave simulations in an energy-conserving manner.
J. Comput. Physics, 2019

SBP-SAT finite difference discretization of acoustic wave equations on staggered block-wise uniform grids.
J. Computational Applied Mathematics, 2019

Likelihood approximation with hierarchical matrices for large spatial datasets.
Computational Statistics & Data Analysis, 2019

ExaGeoStatR: A Package for Large-Scale Geostatistics in R.
CoRR, 2019

Solution of the 3D density-driven groundwater flow problem with uncertain porosity and permeability.
CoRR, 2019

Hierarchical Matrix Operations on GPUs: Matrix-Vector Multiplication and Compression.
CoRR, 2019

Tucker Tensor Analysis of Matérn Functions in Spatial Statistics.
Comput. Meth. in Appl. Math., 2019

2018
Asynchronous Task-Based Polar Decomposition on Single Node Manycore Architectures.
IEEE Trans. Parallel Distrib. Syst., 2018

Optimizations of Unstructured Aerodynamics Computations for Many-core Architectures.
IEEE Trans. Parallel Distrib. Syst., 2018

ExaGeoStat: A High Performance Unified Software for Geostatistics on Manycore Systems.
IEEE Trans. Parallel Distrib. Syst., 2018

Multidimensional Intratile Parallelization for Memory-Starved Stencil Computations.
TOPC, 2018

A Note on Adaptive Nonlinear Preconditioning Techniques.
SIAM J. Scientific Computing, 2018

Accelerated Cyclic Reduction: A distributed-memory fast solver for structured linear systems.
Parallel Computing, 2018

Batched QR and SVD algorithms on GPUs with applications in hierarchical matrix compression.
Parallel Computing, 2018

Parallel accelerated cyclic reduction preconditioner for three-dimensional elliptic PDEs with variable coefficients.
J. Computational Applied Mathematics, 2018

Big data and extreme-scale computing.
IJHPCA, 2018

Fast parallel multidimensional FFT using advanced MPI.
CoRR, 2018

Tile Low-Rank Approximation of Large-Scale Maximum Likelihood Estimation on Manycore Architectures.
CoRR, 2018

Extreme Scale FMM-Accelerated Boundary Integral Equation Solver for Wave Scattering.
CoRR, 2018

Fast multipole preconditioners for sparse matrices arising from elliptic equations.
Computat. and Visualiz. in Science, 2018

Extreme Computing for Extreme Adaptive Optics: The Key to Finding Life Outside our Solar System.
Proceedings of the Platform for Advanced Scientific Computing Conference, 2018

Real-Time Massively Distributed Multi-object Adaptive Optics Simulations for the European Extremely Large Telescope.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Performance Assessment of Hybrid Parallelism for Large-Scale Reservoir Simulation on Multi- and Many-core Architectures.
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018

Tile Low-Rank GEMM Using Batched Operations on GPUs.
Proceedings of the Euro-Par 2018: Parallel Processing, 2018

Exploiting Data Sparsity for Large-Scale Matrix Computations.
Proceedings of the Euro-Par 2018: Parallel Processing, 2018

Parallel Approximation of the Maximum Likelihood Estimation for the Prediction of Large-Scale Geostatistics Simulations.
Proceedings of the IEEE International Conference on Cluster Computing, 2018

2017
A scalable community detection algorithm for large graphs using stochastic block models.
Intell. Data Anal., 2017

ExaGeoStat: A High Performance Unified Framework for Geostatistics on Manycore Systems.
CoRR, 2017

Communication Reducing Algorithms for Distributed Hierarchical N-Body Problems with Boundary Distributions.
CoRR, 2017

Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression.
CoRR, 2017

A framework for dense triangular matrix kernels on various manycore architectures.
Concurrency and Computation: Practice and Experience, 2017

Tile Low Rank Cholesky Factorization for Climate/Weather Modeling Applications on Manycore Architectures.
Proceedings of the High Performance Computing - 32nd International Conference, 2017

Communication Reducing Algorithms for Distributed Hierarchical N-Body Problems with Boundary Distributions.
Proceedings of the High Performance Computing - 32nd International Conference, 2017

Asynchronous Task-Based Parallelization of Algebraic Multigrid.
Proceedings of the Platform for Advanced Scientific Computing Conference, 2017

Performance Evaluation of Computation and Communication Kernels of the Fast Multipole Method on Intel Manycore Architecture.
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017

2016
A High Performance QDWH-SVD Solver Using Hardware Accelerators.
ACM Trans. Math. Softw., 2016

KBLAS: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators.
ACM Trans. Math. Softw., 2016

Accelerated Dimension-Independent Adaptive Metropolis.
SIAM J. Scientific Computing, 2016

Convergence Analysis for the Multiplicative Schwarz Preconditioned Inexact Newton Algorithm.
SIAM J. Numerical Analysis, 2016

Unstructured computational aerodynamics on many integrated core architecture.
Parallel Computing, 2016

A performance model for the communication in fast multipole methods on high-performance computing platforms.
IJHPCA, 2016

Fast Multipole Method as a Matrix-Free Hierarchical Low-Rank Approximation.
CoRR, 2016

Research and Education in Computational Science and Engineering.
CoRR, 2016

A Matrix-free Preconditioner for the Helmholtz Equation based on the Fast Multipole Method.
CoRR, 2016

A Direct Elliptic Solver Based on Hierarchically Low-rank Schur Complements.
CoRR, 2016

Performance optimization of Sparse Matrix-Vector Multiplication for multi-component PDE-based applications using GPUs.
Concurrency and Computation: Practice and Experience, 2016

Efficiency of High Order Spectral Element Methods on Petascale Architectures.
Proceedings of the High Performance Computing - 31st International Conference, 2016

On the Robustness and Prospects of Adaptive BDDC Methods for Finite Element Discretizations of Elliptic PDEs with High-Contrast Coefficients.
Proceedings of the Platform for Advanced Scientific Computing Conference, 2016

Optimization of an Electromagnetics Code with Multicore Wavefront Diamond Blocking and Multi-dimensional Intra-Tile Parallelization.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Efficient Sphere Detector Algorithm for Massive MIMO using GPU Hardware Accelerator.
Proceedings of the International Conference on Computational Science 2016, 2016

High Performance Polar Decomposition on Distributed Memory Systems.
Proceedings of the Euro-Par 2016: Parallel Processing, 2016

Redesigning Triangular Dense Matrix Computations on GPUs.
Proceedings of the Euro-Par 2016: Parallel Processing, 2016

2015
Multicore-Optimized Wavefront Diamond Blocking for Optimizing Stencil Updates.
SIAM J. Scientific Computing, 2015

Field-Split Preconditioned Inexact Newton Algorithms.
SIAM J. Scientific Computing, 2015

A parallel domain decomposition-based implicit method for the Cahn-Hilliard-Cook phase-field equation in 3D.
J. Comput. Physics, 2015

Smooth and robust solutions for Dirichlet boundary control of fluid-solid conjugate heat transfer problems.
J. Comput. Physics, 2015

Multi-dimensional intra-tile parallelization for memory-starved stencil computations.
CoRR, 2015

Optimization of an electromagnetics code with multicore wavefront diamond blocking and multi-dimensional intra-tile parallelization.
CoRR, 2015

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms.
CoRR, 2015

Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

A Scalable Community Detection Algorithm for Large Graphs Using Stochastic Block Models.
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

Composing Algorithmic Skeletons to Express High-Performance Scientific Applications.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

High Performance Multi-GPU SpMV for Multi-component PDE-Based Applications.
Proceedings of the Euro-Par 2015: Parallel Processing, 2015

2014
Communication Complexity of the Fast Multipole Method and its Algebraic Variants.
CoRR, 2014

Multicore-optimized wavefront diamond blocking for optimizing stencil updates.
CoRR, 2014

Towards energy efficiency and maximum computational intensity for stencil algorithms using wavefront diamond temporal blocking.
CoRR, 2014

A Performance Model for the Communication in Fast Multipole Methods on HPC Platforms.
CoRR, 2014

Asynchronous Execution of the Fast Multipole Method Using Charm++.
CoRR, 2014

KBLAS: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators.
CoRR, 2014

Pipelining Computational Stages of the Tomographic Reconstructor for Multi-Object Adaptive Optics on a Multi-GPU System.
Proceedings of the International Conference for High Performance Computing, 2014

High Performance Pseudo-analytical Simulation of Multi-Object Adaptive Optics over Multi-GPU Systems.
Proceedings of the Euro-Par 2014 Parallel Processing, 2014

2013
The Miracle, Mandate and Mirage of High Performance Computing.
it - Information Technology, 2013

Optimizing the performance of streaming numerical kernels on the IBM Blue Gene/P PowerPC 450 processor.
IJHPCA, 2013

Multiphysics simulations: Challenges and opportunities.
IJHPCA, 2013

Fast Multipole Preconditioners for Sparse Matrices Arising from Elliptic Equations.
CoRR, 2013

Topic 14+16: High-Performance and Scientific Applications and Extreme-Scale Computing - (Introduction).
Proceedings of the Euro-Par 2013 Parallel Processing, 2013

2012
A Quasi-algebraic Multigrid Approach to Fracture Problems Based on Extended Finite Elements.
SIAM J. Scientific Computing, 2012

Numerical simulation of four-field extended magnetohydrodynamics in dynamically adaptive curvilinear coordinates via Newton-Krylov-Schwarz.
J. Comput. Physics, 2012

Optimizing the Performance of Streaming Numerical Kernels on the IBM Blue Gene/P PowerPC 450 Processor.
CoRR, 2012

Optimizing Memory-Bound SYMV Kernel on GPU Hardware Accelerators.
Proceedings of the High Performance Computing for Computational Science, 2012

Multiplicative Algorithms for Constrained Non-negative Matrix Factorization.
Proceedings of the 12th IEEE International Conference on Data Mining, 2012

Systematic Approach in Optimizing Numerical Memory-Bound Kernels on GPU.
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

2011
Special Section: 2010 Copper Mountain Conference.
SIAM J. Scientific Computing, 2011

The International Exascale Software Project roadmap.
IJHPCA, 2011

Moving grids for magnetic reconnection via Newton-Krylov methods.
Computer Physics Communications, 2011

Hybrid Programming Model for Implicit PDE Simulations on Multicore Architectures.
Proceedings of the OpenMP in the Petascale Era - 7th International Workshop on OpenMP, 2011

2010
Application of Alternating Decision Trees in Selecting Sparse Linear Solvers.
Proceedings of the Software Automatic Tuning, From Concepts to State-of-the-Art Results, 2010

2009
Linear augmented Slater-type orbital method for free standing clusters.
Journal of Computational Chemistry, 2009

Partial Differential Equation-Based Applications and Solvers At Extreme Scale.
IJHPCA, 2009

Modeling wildland fire propagation with level set methods.
Computers & Mathematics with Applications, 2009

2008
Special Issue on Computational Science and Engineering.
SIAM J. Scientific Computing, 2008

Petaflop/s, seriously.
Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

2007
Additive Schwarz-based fully coupled implicit methods for resistive Hall magnetohydrodynamic problems.
J. Comput. Physics, 2007

Reconstructing parameters of the FitzHugh-Nagumo system from boundary potential measurements.
Journal of Computational Neuroscience, 2007

2006
Multi-core issues - Multi-Core for HPC: breakthrough or breakdown?
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

M06 - Issues for the future of supercomputing: impact of Moore's law and architecture on application performance.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Grid-based Image Registration.
Proceedings of the Grid-Based Problem Solving Environments, 2006

Parallel Algorithms for PDE-Constrained Optimization.
Proceedings of the Parallel Processing for Scientific Computing, 2006

2004
Topic 11: Numerical Algorithms.
Proceedings of the Euro-Par 2004 Parallel Processing, 2004

2003
Pseudotransient Continuation and Differential-Algebraic Equations.
SIAM J. Scientific Computing, 2003

2002
Nonlinearly Preconditioned Inexact Newton Algorithms.
SIAM J. Scientific Computing, 2002

2001
High-performance parallel implicit CFD.
Parallel Computing, 2001

2000
Globalized Newton-Krylov-Schwarz Algorithms and Software for Parallel Implicit CFD.
IJHPCA, 2000

Performance Modeling and Tuning of an Unstructured Mesh CFD Application.
Proceedings of Supercomputing 2000, 2000

Analyzing the Parallel Scalability of an Implicit Unstructured Mesh CFD Code.
Proceedings of the High Performance Computing, 2000

Four Horizons for Enhancing the Performance of Parallel Simulations Based on Partial Differential Equations.
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

1999
Adapting to Hostile Architectural Environments.
Scalable Computing: Practice and Experience, 1999

Three Parallel Programming Paradigms: Comparisons on an Archetypal PDE Computation.
Scalable Computing: Practice and Experience, 1999

Achieving High Sustained Performance in an Unstructured Mesh CFD Application.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1999

Parallelization of an Object-Oriented Unstructured Aeroacoustics Solver.
Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999

1998
Parallel Newton-Krylov-Schwarz Algorithms for the Transonic Full Potential Equation.
SIAM J. Scientific Computing, 1998

1996
A Hyperbolic Model for Communications in Layered Parallel Processing Environments.
J. Parallel Distrib. Comput., 1996

Evaluating the Hyperbolic Model on a Variety of Architectures.
Proceedings of the Euro-Par '96 Parallel Processing, 1996

1995
Modeling Communication in Cluster Computing.
Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, 1995

1994
Towards Polyalgorithmic Linear System Solvers for Nonlinear Elliptic Problems.
SIAM J. Scientific Computing, 1994

A comparison of some domain decomposition and ILU preconditioned iterative methods for nonsymmetric elliptic problems.
Numerical Lin. Alg. with Applic., 1994

1992
Domain Decomposition with Local Mesh Refinement.
SIAM J. Scientific Computing, 1992

Parallel Performance of Domain-Decomposed Preconditioned Krylov Methods for PDEs with Locally Uniform Refinement.
SIAM J. Scientific Computing, 1992

1989
Domain decomposition on parallel computers.
IMPACT Comput. Sci. Eng., 1989

Balanced Divide-and-Conquer Algorithms for the Fine-Grained Parallel Direct Solution of Dense and Banded Triangular Linear Systems and their Connection Machine Implementation.
Proceedings of the Fourth SIAM Conference on Parallel Processing for Scientific Computing, 1989

Parallel Domain Decomposition with Local Mesh Refinement.
Proceedings of the Fourth SIAM Conference on Parallel Processing for Scientific Computing, 1989

1987
Analysis of a Parallelized Elliptic Solver for Reacting Flows-Abstract.
Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing, 1987

1985
A comparison of domain decomposition techniques for elliptic partial differential equations and their parallel implementation.
Proceedings of the Selected Papers from the Second Conference on Parallel Processing for Scientific Computing, 1985

