Fred G. Gustavson

According to our database, Fred G. Gustavson authored at least 91 papers between 1970 and 2019.

Awards

IEEE Fellow

IEEE Fellow 1996, "For contributions to the efficient numerical simulation and design of electrical circuits using innovative sparse matrix algorithms."

Bibliography

2019
Algorithms for in-place matrix transposition.
Concurr. Comput. Pract. Exp., 2019

2013
Level-3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms.
ACM Trans. Math. Softw., 2013

A Square Block Format for Symmetric Band Matrices.
Proceedings of the Parallel Processing and Applied Mathematics, 2013

Algebra and Geometry Combined Explains How the Mind Does Math.
Proceedings of the Parallel Processing and Applied Mathematics, 2013

2012
Parallel and Cache-Efficient In-Place Matrix Storage Format Conversion.
ACM Trans. Math. Softw., 2012

2011
New Level-3 BLAS Kernels for Cholesky Factorization.
Proceedings of the Parallel Processing and Applied Mathematics, 2011

Cache Blocking for Linear Algebra Algorithms.
Proceedings of the Parallel Processing and Applied Mathematics, 2011

2010
Rectangular full packed format for Cholesky's algorithm: factorization, solution, and inversion.
ACM Trans. Math. Softw., 2010

Cache Blocking.
Proceedings of the Applied Parallel and Scientific Computing, 2010

2009
Distributed SBP Cholesky factorization algorithms with near-optimal scheduling.
ACM Trans. Math. Softw., 2009

High Performance Computing with the Cell Broadband Engine.
Sci. Program., 2009

2007
Algorithm 865: Fortran 95 subroutines for Cholesky factorization in block hybrid format.
ACM Trans. Math. Softw., 2007

An experimental comparison of cache-oblivious and cache-conscious programs.
Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2007), 2007

Three Versions of a Minimal Storage Cholesky Algorithm Using New Data Structures Gives High Performance Speeds as Verified on Many Computers.
Proceedings of the Parallel Processing and Applied Mathematics, 2007

The Relevance of New Data Structure Approaches for Dense Linear Algebra in the New Multi-Core / Many Core Environments.
Proceedings of the Parallel Processing and Applied Mathematics, 2007

2006
Rectangular Full Packed Format for LAPACK Algorithms Timings on Several Computers.
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

Novel Data Formats and Algorithms for Dense Linear Algebra Computations: Minisymposium Abstract.
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

In-Place Transposition of Rectangular Matrices.
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

Three Algorithms for Cholesky Factorization on Distributed Memory Using Packed Storage.
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

Minimal Data Copy for Dense Linear Algebra Factorization.
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

Is Cache-Oblivious DGEMM Viable?
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

2005
A fully portable high performance minimal storage hybrid format Cholesky algorithm.
ACM Trans. Math. Softw., 2005

Custom math functions for molecular dynamics.
IBM J. Res. Dev., 2005

Design and exploitation of a high-performance SIMD floating-point unit for Blue Gene/L.
IBM J. Res. Dev., 2005

2004
Recursive Blocked Algorithms and Hybrid Data Structures for Dense Matrix Library Software.
SIAM Rev., 2004

High Performance Linear Algebra Algorithms: An Introduction.
Proceedings of the Applied Parallel Computing, 2004

New Generalized Data Structures for Matrices Lead to a Variety of High Performance Dense Linear Algebra Algorithms.
Proceedings of the Applied Parallel Computing, 2004

A Family of High-Performance Matrix Multiplication Algorithms.
Proceedings of the Applied Parallel Computing, 2004

A New Array Format for Symmetric and Triangular Matrices.
Proceedings of the Applied Parallel Computing, 2004

Rapid Development of High-Performance Linear Algebra Libraries.
Proceedings of the Applied Parallel Computing, 2004

A High-Performance SIMD Floating Point Unit for BlueGene/L: Architecture, Compilation, and Algorithm Design.
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT 2004), 29 September, 2004

2003
High-performance linear algebra algorithms using new generalized data structures for matrices.
IBM J. Res. Dev., 2003

2002
Fast pseudorandom-number generators with modulus 2^k or 2^k - 1 using fused multiply-add.
IBM J. Res. Dev., 2002

An overview of the BlueGene/L Supercomputer.
Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

A Recursive Formulation of the Inversion of Symmetric Positive Definite Matrices in Packed Storage Data Format.
Proceedings of the Applied Parallel Computing. Advanced Scientific Computing, 2002

2001
FLAME: Formal Linear Algebra Methods Environment.
ACM Trans. Math. Softw., 2001

A recursive formulation of Cholesky factorization of a matrix in packed storage.
ACM Trans. Math. Softw., 2001

New Generalized Data Structures for Matrices Lead to a Variety of High Performance Algorithms.
Proceedings of the Parallel Processing and Applied Mathematics, 2001

2000
Minimal-storage high-performance Cholesky factorization via blocking and recursion.
IBM J. Res. Dev., 2000

Applying recursion to serial and parallel QR factorization leads to better performance.
IBM J. Res. Dev., 2000

A Fast Minimal Storage Symmetric Indefinite Solver.
Proceedings of the Applied Parallel Computing, 2000

High Performance Cholesky Factorization via Blocking and Recursion That Uses Minimal Storage.
Proceedings of the Applied Parallel Computing, 2000

High-Performance Library Software for QR Factorization.
Proceedings of the Applied Parallel Computing, 2000

LAWRA: Linear Algebra with Recursive Algorithms.
Proceedings of the Applied Parallel Computing, 2000

Inversion of Symmetric Matrices in a New Block Packed Storage.
Proceedings of the Numerical Analysis and Its Applications, 2000

Design and evaluation of a linear algebra package for Java.
Proceedings of the ACM 2000 Java Grande Conference, San Francisco, CA, USA, 2000

New Generalized Matrix Data Structures Lead to a Variety of High-Performance Algorithms.
Proceedings of the Architecture of Scientific Software, 2000

LAWRA Workshop: Linear Algebra with Recursive Algorithms: http://lawra.uni-c.dk/lawra/.
Proceedings of the High-Performance Computing and Networking, 8th International Conference, 2000

1999
PSPASES: An Efficient and Scalable Parallel Sparse Direct Solver.
Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999

Towards Peak Performance on Hierarchical SMP Memory Architectures - New Recursive Blocked Data Formats and BLAS.
Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999

Recursive Formulation of Some Dense Linear Algebra Algorithms.
Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999

A Recursive Formulation of the Cholesky Factorization Operating on a Matrix in Packed Storage Form.
Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999

A Columnwise Recursive Perturbation Based Algorithm for Symmetric Indefinite Linear Systems.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1999

Experience with a Recursive Perturbation Based Algorithm for Symmetric Indefinite Linear Systems.
Proceedings of the Euro-Par '99 Parallel Processing, 5th International Euro-Par Conference, Toulouse, France, August 31, 1999

The fused multiply-add instruction leads to algorithms for extended-precision floating point: applications to Java and high-performance computing.
Proceedings of the 1999 conference of the Centre for Advanced Studies on Collaborative Research, 1999

Series Approximation Methods for Divide and Square Root in the Power3(TM) Processor.
Proceedings of the 14th IEEE Symposium on Computer Arithmetic (Arith-14 '99), 1999

1998
The Design, Implementation, and Evaluation of a Symmetric Banded Linear Solver for Distributed-Memory Parallel Computers.
ACM Trans. Math. Softw., 1998

Recursive Formulation of Cholesky Algorithm in Fortran 90.
Proceedings of the Applied Parallel Computing, 1998

Superscalar GEMM-based Level 3 BLAS - The On-going Evolution of a Portable and High-Performance Library.
Proceedings of the Applied Parallel Computing, 1998

Recursive Blocked Data Formats and BLAS's for Dense Linear Algebra Algorithms.
Proceedings of the Applied Parallel Computing, 1998

New Serial and Parallel Recursive QR Factorization Algorithms for SMP Systems.
Proceedings of the Applied Parallel Computing, 1998

1997
Recursion leads to automatic variable blocking for dense linear-algebra algorithms.
IBM J. Res. Dev., 1997

Design and Implementation of a Scalable Parallel Direct Solver for Sparse Symmetric Positive Definite Systems: Preliminary Results.
Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997

1996
A New Parallel Algorithm for Tridiagonal Symmetric Positive Definite Systems of Equations.
Proceedings of the Applied Parallel Computing, 1996

The Design, Implementation, and Evaluation of a Banded Linear Solver for Distributed-Memory Parallel Computers.
Proceedings of the Applied Parallel Computing, 1996

Fast Graph Partitioning and Its Application in Sparse Matrix Ordering.
Proceedings of the Applied Parallel Computing, 1996

Performance Tuning IBM RS/6000 POWER2 Systems.
Proceedings of the Applied Parallel Computing, 1996

The Design and Implementation of SOLAR, a Portable Library for Scalable Out-of-core Linear Algebra Computations.
Proceedings of the Fourth Workshop on I/O in Parallel and Distributed Systems, 1996

1995
High-Performance Parallel Implementations of the NAS Kernel Benchmarks on the IBM SP2.
IBM Syst. J., 1995

A three-dimensional approach to parallel matrix multiplication.
IBM J. Res. Dev., 1995

A Scalable Parallel Block Algorithm for Band Cholesky Factorization.
Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, 1995

A High Performance Matrix Manipulation Algorithm for MPPs.
Proceedings of the Applied Parallel Computing, 1995

1994
A high-performance matrix-multiplication algorithm on a distributed-memory parallel computer, using overlapped communication.
IBM J. Res. Dev., 1994

Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms.
IBM J. Res. Dev., 1994

Improving performance of linear algebra algorithms for dense matrices, using algorithmic prefetch.
IBM J. Res. Dev., 1994

A high performance parallel algorithm for 1-D FFT.
Proceedings of Supercomputing '94, 1994

A Very High-Performance Algorithm for NAS EP Benchmark.
Proceedings of the High-Performance Computing and Networking, 1994

1992
A High Performance Algorithm Using Pre-Processing for the Sparse Matrix-Vector Multiplication.
Proceedings of Supercomputing '92, 1992

1989
Engineering and Scientific Subroutine Library Release 3 for IBM ES/3090 Vector Multiprocessors.
IBM Syst. J., 1989

Vector and parallel algorithms for Cholesky factorization on IBM 3090.
Proceedings of Supercomputing '89, Reno, NV, USA, November 12-17, 1989

1986
New Scalar and Vector Elementary Functions for the IBM System/370.
IBM J. Res. Dev., 1986

1985
Fast Elementary Function Algorithms for 370 Machines (Abstract).
Proceedings of the Accurate Scientific Computations, 1985

1980
Fast Solution of Toeplitz Systems of Equations and Computation of Padé Approximants.
J. Algorithms, 1980

1979
Fast computation of rational Hermite interpolants and solving Toeplitz system of equations via the extended Euclidean algorithm.
Proceedings of the Symbolic and Algebraic Computation, 1979

1978
Remark on "Algorithm 408: A Sparse Matrix Package (Part I) [F4]".
ACM Trans. Math. Softw., 1978

Two Fast Algorithms for Sparse Matrices: Multiplication and Permuted Transposition.
ACM Trans. Math. Softw., 1978

1976
Analysis of the Berlekamp-Massey Linear Feedback Shift-Register Synthesis Algorithm.
IBM J. Res. Dev., 1976

Arithmetic complexity of unordered sparse polynomials.
Proceedings of the third ACM Symposium on Symbolic and Algebraic Manipulation, 1976

1970
Symbolic Generation of an Optimal Crout Algorithm for Sparse Systems of Linear Equations.
J. ACM, 1970

A fast random number generator with good statistical properties.
Computing, 1970
