John A. Gunnels

According to our database, John A. Gunnels authored at least 63 papers between 1994 and 2017.

Bibliography

2017
Parallel Deep Neural Network Training for Big Data on Blue Gene/Q.
IEEE Trans. Parallel Distrib. Syst., 2017

Massively parallel first-principles simulation of electron dynamics in materials.
J. Parallel Distrib. Comput., 2017

2016
The BLIS Framework: Experiments in Portability.
ACM Trans. Math. Softw., 2016

An Early Performance Study of Large-Scale POWER8 SMP Systems.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Massively Parallel First-Principles Simulation of Electron Dynamics in Materials.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

2015
Active Memory Cube: A processing-in-memory architecture for exascale systems.
IBM Journal of Research and Development, 2015

Optimizing Sparse Linear Algebra for Large-Scale Graph Analytics.
IEEE Computer, 2015

Massively parallel models of the human circulatory system.
Proceedings of the International Conference for High Performance Computing, 2015

Scalable Community Detection with the Louvain Algorithm.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

2014
Parallel Deep Neural Network Training for Big Data on Blue Gene/Q.
Proceedings of the International Conference for High Performance Computing, 2014

Parallel deep neural network training for LVCSR tasks using Blue Gene/Q.
Proceedings of the INTERSPEECH 2014, 2014

2013
Optimizing the performance of streaming numerical kernels on the IBM Blue Gene/P PowerPC 450 processor.
IJHPCA, 2013

Design for low power and power management in IBM Blue Gene/Q.
IBM Journal of Research and Development, 2013

Trends and outlook for the massive-scale analytics stack.
IBM Journal of Research and Development, 2013

Science at LLNL with IBM Blue Gene/Q.
IBM Journal of Research and Development, 2013

Deriving dense linear algebra libraries.
Formal Asp. Comput., 2013

2012
Optimizing the Performance of Streaming Numerical Kernels on the IBM Blue Gene/P PowerPC 450 Processor
CoRR, 2012

Toward real-time modeling of human heart ventricles at cellular resolution: simulation of drug-induced arrhythmias.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

2011
PLAPACK.
Proceedings of the Encyclopedia of Parallel Computing, 2011

Massive-Scale Analytics.
Proceedings of the Encyclopedia of Parallel Computing, 2011

2010
Efficient high-precision matrix algebra on parallel architectures for nonlinear combinatorial optimization.
Math. Program. Comput., 2010

Architecture of the Component Collective Messaging Interface.
IJHPCA, 2010

2009
Programming the Linpack benchmark for the IBM PowerXCell 8i processor.
Scientific Programming, 2009

Programming the Linpack benchmark for Roadrunner.
IBM Journal of Research and Development, 2009

Beyond homogeneous decomposition: scaling long-range forces on Massively Parallel Systems.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Petascale computing with accelerators.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

MPI collective communications on the Blue Gene/P supercomputer: algorithms and optimizations.
Proceedings of the 23rd international conference on Supercomputing, 2009

MPI Collective Communications on The Blue Gene/P Supercomputer: Algorithms and Optimizations.
Proceedings of the 17th IEEE Symposium on High Performance Interconnects, 2009

2008
BlueGene/L applications: Parallelism On a Massive Scale.
IJHPCA, 2008

Fine-grained parallelization of the Car-Parrinello ab initio molecular dynamics method on the IBM Blue Gene/L supercomputer.
IBM Journal of Research and Development, 2008

Optimization of BLAS on the Cell Processor.
Proceedings of the High Performance Computing, 2008

Optimization of Fast Fourier Transforms on the Blue Gene/L Supercomputer.
Proceedings of the High Performance Computing, 2008

2007
An experimental comparison of cache-oblivious and cache-conscious programs.
Proceedings of the SPAA 2007: Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2007

Extending stability beyond CPU millennium: a micron-scale atomistic simulation of Kelvin-Helmholtz instability.
Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

2006
Gordon Bell finalists I - Large-scale electronic structure calculations of high-Z metals on the BlueGene/L platform.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Gordon Bell finalists I - Large scale drop impact analysis of mobile phone using ADVC on Blue Gene/L.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Minimal Data Copy for Dense Linear Algebra Factorization.
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

Is Cache-Oblivious DGEMM Viable?
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

2005
The science of deriving dense linear algebra algorithms.
ACM Trans. Math. Softw., 2005

A fully portable high performance minimal storage hybrid format Cholesky algorithm.
ACM Trans. Math. Softw., 2005

Blue Gene/L performance tools.
IBM Journal of Research and Development, 2005

Design and exploitation of a high-performance SIMD floating-point unit for Blue Gene/L.
IBM Journal of Research and Development, 2005

Design and implementation of message-passing services for the Blue Gene/L supercomputer.
IBM Journal of Research and Development, 2005

Large-Scale First-Principles Molecular Dynamics simulations on the BlueGene/L Platform using the Qbox code.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

Early Experience with Scientific Applications on the Blue Gene/L Supercomputer.
Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

2004
Unlocking the Performance of the BlueGene/L Supercomputer.
Proceedings of the ACM/IEEE SC2004 Conference on High Performance Networking and Computing, 2004

Architecture and Performance of the BlueGene/L Message Layer.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2004

A Family of High-Performance Matrix Multiplication Algorithms.
Proceedings of the Applied Parallel Computing, 2004

A New Array Format for Symmetric and Triangular Matrices.
Proceedings of the Applied Parallel Computing, 2004

Rapid Development of High-Performance Linear Algebra Libraries.
Proceedings of the Applied Parallel Computing, 2004

A High-Performance SIMD Floating Point Unit for BlueGene/L: Architecture, Compilation, and Algorithm Design.
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT 2004), 29 September, 2004

2002
An overview of the BlueGene/L Supercomputer.
Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

A Recursive Formulation of the Inversion of Symmetric Positive Definite Matrices in Packed Storage Data Format.
Proceedings of the Applied Parallel Computing Advanced Scientific Computing, 2002

2001
FLAME: Formal Linear Algebra Methods Environment.
ACM Trans. Math. Softw., 2001

A Family of High-Performance Matrix Multiplication Algorithms.
Proceedings of the Computational Science - ICCS 2001, 2001

Fault-Tolerant High-Performance Matrix Multiplication: Theory and Practice.
Proceedings of the 2001 International Conference on Dependable Systems and Networks (DSN 2001) (formerly: FTCS), 2001

2000
Formal Methods for High-Performance Linear Algebra Libraries.
Proceedings of the Architecture of Scientific Software, 2000

1998
A Flexible Class of Parallel Matrix Multiplication Algorithms.
IPPS/SPDP, 1998

PLAPACK: High Performance through High-Level Abstraction.
Proceedings of the 1998 International Conference on Parallel Processing (ICPP '98), 1998

1997
Parallel implementation of BLAS: general techniques for Level 3 BLAS.
Concurrency - Practice and Experience, 1997

PLAPACK Parallel Linear Algebra Package Design Overview.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1997

PLAPACK: Parallel Linear Algebra Package.
Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997

1994
Genetic Algorithms and Simulated Annealing for Gene Mapping.
Proceedings of the First IEEE Conference on Evolutionary Computation, 1994

