John A. Gunnels

Orcid: 0000-0001-5110-190X

According to our database¹, John A. Gunnels authored at least 69 papers between 1994 and 2026.

Collaborative distances:

Dijkstra number² of three.
Erdős number³ of three.

Bibliography

2026

Exceeding the Numerical and Performance Characteristics of IEEE-754 SGEMM with BFloat16 Tensor Cores on GPUs for Scientific Computing.

[DOI]

Harun Bayraktar

,

,

John A. Gunnels

,

,

,

,

Dmitry I. Lyakh

,

,

Victor Podlozhnyuk

,

Addison Richards

,

,

CoRR, May, 2026

Guaranteed DGEMM Accuracy While Using Reduced Precision Tensor Cores Through Extensions of the Ozaki Scheme.

[DOI]

Angelika Schwarz

,

,

,

Harun Bayraktar

,

John A. Gunnels

,

,

,

Samuel Rodriguez

,

Sébastien Cayrols

,

Pawel Tabaszewski

,

Victor Podlozhnyuk

Proceedings of the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region, 2026

2025

Accelerating Supercomputing: AI-Hardware-Driven Innovation for Speed and Efficiency.

[DOI]

Jack J. Dongarra

,

John A. Gunnels

,

Harun Bayraktar

,

,

Proceedings of the IEEE High Performance Extreme Computing Conference, 2025

2024

Hardware Trends Impacting Floating-Point Computations In Scientific Applications.

[DOI]

Jack J. Dongarra

,

John A. Gunnels

,

Harun Bayraktar

,

,

CoRR, 2024

2023

cuQuantum SDK: A High-Performance Library for Accelerating Quantum Science.

[DOI]

Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2023

2020

Supercomputer-Based Ensemble Docking Drug Discovery Pipeline with Application to Covid-19.

[DOI]

,

,

Matthew B. Baker

,

Jérôme Baudry

,

Debsindhu Bhowmik

,

,

Kendall G. Byler

,

Sam Yen-Chi Chen

,

Leighton Coates

,

Connor J. Cooper

,

,

Isabella Daidone

,

,

Sally R. Ellingson

,

,

,

James C. Gumbart

,

John A. Gunnels

,

Oscar R. Hernandez

,

,

Daniel W. Kneller

,

Andrey Kovalevsky

,

Jeffrey M. Larkin

,

Travis J. Lawrence

,

,

,

Julie C. Mitchell

,

,

,

,

Loukas Petridis

,

,

,

Arvind Ramanathan

,

David M. Rogers

,

Diogo Santos-Martins

,

Aaron Scheinberg

,

,

,

Jeremy C. Smith

,

Micholas Dean Smith

,

,

Aristides Tsaris

,

Mathialakan Thavappiragasam

,

Andreas F. Tillack

,

Josh Vincent Vermaas

,

,

,

,

,

Laura Zanetti Polzi

J. Chem. Inf. Model., 2020

2019

Preparation and optimization of a diverse workload for a large-scale heterogeneous system.

[DOI]

,

,

Bronis R. de Supinski

,

,

,

David Beckingsale

,

,

,

,

Carlos H. A. Costa

,

,

Giacomo Domeniconi

,

,

,

Sara Kokkila Schumacher

,

Steven H. Langer

,

,

,

,

,

David F. Richards

,

Björn Sjögreen

,

,

Carol S. Woodward

,

Ulrike Meier Yang

,

,

,

David Appelhans

,

,

Peter D. Barnes Jr.

,

,

,

Jamie A. Bramwell

,

,

José R. Brunheroto

,

,

Charway R. Cooper

,

,

Robert D. Falgout

,

,

David J. Gardner

,

James N. Glosli

,

John A. Gunnels

,

,

Tzanio V. Kolev

,

,

Matthew P. LeGendre

,

,

,

Shelby Lockhart

,

Kathleen McCandless

,

,

Jaime H. Moreno

,

,

,

Rao Nimmakayala

,

Kathryn M. O'Brien

,

,

Ramesh Pankajakshan

,

,

,

,

Steven C. Rennich

,

,

,

James C. Sexton

,

,

,

Guillaume Thomas-Collignon

,

Brian Van Essen

,

,

,

,

,

,

Daniel A. White

,

Christopher Young

,

,

Proceedings of the International Conference for High Performance Computing, 2019

2017

A knowledge and reasoning toolkit for cognitive applications.

[DOI]

,

Cristina Cornelio

,

,

,

Kyle Yingkai Gao

,

John A. Gunnels

,

,

,

Mariano Rodriguez-Muro

,

Rosario Uceda-Sosa

Proceedings of the fifth ACM/IEEE Workshop on Hot Topics in Web Systems and Technologies, 2017

2016

The BLIS Framework: Experiments in Portability.

[DOI]

Field G. Van Zee

,

,

,

,

Robert A. van de Geijn

,

Francisco D. Igual

,

Mikhail Smelyanskiy

,

,

Michael Kistler

,

,

John A. Gunnels

,

ACM Trans. Math. Softw., 2016

An Early Performance Study of Large-Scale POWER8 SMP Systems.

[DOI]

,

,

,

,

,

Fabrizio Petrini

,

John A. Gunnels

,

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Massively Parallel First-Principles Simulation of Electron Dynamics in Materials.

[DOI]

Erik W. Draeger

,

,

John A. Gunnels

,

Abhinav Bhatele

,

,

Alfredo A. Correa

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

2015

Active Memory Cube: A processing-in-memory architecture for exascale systems.

[DOI]

IBM J. Res. Dev., 2015

Optimizing Sparse Linear Algebra for Large-Scale Graph Analytics.

[DOI]

,

John A. Gunnels

,

,

,

Fabrizio Petrini

,

,

Computer, 2015

Massively parallel models of the human circulatory system.

[DOI]

,

Erik W. Draeger

,

Tomas Oppelstrup

,

,

John A. Gunnels

Proceedings of the International Conference for High Performance Computing, 2015

Scalable Community Detection with the Louvain Algorithm.

[DOI]

,

,

Fabrizio Petrini

,

John A. Gunnels

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

2014

Parallel Deep Neural Network Training for Big Data on Blue Gene/Q.

[DOI]

,

Tara N. Sainath

,

Bhuvana Ramabhadran

,

Michael Picheny

,

John A. Gunnels

,

,

Upendra V. Chaudhari

,

Brian Kingsbury

Proceedings of the International Conference for High Performance Computing, 2014

Parallel deep neural network training for LVCSR tasks using Blue Gene/Q.

[DOI]

Tara N. Sainath

,

,

Bhuvana Ramabhadran

,

Michael Picheny

,

John A. Gunnels

,

Brian Kingsbury

,

,

,

Upendra V. Chaudhari

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

2013

Optimizing the performance of streaming numerical kernels on the IBM Blue Gene/P PowerPC 450 processor.

[DOI]

,

Aron J. Ahmadia

,

,

John A. Gunnels

,

Int. J. High Perform. Comput. Appl., 2013

Design for low power and power management in IBM Blue Gene/Q.

[DOI]

Krishnan Sugavanam

,

,

John A. Gunnels

,

,

Philip Heidelberger

,

Hans M. Jacobson

,

Moyra K. McManus

,

,

David L. Satterfield

,

Yutaka Sugawara

,

IBM J. Res. Dev., 2013

Trends and outlook for the massive-scale analytics stack.

[DOI]

,

John A. Gunnels

,

Prabhanjan Kambadur

,

Edwin P. D. Pednault

,

Mark S. Squillante

IBM J. Res. Dev., 2013

Science at LLNL with IBM Blue Gene/Q.

[DOI]

IBM J. Res. Dev., 2013

Deriving dense linear algebra libraries.

[DOI]

Paolo Bientinesi

,

John A. Gunnels

,

Margaret E. Myers

,

Enrique S. Quintana-Ortí

,

,

Robert A. van de Geijn

,

Field G. Van Zee

Formal Aspects Comput., 2013

2012

Toward real-time modeling of human heart ventricles at cellular resolution: simulation of drug-induced arrhythmias.

[DOI]

Arthur A. Mirin

,

David F. Richards

,

James N. Glosli

,

Erik W. Draeger

,

,

Jean-Luc Fattebert

,

William D. Krauss

,

Tomas Oppelstrup

,

John Jeremy Rice

,

John A. Gunnels

,

Viatcheslav Gurev

,

,

,

Matthias Reumann

,

Proceedings of the SC Conference on High Performance Computing Networking, 2012

2011

PLAPACK.

[DOI]

John A. Gunnels

Proceedings of the Encyclopedia of Parallel Computing, 2011

Massive-Scale Analytics.

[DOI]

,

John A. Gunnels

,

Mark S. Squillante

Proceedings of the Encyclopedia of Parallel Computing, 2011

2010

Efficient high-precision matrix algebra on parallel architectures for nonlinear combinatorial optimization.

[DOI]

John A. Gunnels

,

,

Susan Margulies

Math. Program. Comput., 2010

Architecture of the Component Collective Messaging Interface.

[DOI]

,

,

Amith R. Mamidala

,

,

,

,

John A. Gunnels

,

,

Joseph Ratterman

,

Philip Heidelberger

Int. J. High Perform. Comput. Appl., 2010

2009

Programming the Linpack benchmark for the IBM PowerXCell 8i processor.

[DOI]

Michael Kistler

,

John A. Gunnels

,

Daniel A. Brokenshire

,

Sci. Program., 2009

Programming the Linpack benchmark for Roadrunner.

[DOI]

Michael Kistler

,

John A. Gunnels

,

Daniel A. Brokenshire

,

IBM J. Res. Dev., 2009

Beyond homogeneous decomposition: scaling long-range forces on Massively Parallel Systems.

[DOI]

David F. Richards

,

James N. Glosli

,

,

,

Erik W. Draeger

,

Jean-Luc Fattebert

,

William D. Krauss

,

Thomas E. Spelce

,

Frederick H. Streitz

,

Michael P. Surh

,

John A. Gunnels

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Petascale computing with accelerators.

[DOI]

Michael Kistler

,

John A. Gunnels

,

Daniel A. Brokenshire

,

Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

MPI collective communications on the Blue Gene/P supercomputer: algorithms and optimizations.

[DOI]

,

,

,

Amith R. Mamidala

,

John A. Gunnels

,

Philip Heidelberger

Proceedings of the 23rd international conference on Supercomputing, 2009

MPI Collective Communications on The Blue Gene/P Supercomputer: Algorithms and Optimizations.

[DOI]

,

,

,

Amith R. Mamidala

,

John A. Gunnels

Proceedings of the 17th IEEE Symposium on High Performance Interconnects, 2009

2008

BlueGene/L applications: Parallelism On a Massive Scale.

[DOI]

Int. J. High Perform. Comput. Appl., 2008

Fine-grained parallelization of the Car-Parrinello ab initio molecular dynamics method on the IBM Blue Gene/L supercomputer.

[DOI]

,

Abhinav Bhatele

,

Laxmikant V. Kalé

,

Mark E. Tuckerman

,

,

John A. Gunnels

,

Glenn J. Martyna

IBM J. Res. Dev., 2008

Optimization of BLAS on the Cell Processor.

[DOI]

,

Prashant Agrawal

,

Yogish Sabharwal

,

,

Vimitha A. Kuruvilla

,

John A. Gunnels

Proceedings of the High Performance Computing, 2008

Optimization of Fast Fourier Transforms on the Blue Gene/L Supercomputer.

[DOI]

Yogish Sabharwal

,

Saurabh Kumar Garg

,

,

John A. Gunnels

,

Ramendra K. Sahoo

Proceedings of the High Performance Computing, 2008

2007

An experimental comparison of cache-oblivious and cache-conscious programs.

[DOI]

,

,

,

John A. Gunnels

,

Fred G. Gustavson

Proceedings of the SPAA 2007: Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2007

Extending stability beyond CPU millennium: a micron-scale atomistic simulation of Kelvin-Helmholtz instability.

[DOI]

James N. Glosli

,

David F. Richards

,

K. J. Caspersen

,

,

John A. Gunnels

,

Frederick H. Streitz

Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

2006

Gordon Bell finalists I - Large-scale electronic structure calculations of high-Z metals on the BlueGene/L platform.

[DOI]

,

Erik W. Draeger

,

,

Bronis R. de Supinski

,

John A. Gunnels

,

,

James C. Sexton

,

Franz Franchetti

,

,

Christoph W. Ueberhuber

,

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Gordon Bell finalists I - Large scale drop impact analysis of mobile phone using ADVC on Blue Gene/L.

[DOI]

,

Tomonobu Ohyama

,

Yoshinori Shibata

,

,

Yoshikazu Katai

,

Ryuichi Takeuchi

,

Takeshi Hoshino

,

Shinobu Yoshimura

,

Hirohisa Noguchi

,

,

John A. Gunnels

,

,

Yogish Sabharwal

,

,

,

Takashi Kawakami

,

Satoru Todokoro

,

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Minimal Data Copy for Dense Linear Algebra Factorization.

[DOI]

Fred G. Gustavson

,

John A. Gunnels

,

James C. Sexton

Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

Is Cache-Oblivious DGEMM Viable?

[DOI]

John A. Gunnels

,

Fred G. Gustavson

,

,

Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

Achieving High Performance on the BlueGene/L Supercomputer.

[DOI]

George S. Almási

,

,

Siddhartha Chatterjee

,

,

John A. Gunnels

,

,

,

José E. Moreira

,

James C. Sexton

,

,

Alessandro Curioni

,

,

Leonardo R. Bachega

,

,

,

,

Giri Chukkapalli

,

Robert Harkness

,

Proceedings of the Parallel Processing for Scientific Computing, 2006

2005

The science of deriving dense linear algebra algorithms.

[DOI]

Paolo Bientinesi

,

John A. Gunnels

,

Margaret E. Myers

,

Enrique S. Quintana-Ortí

,

Robert A. van de Geijn

ACM Trans. Math. Softw., 2005

A fully portable high performance minimal storage hybrid format Cholesky algorithm.

[DOI]

Bjarne Stig Andersen

,

John A. Gunnels

,

Fred G. Gustavson

,

,

Jerzy Wasniewski

ACM Trans. Math. Softw., 2005

Blue Gene/L performance tools.

[DOI]

Xavier Martorell

,

,

,

José R. Brunheroto

,

,

John A. Gunnels

,

,

,

Francesc Escalé

,

,

,

José E. Moreira

IBM J. Res. Dev., 2005

Design and exploitation of a high-performance SIMD floating-point unit for Blue Gene/L.

[DOI]

Siddhartha Chatterjee

,

Leonardo R. Bachega

,

,

Kenneth A. Dockser

,

John A. Gunnels

,

,

Fred G. Gustavson

,

Christopher A. Lapkowski

,

,

Mark P. Mendell

,

,

Charles D. Wait

,

T. J. Christopher Ward

,

IBM J. Res. Dev., 2005

Design and implementation of message-passing services for the Blue Gene/L supercomputer.

[DOI]

,

,

José G. Castaños

,

John A. Gunnels

,

C. Christopher Erway

,

Philip Heidelberger

,

Xavier Martorell

,

José E. Moreira

,

,

,

Burkhard D. Steinmacher-Burow

,

,

Brian R. Toonen

IBM J. Res. Dev., 2005

Large-Scale First-Principles Molecular Dynamics simulations on the BlueGene/L Platform using the Qbox code.

[DOI]

,

Robert K. Yates

,

,

Erik W. Draeger

,

Franz Franchetti

,

Christoph W. Ueberhuber

,

Bronis R. de Supinski

,

,

John A. Gunnels

,

James C. Sexton

Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

Early Experience with Scientific Applications on the Blue Gene/L Supercomputer.

[DOI]

Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

2004

Unlocking the Performance of the BlueGene/L Supercomputer.

[DOI]

,

Siddhartha Chatterjee

,

,

John A. Gunnels

,

,

,

José E. Moreira

,

Proceedings of the ACM/IEEE SC2004 Conference on High Performance Networking and Computing, 2004

Architecture and Performance of the BlueGene/L Message Layer.

[DOI]

,

,

John A. Gunnels

,

Philip Heidelberger

,

Xavier Martorell

,

José E. Moreira

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2004

A Family of High-Performance Matrix Multiplication Algorithms.

[DOI]

John A. Gunnels

,

Fred G. Gustavson

,

,

Robert A. van de Geijn

Proceedings of the Applied Parallel Computing, 2004

A New Array Format for Symmetric and Triangular Matrices.

[DOI]

John A. Gunnels

,

Fred G. Gustavson

Proceedings of the Applied Parallel Computing, 2004

Rapid Development of High-Performance Linear Algebra Libraries.

[DOI]

Paolo Bientinesi

,

John A. Gunnels

,

Fred G. Gustavson

,

,

Margaret E. Myers

,

Enrique S. Quintana-Ortí

,

Robert A. van de Geijn

Proceedings of the Applied Parallel Computing, 2004

A High-Performance SIMD Floating Point Unit for BlueGene/L: Architecture, Compilation, and Algorithm Design.

[DOI]

Leonardo R. Bachega

,

Siddhartha Chatterjee

,

Kenneth A. Dockser

,

John A. Gunnels

,

,

Fred G. Gustavson

,

Christopher A. Lapkowski

,

,

Mark P. Mendell

,

Charles D. Wait

,

T. J. Christopher Ward

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT 2004), 29 September, 2004

2002

An overview of the BlueGene/L Supercomputer.

[DOI]

Narasimha R. Adiga

,

,

George S. Almási

,

,

Rajkishore Barik

,

Daniel K. Beece

,

Ralph Bellofatto

,

,

,

Matthias A. Blumrich

,

Arthur A. Bright

,

José R. Brunheroto

,

,

José G. Castaños

,

,

,

,

Siddhartha Chatterjee

,

,

George L.-T. Chiu

,

Thomas M. Cipolla

,

,

,

,

,

Marc Boris Dombrowa

,

,

Maria Eleftheriou

,

C. Christopher Erway

,

,

,

Joseph Gagliano

,

,

,

Robert S. Germain

,

,

Balaji Gopalsamy

,

John A. Gunnels

,

,

Fred G. Gustavson

,

,

,

David F. Heidel

,

Philip Heidelberger

,

Lorraine Herger

,

,

,

T. Jamal-Eddine

,

Gerard V. Kopcsay

,

,

Manish P. Kurhekar

,

Alphonso P. Lanzetta

,

,

,

,

Mark P. Mendell

,

,

,

Lawrence S. Mok

,

José E. Moreira

,

Ben J. Nathanson

,

,

,

,

Vinayaka Pandit

,

,

,

Richard D. Regan

,

,

Albert E. Ruehli

,

Silvius Vasile Rus

,

Ramendra K. Sahoo

,

,

Eugen Schenfeld

,

,

,

Sarabjeet Singh

,

,

Vijay Srinivasan

,

Burkhard D. Steinmacher-Burow

,

,

Christopher W. Surovic

,

Richard A. Swetz

,

,

R. Brett Tremaine

,

,

Arun R. Umamaheshwaran

,

,

,

T. J. Christopher Ward

,

Michael E. Wazlowski

,

,

,

,

,

,

,

David J. Krolak

,

,

Thomas A. Liebsch

,

James A. Marcella

,

,

,

,

,

,

,

Charles D. Wait

,

,

,

Kenneth A. Dockser

,

,

,

Jeffrey S. Vetter

,

Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

A Recursive Formulation of the Inversion of Symmetric Positive Definite Matrices in Packed Storage Data Format.

[DOI]

Bjarne Stig Andersen

,

John A. Gunnels

,

Fred G. Gustavson

,

Jerzy Wasniewski

Proceedings of the Applied Parallel Computing Advanced Scientific Computing, 2002

2001

FLAME: Formal Linear Algebra Methods Environment.

[DOI]

John A. Gunnels

,

Fred G. Gustavson

,

,

Robert A. van de Geijn

ACM Trans. Math. Softw., 2001

A Family of High-Performance Matrix Multiplication Algorithms.

[DOI]

John A. Gunnels

,

,

Robert A. van de Geijn

Proceedings of the Computational Science - ICCS 2001, 2001

Fault-Tolerant High-Performance Matrix Multiplication: Theory and Practice.

[DOI]

John A. Gunnels

,

Robert A. van de Geijn

,

,

Enrique S. Quintana-Ortí

Proceedings of the 2001 International Conference on Dependable Systems and Networks (DSN 2001) (formerly: FTCS), 2001

2000

Formal Methods for High-Performance Linear Algebra Libraries.

John A. Gunnels

,

Robert A. van de Geijn

Proceedings of the Architecture of Scientific Software, 2000

1998

A Flexible Class of Parallel Matrix Multiplication Algorithms.

[DOI]

John A. Gunnels

,

,

,

Robert A. van de Geijn

Proceedings of the 12th International Parallel Processing Symposium / 9th Symposium on Parallel and Distributed Processing (IPPS/SPDP '98), March 30, 1998

PLAPACK: High Performance through High-Level Abstraction.

[DOI]

Gregory S. Baker

,

John A. Gunnels

,

,

Beatrice Riviere

,

Robert A. van de Geijn

Proceedings of the 1998 International Conference on Parallel Processing (ICPP '98), 1998

1997

Parallel implementation of BLAS: general techniques for Level 3 BLAS.

[DOI]

Almadena Yu. Chtchelkanova

,

John A. Gunnels

,

,

,

Robert A. van de Geijn

Concurr. Pract. Exp., 1997

PLAPACK Parallel Linear Algebra Package Design Overview.

[DOI]

Phillip Alpatov

,

Gregory S. Baker

,

H. Carter Edwards

,

John A. Gunnels

,

,

,

Robert A. van de Geijn

Proceedings of the ACM/IEEE Conference on Supercomputing, 1997

PLAPACK: Parallel Linear Algebra Package.

Phillip Alpatov

,

Gregory S. Baker

,

H. Carter Edwards

,

John A. Gunnels

,

,

,

Robert A. van de Geijn

,

Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997

1994

Genetic Algorithms and Simulated Annealing for Gene Mapping.

[DOI]

John A. Gunnels

,

,

James L. Holloway

Proceedings of the First IEEE Conference on Evolutionary Computation, 1994
