Keith D. Underwood

Orcid: 0009-0001-0078-9959

According to our database¹, Keith D. Underwood authored at least 80 papers between 1998 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

In-Network Collective Operations: Game Changer or Challenge for AI Workloads?

[DOI]

Torsten Hoefler

,

Mikhail Khalilov

,

,

Surendra Anubolu

,

,

,

,

,

Keith D. Underwood

,

Adrian M. Caulfield

,

,

Amirreza Rastegari

Computer, January, 2026

2025

Ultra Ethernet's Design Principles and Architectural Innovations.

[DOI]

Torsten Hoefler

,

,

,

Keith D. Underwood

,

Cedell Alexander

,

,

,

Adrian M. Caulfield

,

,

,

,

,

Eugene Opsasnick

,

,

,

CoRR, August, 2025

2023

Datacenter Ethernet and RDMA: Issues at Hyperscale.

[DOI]

Torsten Hoefler

,

,

Keith D. Underwood

,

,

,

Vahid Tabatabaee

,

,

Surendra Anubolu

,

,

,

,

CoRR, 2023

Not all applications have boring communication patterns: Profiling message matching with BMM.

[DOI]

Taylor L. Groves

,

Naveen Ravichandrasekaran

,

,

,

David Trebotich

,

Nicholas J. Wright

,

,

,

Keith D. Underwood

Concurr. Comput. Pract. Exp., 2023

Data Center Ethernet and Remote Direct Memory Access: Issues at Hyperscale.

[DOI]

Torsten Hoefler

,

,

Keith D. Underwood

,

Robert Alverson

,

,

Vahid Tabatabaee

,

,

Surendra Anubolu

,

,

,

,

Computer, 2023

2017

Fast Networks and Slow Memories: A Mechanism for Mitigating Bandwidth Mismatches.

[DOI]

,

,

,

Keith D. Underwood

,

Torsten Hoefler

Proceedings of the 25th IEEE Annual Symposium on High-Performance Interconnects, 2017

2016

Enabling Scalable High-Performance Systems with the Intel Omni-Path Architecture.

[DOI]

Mark S. Birrittella

,

,

,

,

,

,

Keith D. Underwood

,

IEEE Micro, 2016

Mitigating MPI Message Matching Misery.

[DOI]

,

,

Keith D. Underwood

Proceedings of the High Performance Computing - 31st International Conference, 2016

2015

Remote Memory Access Programming in MPI-3.

[DOI]

Torsten Hoefler

,

,

,

,

,

,

Keith D. Underwood

ACM Trans. Parallel Comput., 2015

Exploiting Offload Enabled Network Interfaces.

[DOI]

Salvatore Di Girolamo

,

,

Keith D. Underwood

,

Torsten Hoefler

Proceedings of the 23rd IEEE Annual Symposium on High-Performance Interconnects, 2015

Intel® Omni-path Architecture: Enabling Scalable, High Performance Fabrics.

[DOI]

Mark S. Birrittella

,

,

,

,

,

,

Keith D. Underwood

,

Proceedings of the 23rd IEEE Annual Symposium on High-Performance Interconnects, 2015

2014

Reducing Synchronization Overhead Through Bundled Communication.

[DOI]

,

,

,

,

Keith D. Underwood

,

Robert W. Wisniewski

Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Tools, 2014

2013

Evaluating on-die interconnects for a 4 TB/s router.

[DOI]

Keith D. Underwood

,

,

,

Timothy Stremcha

,

Proceedings of the International Conference on Supercomputing, 2013

2012

Poster: Portals 4 Network Programming Interface.

[DOI]

,

,

Keith D. Underwood

,

K. Scott Hemmert

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

A Low Impact Flow Control Implementation for Offload Communication Interfaces.

[DOI]

Brian W. Barrett

,

,

Keith D. Underwood

Proceedings of the Recent Advances in the Message Passing Interface, 2012

Exploiting communication and packaging locality for cost-effective large scale networks.

[DOI]

Keith D. Underwood

,

Proceedings of the International Conference on Supercomputing, 2012

2011

Scientific Application Demands on a Reconfigurable Functional Unit Interface.

[DOI]

,

Keith D. Underwood

,

Katherine Compton

ACM Trans. Reconfigurable Technol. Syst., 2011

Using Triggered Operations to Offload Rendezvous Messages.

[DOI]

Brian W. Barrett

,

,

K. Scott Hemmert

,

Kyle B. Wheeler

,

Keith D. Underwood

Proceedings of the Recent Advances in the Message Passing Interface, 2011

A Unified Algorithm for Both Randomized Deterministic and Adaptive Routing in Torus Networks.

[DOI]

Keith D. Underwood

,

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Enabling Flexible Collective Communication Offload with Triggered Operations.

[DOI]

Keith D. Underwood

,

,

,

K. Scott Hemmert

,

Brian W. Barrett

,

,

Michael J. Levenhagen

Proceedings of the IEEE 19th Annual Symposium on High Performance Interconnects, 2011

Enhanced Support for OpenSHMEM Communication in Portals.

[DOI]

Brian W. Barrett

,

,

K. Scott Hemmert

,

Kevin T. Pedretti

,

Kyle B. Wheeler

,

Keith D. Underwood

Proceedings of the IEEE 19th Annual Symposium on High Performance Interconnects, 2011

2010

Fast, Efficient Floating-Point Adders and Multipliers for FPGAs.

[DOI]

K. Scott Hemmert

,

Keith D. Underwood

ACM Trans. Reconfigurable Technol. Syst., 2010

Performance evaluation of the Red Storm dual-core upgrade.

[DOI]

,

Keith D. Underwood

,

Courtenay T. Vaughan

,

Concurr. Comput. Pract. Exp., 2010

Using Triggered Operations to Offload Collective Communication Operations.

[DOI]

K. Scott Hemmert

,

Brian W. Barrett

,

Keith D. Underwood

Proceedings of the Recent Advances in the Message Passing Interface, 2010

Challenges for High-Performance Networking for Exascale Computing.

[DOI]

,

Brian W. Barrett

,

Karl S. Hemmert

,

Keith D. Underwood

Proceedings of the 19th International Conference on Computer Communications and Networks, 2010

2009

From Silicon to Science: The Long Road to Production Reconfigurable Supercomputing.

[DOI]

Keith D. Underwood

,

K. Scott Hemmert

,

ACM Trans. Reconfigurable Technol. Syst., 2009

2008

Architectural Modifications to Enhance the Floating-Point Performance of FPGAs.

[DOI]

Michael J. Beauchamp

,

,

Keith D. Underwood

,

K. Scott Hemmert

IEEE Trans. Very Large Scale Integr. Syst., 2008

High message rate, NIC-based atomics: Design and performance considerations.

[DOI]

Keith D. Underwood

,

Michael J. Levenhagen

,

K. Scott Hemmert

,

Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

From Silicon to Science: The Long Road to Production Reconfigurable Supercomputing.

[DOI]

Keith D. Underwood

Proceedings of the Reconfigurable Computing: Architectures, 2008

2007

Floating-Point Divider Design for FPGAs.

[DOI]

K. Scott Hemmert

,

Keith D. Underwood

IEEE Trans. Very Large Scale Integr. Syst., 2007

Evaluating NIC hardware requirements to achieve high message rate PGAS support on multi-core processors.

[DOI]

Keith D. Underwood

,

Michael J. Levenhagen

,

Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

Analyzing the Scalability of Graph Algorithms on Eldorado.

[DOI]

Keith D. Underwood

,

,

Jonathan W. Berry

,

Bruce Hendrickson

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Simulating Red Storm: Challenges and Successes in Building a System Simulation.

[DOI]

Keith D. Underwood

,

Michael J. Levenhagen

,

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Scientific Application Acceleration with Reconfigurable Functional Units.

[DOI]

,

Keith D. Underwood

,

Katherine Compton

Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines, 2007

An architecture to perform NIC based MPI matching.

[DOI]

K. Scott Hemmert

,

Keith D. Underwood

,

Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

2006

SeaStar Interconnect: Balanced Bandwidth for Scalable Performance.

[DOI]

,

Kevin T. Pedretti

,

Keith D. Underwood

,

Trammell Hudson

IEEE Micro, 2006

Implications of application usage characteristics for collective communication offload.

[DOI]

,

,

,

Keith D. Underwood

Int. J. High Perform. Comput. Netw., 2006

Tools and techniques for performance - Architectures and APIs: assessing requirements for delivering FPGA performance to applications.

[DOI]

Keith D. Underwood

,

K. Scott Hemmert

,

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Poster reception - The structural simulation toolkit: exploring novel architectures.

[DOI]

,

Richard C. Murphy

,

,

Keith D. Underwood

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Reconfigurable supercomputing - Is high-performance reconfigurable computing the next supercomputing paradigm?

[DOI]

Tarek A. El-Ghazawi

,

,

Daniel S. Poznanovic

,

,

Keith D. Underwood

,

,

Duncan A. Buell

,

,

Volodymyr V. Kindratenko

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Challenges and Issues in Benchmarking MPI.

[DOI]

Keith D. Underwood

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006

A preliminary analysis of the InfiniPath and XD1 network interfaces.

[DOI]

,

Douglas Doerfler

,

Keith D. Underwood

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Scientific applications vs. SPEC-FP: a comparison of program behavior.

[DOI]

,

,

Keith D. Underwood

,

Katherine Compton

Proceedings of the 20th Annual International Conference on Supercomputing, 2006

Architectural Modifications to Improve Floating-Point Unit Efficiency in FPGAs.

[DOI]

Michael J. Beauchamp

,

,

Keith D. Underwood

,

K. Scott Hemmert

Proceedings of the 2006 International Conference on Field Programmable Logic and Applications (FPL), 2006

Embedded floating-point units in FPGAs.

[DOI]

Michael J. Beauchamp

,

,

Keith D. Underwood

,

K. Scott Hemmert

Proceedings of the ACM/SIGDA 14th International Symposium on Field Programmable Gate Arrays, 2006

Open Source High Performance Floating-Point Modules.

[DOI]

K. Scott Hemmert

,

Keith D. Underwood

Proceedings of the 14th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2006), 2006

Fine-Grained Message Pipelining for Improved MPI Performance.

[DOI]

,

Kyle B. Wheeler

,

,

Keith D. Underwood

Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006

A Simple Synchronous Distributed-Memory Algorithm for the HPCC RandomAccess Benchmark.

[DOI]

Steven J. Plimpton

,

,

Courtenay T. Vaughan

,

Keith D. Underwood

,

Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006

2005

Analyzing the Impact of Overlap, Offload, and Independent Progress for Message Passing Interface Applications.

[DOI]

,

,

Keith D. Underwood

Int. J. High Perform. Comput. Appl., 2005

A Hardware Acceleration Unit for MPI Queue Processing.

[DOI]

Keith D. Underwood

,

K. Scott Hemmert

,

,

Richard C. Murphy

,

Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Enhancing NIC Performance for MPI using Processing-in-Memory.

[DOI]

,

Richard C. Murphy

,

,

Keith D. Underwood

Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

RC-BLAST: Towards a Portable, Cost-Effective Open Source Hardware Implementation.

[DOI]

,

Keith D. Underwood

,

Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

The implications of working set analysis on supercomputing memory hierarchy design.

[DOI]

Richard C. Murphy

,

,

,

Keith D. Underwood

Proceedings of the 19th Annual International Conference on Supercomputing, 2005

Considering the Relative Importance of Network Performance and Network Features.

[DOI]

,

Keith D. Underwood

Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005

A Preliminary Analysis of the MPI Queue Characteristics of Several Applications.

[DOI]

,

,

Keith D. Underwood

Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005

Initial Performance Evaluation of the Cray SeaStar Interconnect.

[DOI]

,

Kevin T. Pedretti

,

Keith D. Underwood

Proceedings of the 13th Annual IEEE Symposium on High Performance Interconnects (HOTIC 2005), 2005

An Analysis of the Double-Precision Floating-Point FFT on FPGAs.

[DOI]

K. Scott Hemmert

,

Keith D. Underwood

Proceedings of the 13th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2005), 2005

A Comparison of Floating Point and Logarithmic Number Systems for FPGAs.

[DOI]

Michael Haselman

,

Michael J. Beauchamp

,

,

,

Keith D. Underwood

,

K. Scott Hemmert

Proceedings of the 13th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2005), 2005

Accelerating List Management for MPI.

[DOI]

Keith D. Underwood

,

,

Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005

Implementation and Performance of Portals 3.3 on the Cray XT3.

[DOI]

,

Trammell Hudson

,

Kevin T. Pedretti

,

,

Keith D. Underwood

Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005

2004

An Analysis of the Cost Effectiveness of an Adaptable Computing Cluster.

[DOI]

Keith D. Underwood

,

Walter B. Ligon III

,

Clust. Comput., 2004

An Initial Analysis of the Impact of Overlap and Independent Progress for MPI.

[DOI]

,

Keith D. Underwood

,

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2004

An Analysis of NIC Resource Usage for Offloading MPI.

[DOI]

,

Keith D. Underwood

Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Characterizing a new class of threads in scientific applications for high end supercomputers.

[DOI]

,

Richard C. Murphy

,

,

Keith D. Underwood

Proceedings of the 18th Annual International Conference on Supercomputing, 2004

An analysis of the impact of MPI overlap and independent progress.

[DOI]

,

Keith D. Underwood

Proceedings of the 18th Annual International Conference on Supercomputing, 2004

The Impact of MPI Queue Usage on Message Latency.

[DOI]

Keith D. Underwood

,

Proceedings of the 33rd International Conference on Parallel Processing (ICPP 2004), 2004

FPGAs vs. CPUs: trends in peak floating-point performance.

[DOI]

Keith D. Underwood

Proceedings of the ACM/SIGDA 12th International Symposium on Field Programmable Gate Arrays, 2004

Closing the Gap: CPU and FPGA Trends in Sustainable Floating-Point BLAS Performance.

[DOI]

Keith D. Underwood

,

K. Scott Hemmert

Proceedings of the 12th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2004), 2004

A comparison of 4X InfiniBand and Quadrics Elan-4 technologies.

[DOI]

,

Douglas Doerfler

,

Keith D. Underwood

Proceedings of the 2004 IEEE International Conference on Cluster Computing (CLUSTER 2004), 2004

2003

Analysis of a prototype intelligent network interface.

[DOI]

Keith D. Underwood

,

Walter B. Ligon III

,

Concurr. Comput. Pract. Exp., 2003

Evaluation of an Eager Protocol Optimization for MPI.

[DOI]

,

Keith D. Underwood

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 10th European PVM/MPI Users' Group Meeting, Venice, Italy, September 29, 2003

A Configurable Network Protocol for Cluster Based Communications using Modular Hardware Primitives on an Intelligent NIC.

[DOI]

Ranjesh G. Jaganathan

,

Keith D. Underwood

,

Proceedings of the 11th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2003), 2003

Implications of a PIM Architectural Model for MPI.

[DOI]

,

Richard C. Murphy

,

,

Jay B. Brockman

,

,

Keith D. Underwood

Proceedings of the 2003 IEEE International Conference on Cluster Computing (CLUSTER 2003), 2003

A Performance Comparison of Linux and a Lightweight Kernel.

[DOI]

,

,

Keith D. Underwood

,

Trammell Hudson

,

Patrick G. Bridges

,

Arthur B. Maccabe

Proceedings of the 2003 IEEE International Conference on Cluster Computing (CLUSTER 2003), 2003

2002

GRIP: A Reconfigurable Architecture for Host-Based Gigabit-Rate Packet Processing.

[DOI]

,

,

,

,

Keith D. Underwood

Proceedings of the 10th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2002), 2002

2001

Cost effectiveness of an adaptable computing cluster.

[DOI]

Keith D. Underwood

,

,

Walter B. Ligon III

Proceedings of the 2001 ACM/IEEE conference on Supercomputing, 2001

Acceleration of a 2D-FFT on an Adaptable Computing Cluster.

[DOI]

Keith D. Underwood

,

,

Walter B. Ligon III

Proceedings of the 9th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2001

A Reconfigurable Extension to the Network Interface of Beowulf Clusters.

[DOI]

Keith D. Underwood

,

,

Walter B. Ligon III

Proceedings of the 2001 IEEE International Conference on Cluster Computing (CLUSTER 2001), 2001

1998

Implementation of IEEE Single-Precision Floating-Point Operations on FPGAs (Abstract).

[DOI]

Walter B. Ligon III

,

,

,

Kevin Schoonover

,

,

Keith D. Underwood

Proceedings of the 1998 ACM/SIGDA Sixth International Symposium on Field Programmable Gate Arrays, 1998

A Re-evaluation of the Practicality of Floating-Point Operations on FPGAs.

[DOI]

Walter B. Ligon III

,

,

,

Kevin Schoonover

,

,

Keith D. Underwood

Proceedings of the 6th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '98), 1998
