Keith D. Underwood

According to our database, Keith D. Underwood authored at least 78 papers between 1998 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
Datacenter Ethernet and RDMA: Issues at Hyperscale.
CoRR, 2023

Not all applications have boring communication patterns: Profiling message matching with BMM.
Concurr. Comput. Pract. Exp., 2023

Data Center Ethernet and Remote Direct Memory Access: Issues at Hyperscale.
Computer, 2023

2017
Fast Networks and Slow Memories: A Mechanism for Mitigating Bandwidth Mismatches.
Proceedings of the 25th IEEE Annual Symposium on High-Performance Interconnects, 2017

2016
Exploiting Offload-Enabled Network Interfaces.
IEEE Micro, 2016

Enabling Scalable High-Performance Systems with the Intel Omni-Path Architecture.
IEEE Micro, 2016

Mitigating MPI Message Matching Misery.
Proceedings of the High Performance Computing - 31st International Conference, 2016

2015
Remote Memory Access Programming in MPI-3.
ACM Trans. Parallel Comput., 2015

Intel® Omni-path Architecture: Enabling Scalable, High Performance Fabrics.
Proceedings of the 23rd IEEE Annual Symposium on High-Performance Interconnects, 2015

2014
Reducing Synchronization Overhead Through Bundled Communication.
Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Tools, 2014

2013
Evaluating on-die interconnects for a 4 TB/s router.
Proceedings of the International Conference on Supercomputing, 2013

2012
Poster: Portals 4 Network Programming Interface.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

A Low Impact Flow Control Implementation for Offload Communication Interfaces.
Proceedings of the Recent Advances in the Message Passing Interface, 2012

Exploiting communication and packaging locality for cost-effective large scale networks.
Proceedings of the International Conference on Supercomputing, 2012

2011
Scientific Application Demands on a Reconfigurable Functional Unit Interface.
ACM Trans. Reconfigurable Technol. Syst., 2011

Using Triggered Operations to Offload Rendezvous Messages.
Proceedings of the Recent Advances in the Message Passing Interface, 2011

A Unified Algorithm for Both Randomized Deterministic and Adaptive Routing in Torus Networks.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Enabling Flexible Collective Communication Offload with Triggered Operations.
Proceedings of the IEEE 19th Annual Symposium on High Performance Interconnects, 2011

Enhanced Support for OpenSHMEM Communication in Portals.
Proceedings of the IEEE 19th Annual Symposium on High Performance Interconnects, 2011

2010
Fast, Efficient Floating-Point Adders and Multipliers for FPGAs.
ACM Trans. Reconfigurable Technol. Syst., 2010

Performance evaluation of the Red Storm dual-core upgrade.
Concurr. Comput. Pract. Exp., 2010

Using Triggered Operations to Offload Collective Communication Operations.
Proceedings of the Recent Advances in the Message Passing Interface, 2010

Challenges for High-Performance Networking for Exascale Computing.
Proceedings of the 19th International Conference on Computer Communications and Networks, 2010

2009
From Silicon to Science: The Long Road to Production Reconfigurable Supercomputing.
ACM Trans. Reconfigurable Technol. Syst., 2009

2008
Architectural Modifications to Enhance the Floating-Point Performance of FPGAs.
IEEE Trans. Very Large Scale Integr. Syst., 2008

High message rate, NIC-based atomics: Design and performance considerations.
Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

From Silicon to Science: The Long Road to Production Reconfigurable Supercomputing.
Proceedings of the Reconfigurable Computing: Architectures, 2008

2007
Floating-Point Divider Design for FPGAs.
IEEE Trans. Very Large Scale Integr. Syst., 2007

Evaluating NIC hardware requirements to achieve high message rate PGAS support on multi-core processors.
Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

Analyzing the Scalability of Graph Algorithms on Eldorado.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Simulating Red Storm: Challenges and Successes in Building a System Simulation.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Scientific Application Acceleration with Reconfigurable Functional Units.
Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines, 2007

An architecture to perform NIC based MPI matching.
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

2006
SeaStar Interconnect: Balanced Bandwidth for Scalable Performance.
IEEE Micro, 2006

Implications of application usage characteristics for collective communication offload.
Int. J. High Perform. Comput. Netw., 2006

Tools and techniques for performance - Architectures and APIs: assessing requirements for delivering FPGA performance to applications.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Poster reception - The structural simulation toolkit: exploring novel architectures.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Reconfigurable supercomputing - Is high-performance reconfigurable computing the next supercomputing paradigm?
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Challenges and Issues in Benchmarking MPI.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006

A preliminary analysis of the InfiniPath and XD1 network interfaces.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Scientific applications vs. SPEC-FP: a comparison of program behavior.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006

Architectural Modifications to Improve Floating-Point Unit Efficiency in FPGAs.
Proceedings of the 2006 International Conference on Field Programmable Logic and Applications (FPL), 2006

Embedded floating-point units in FPGAs.
Proceedings of the ACM/SIGDA 14th International Symposium on Field Programmable Gate Arrays, 2006

Open Source High Performance Floating-Point Modules.
Proceedings of the 14th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2006), 2006

Fine-Grained Message Pipelining for Improved MPI Performance.
Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006

A Simple Synchronous Distributed-Memory Algorithm for the HPCC RandomAccess Benchmark.
Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006

2005
Analyzing the Impact of Overlap, Offload, and Independent Progress for Message Passing Interface Applications.
Int. J. High Perform. Comput. Appl., 2005

A Hardware Acceleration Unit for MPI Queue Processing.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Enhancing NIC Performance for MPI using Processing-in-Memory.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

RC-BLAST: Towards a Portable, Cost-Effective Open Source Hardware Implementation.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

The implications of working set analysis on supercomputing memory hierarchy design.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

Considering the Relative Importance of Network Performance and Network Features.
Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005

A Preliminary Analysis of the MPI Queue Characteristics of Several Applications.
Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005

Initial Performance Evaluation of the Cray SeaStar Interconnect.
Proceedings of the 13th Annual IEEE Symposium on High Performance Interconnects (HOTIC 2005), 2005

An Analysis of the Double-Precision Floating-Point FFT on FPGAs.
Proceedings of the 13th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2005), 2005

A Comparison of Floating Point and Logarithmic Number Systems for FPGAs.
Proceedings of the 13th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2005), 2005

Accelerating List Management for MPI.
Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005

Implementation and Performance of Portals 3.3 on the Cray XT3.
Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005

2004
An Analysis of the Cost Effectiveness of an Adaptable Computing Cluster.
Clust. Comput., 2004

An Initial Analysis of the Impact of Overlap and Independent Progress for MPI.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2004

An Analysis of NIC Resource Usage for Offloading MPI.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Characterizing a new class of threads in scientific applications for high end supercomputers.
Proceedings of the 18th Annual International Conference on Supercomputing, 2004

An analysis of the impact of MPI overlap and independent progress.
Proceedings of the 18th Annual International Conference on Supercomputing, 2004

The Impact of MPI Queue Usage on Message Latency.
Proceedings of the 33rd International Conference on Parallel Processing (ICPP 2004), 2004

FPGAs vs. CPUs: trends in peak floating-point performance.
Proceedings of the ACM/SIGDA 12th International Symposium on Field Programmable Gate Arrays, 2004

Closing the Gap: CPU and FPGA Trends in Sustainable Floating-Point BLAS Performance.
Proceedings of the 12th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2004), 2004

A comparison of 4X InfiniBand and Quadrics Elan-4 technologies.
Proceedings of the 2004 IEEE International Conference on Cluster Computing (CLUSTER 2004), 2004

2003
Analysis of a prototype intelligent network interface.
Concurr. Comput. Pract. Exp., 2003

Evaluation of an Eager Protocol Optimization for MPI.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 10th European PVM/MPI Users' Group Meeting, Venice, Italy, September 29, 2003

A Configurable Network Protocol for Cluster Based Communications using Modular Hardware Primitives on an Intelligent NIC.
Proceedings of the 11th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2003), 2003

Implications of a PIM Architectural Model for MPI.
Proceedings of the 2003 IEEE International Conference on Cluster Computing (CLUSTER 2003), 2003

A Performance Comparison of Linux and a Lightweight Kernel.
Proceedings of the 2003 IEEE International Conference on Cluster Computing (CLUSTER 2003), 2003

2002
GRIP: A Reconfigurable Architecture for Host-Based Gigabit-Rate Packet Processing.
Proceedings of the 10th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2002), 2002

2001
Cost effectiveness of an adaptable computing cluster.
Proceedings of the 2001 ACM/IEEE conference on Supercomputing, 2001

Acceleration of a 2D-FFT on an Adaptable Computing Cluster.
Proceedings of the 9th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2001

A Reconfigurable Extension to the Network Interface of Beowulf Clusters.
Proceedings of the 2001 IEEE International Conference on Cluster Computing (CLUSTER 2001), 2001

1998
Implementation of IEEE Single-Precision Floating-Point Operations on FPGAs (Abstract).
Proceedings of the 1998 ACM/SIGDA Sixth International Symposium on Field Programmable Gate Arrays, 1998

A Re-evaluation of the Practicality of Floating-Point Operations on FPGAs.
Proceedings of the 6th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '98), 1998
