Karl S. Hemmert

Kevin A. Brown

Sudheer Chunduri

Robert B. Ross

Proceedings of the 2021 International Workshop on Performance Modeling, 2021

2019

Simulation Framework for Studying Optical Cable Failures in Dragonfly Topologies.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

2017

Performance analysis for using non-volatile memory DIMMs: opportunities and challenges.

Proceedings of the International Symposium on Memory Systems, 2017

Unveiling the Interplay Between Global Link Arrangements and Network Management Algorithms on Dragonfly Networks.

Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

2016

Low Latency, High Bisection-Bandwidth Networks for Exascale Memory Systems.

Proceedings of the Second International Symposium on Memory Systems, 2016

(SAI) Stalled, Active and Idle: Characterizing Power and Performance of Large-Scale Dragonfly Networks.

Dorian C. Arnold

Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016

2015

Two-Level Main Memory Co-Design: Multi-threaded Algorithmic Primitives, Analysis, and Simulation.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

2014

An evaluation of MPI message rate on hybrid-core processors.

Int. J. High Perform. Comput. Appl., 2014

Exascale design space exploration and co-design.

Future Gener. Comput. Syst., 2014

Using a complementary emulation-simulation co-design approach to assess application readiness for processing-in-memory systems.

Proceedings of the 1st International Workshop on Hardware-Software Co-Design for High Performance Computing, 2014

Abstract machine models and proxy architectures for exascale computing.

Proceedings of the 1st International Workshop on Hardware-Software Co-Design for High Performance Computing, 2014

2013

The impact of hybrid-core processors on MPI message rate.

Proceedings of the 20th European MPI Users' Group Meeting, 2013

2012

Application-driven analysis of two generations of capability computing: the transition to multicore processors.

Concurr. Comput. Pract. Exp., 2012

Improvements to the structural simulation toolkit.

Proceedings of the International ICST Conference on Simulation Tools and Techniques, 2012

Poster: Portals 4 Network Programming Interface.

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

2011

The structural simulation toolkit.

SIGMETRICS Perform. Evaluation Rev., 2011

The Impact of Injection Bandwidth Performance on Application Scalability.

Proceedings of the Recent Advances in the Message Passing Interface, 2011

Using Triggered Operations to Offload Rendezvous Messages.

Proceedings of the Recent Advances in the Message Passing Interface, 2011

Let there be light!: the future of memory systems is photonics and 3D stacking.

Proceedings of the 2011 ACM SIGPLAN workshop on Memory Systems Performance and Correctness: held in conjunction with PLDI '11, 2011

Enabling Flexible Collective Communication Offload with Triggered Operations.

Proceedings of the IEEE 19th Annual Symposium on High Performance Interconnects, 2011

Enhanced Support for OpenSHMEM Communication in Portals.

Proceedings of the IEEE 19th Annual Symposium on High Performance Interconnects, 2011

2010

Fast, Efficient Floating-Point Adders and Multipliers for FPGAs.

ACM Trans. Reconfigurable Technol. Syst., 2010

On the Path to Exascale.

Int. J. Distributed Syst. Technol., 2010

Green HPC: From Nice to Necessity.

Comput. Sci. Eng., 2010

Using Triggered Operations to Offload Collective Communication Operations.

Brian W. Barrett

Proceedings of the Recent Advances in the Message Passing Interface, 2010

Challenges for High-Performance Networking for Exascale Computing.

Proceedings of the 19th International Conference on Computer Communications and Networks, 2010

Exascale Computing and the Role of Co-Design.

Proceedings of the High Performance Computing: From Grids and Clouds to Exascale, 2010

2009

From Silicon to Science: The Long Road to Production Reconfigurable Supercomputing.

Craig D. Ulmer

ACM Trans. Reconfigurable Technol. Syst., 2009

An application based MPI message throughput benchmark.

Brian W. Barrett

Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009

2008

Architectural Modifications to Enhance the Floating-Point Performance of FPGAs.

IEEE Trans. Very Large Scale Integr. Syst., 2008

High message rate, NIC-based atomics: Design and performance considerations.

Ron Brightwell

Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

2007

Floating-Point Divider Design for FPGAs.

IEEE Trans. Very Large Scale Integr. Syst., 2007

An architecture to perform NIC based MPI matching.

Arun Rodrigues

Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

2006

Tools and techniques for performance - Architectures and APIs: assessing requirements for delivering FPGA performance to applications.

Craig D. Ulmer

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Architectural Modifications to Improve Floating-Point Unit Efficiency in FPGAs.

Proceedings of the 2006 International Conference on Field Programmable Logic and Applications (FPL), 2006

Embedded floating-point units in FPGAs.

Proceedings of the ACM/SIGDA 14th International Symposium on Field Programmable Gate Arrays, 2006

Open Source High Performance Floating-Point Modules.

Proceedings of the 14th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2006), 2006

2005

A Hardware Acceleration Unit for MPI Queue Processing.

Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

An Analysis of the Double-Precision Floating-Point FFT on FPGAs.

Proceedings of the 13th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2005), 2005

A Comparison of Floating Point and Logarithmic Number Systems for FPGAs.

Proceedings of the 13th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2005), 2005

2004

Closing the Gap: CPU and FPGA Trends in Sustainable Floating-Point BLAS Performance.

Proceedings of the 12th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2004), 2004

2003

Source Level Debugger for the Sea Cucumber Synthesizing Compiler.

Proceedings of the 11th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2003), 2003

Issues in debugging highly parallel FPGA-based applications derived from source code.

Brad L. Hutchings

Proceedings of the 2003 Asia and South Pacific Design Automation Conference, 2003

2001

An Application-Specific Compiler for High-Speed Binary Image Morphology.
