Karl S. Hemmert

According to our database, Karl S. Hemmert authored at least 47 papers between 1999 and 2023.

Bibliography

2023
ERAS: A Flexible and Scalable Framework for Seamless Integration of RTL Models with Structural Simulation Toolkit.
Proceedings of the IEEE International Symposium on Workload Characterization, 2023

2022
"Smarter" NICs for faster molecular dynamics: a case study.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

2021
Exploration of Congestion Control Techniques on Dragonfly-class HPC Networks Through Simulation.
Proceedings of the 2021 International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, 2021

2019
Simulation Framework for Studying Optical Cable Failures in Dragonfly Topologies.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

2017
Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation.
J. Parallel Distributed Comput., 2017

Performance analysis for using non-volatile memory DIMMs: opportunities and challenges.
Proceedings of the International Symposium on Memory Systems, 2017

Unveiling the Interplay Between Global Link Arrangements and Network Management Algorithms on Dragonfly Networks.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2017

2016
Low Latency, High Bisection-Bandwidth Networks for Exascale Memory Systems.
Proceedings of the Second International Symposium on Memory Systems, 2016

(SAI) Stalled, Active and Idle: Characterizing Power and Performance of Large-Scale Dragonfly Networks.
Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016

2014
An evaluation of MPI message rate on hybrid-core processors.
Int. J. High Perform. Comput. Appl., 2014

Exascale design space exploration and co-design.
Future Gener. Comput. Syst., 2014

Using a complementary emulation-simulation co-design approach to assess application readiness for processing-in-memory systems.
Proceedings of the 1st International Workshop on Hardware-Software Co-Design for High Performance Computing, 2014

Abstract machine models and proxy architectures for exascale computing.
Proceedings of the 1st International Workshop on Hardware-Software Co-Design for High Performance Computing, 2014

2013
The impact of hybrid-core processors on MPI message rate.
Proceedings of the 20th European MPI Users' Group Meeting, 2013

2012
Application-driven analysis of two generations of capability computing: the transition to multicore processors.
Concurr. Comput. Pract. Exp., 2012

Improvements to the structural simulation toolkit.
Proceedings of the International ICST Conference on Simulation Tools and Techniques, 2012

Poster: Portals 4 Network Programming Interface.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

2011
The structural simulation toolkit.
SIGMETRICS Perform. Evaluation Rev., 2011

The Impact of Injection Bandwidth Performance on Application Scalability.
Proceedings of the Recent Advances in the Message Passing Interface, 2011

Using Triggered Operations to Offload Rendezvous Messages.
Proceedings of the Recent Advances in the Message Passing Interface, 2011

Let there be light!: the future of memory systems is photonics and 3D stacking.
Proceedings of the 2011 ACM SIGPLAN workshop on Memory Systems Performance and Correctness: held in conjunction with PLDI '11, 2011

Enabling Flexible Collective Communication Offload with Triggered Operations.
Proceedings of the IEEE 19th Annual Symposium on High Performance Interconnects, 2011

Enhanced Support for OpenSHMEM Communication in Portals.
Proceedings of the IEEE 19th Annual Symposium on High Performance Interconnects, 2011

2010
Fast, Efficient Floating-Point Adders and Multipliers for FPGAs.
ACM Trans. Reconfigurable Technol. Syst., 2010

On the Path to Exascale.
Int. J. Distributed Syst. Technol., 2010

Green HPC: From Nice to Necessity.
Comput. Sci. Eng., 2010

Using Triggered Operations to Offload Collective Communication Operations.
Proceedings of the Recent Advances in the Message Passing Interface, 2010

Challenges for High-Performance Networking for Exascale Computing.
Proceedings of the 19th International Conference on Computer Communications and Networks, 2010

Exascale Computing and the Role of Co-Design.
Proceedings of the High Performance Computing: From Grids and Clouds to Exascale, 2010

2009
From Silicon to Science: The Long Road to Production Reconfigurable Supercomputing.
ACM Trans. Reconfigurable Technol. Syst., 2009

An application based MPI message throughput benchmark.
Proceedings of the 2009 IEEE International Conference on Cluster Computing, 2009

2008
Architectural Modifications to Enhance the Floating-Point Performance of FPGAs.
IEEE Trans. Very Large Scale Integr. Syst., 2008

High message rate, NIC-based atomics: Design and performance considerations.
Proceedings of the 2008 IEEE International Conference on Cluster Computing, 2008

2007
Floating-Point Divider Design for FPGAs.
IEEE Trans. Very Large Scale Integr. Syst., 2007

An architecture to perform NIC based MPI matching.
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

2006
Tools and techniques for performance - Architectures and APIs: assessing requirements for delivering FPGA performance to applications.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Architectural Modifications to Improve Floating-Point Unit Efficiency in FPGAs.
Proceedings of the 2006 International Conference on Field Programmable Logic and Applications (FPL), 2006

Embedded floating-point units in FPGAs.
Proceedings of the ACM/SIGDA 14th International Symposium on Field Programmable Gate Arrays, 2006

Open Source High Performance Floating-Point Modules.
Proceedings of the 14th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2006), 2006

2005
A Hardware Acceleration Unit for MPI Queue Processing.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

An Analysis of the Double-Precision Floating-Point FFT on FPGAs.
Proceedings of the 13th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2005), 2005

A Comparison of Floating Point and Logarithmic Number Systems for FPGAs.
Proceedings of the 13th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2005), 2005

2004
Closing the Gap: CPU and FPGA Trends in Sustainable Floating-Point BLAS Performance.
Proceedings of the 12th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2004), 2004

2003
Source Level Debugger for the Sea Cucumber Synthesizing Compiler.
Proceedings of the 11th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2003), 2003

Issues in debugging highly parallel FPGA-based applications derived from source code.
Proceedings of the 2003 Asia and South Pacific Design Automation Conference, 2003

2001
An Application-Specific Compiler for High-Speed Binary Image Morphology.
Proceedings of the 9th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2001

1999
A CAD Suite for High-Performance FPGA Design.
Proceedings of the 7th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '99), 1999

