Holger Fröning

Frank Brückerhoff-Plückelmann

Proceedings of the Design and Architecture for Signal and Image Processing, 2026

2025

Uncertainty-Preserving QBNNs: Multi-Level Quantization of SVI-Based Bayesian Neural Networks for Image Classification.

CoRR, December, 2025

Uncertainty Reasoning with Photonic Bayesian Machines.

Hendrik Borras

Shivaprasad U. Hulyal

Wolfram H. P. Pernice

CoRR, December, 2025

Accelerated Execution of Bayesian Neural Networks using a Single Probabilistic Forward Pass and Code Generation.

CoRR, November, 2025

Scalable and Efficient Intra- and Inter-node Interconnection Networks for Post-Exascale Supercomputers and Data centers.

Joaquin Tarraga-Moreno

Daniel Barley

Francisco J. Andujar-Munoz

Jesús Escudero-Sahuquillo

CoRR, November, 2025

Probabilistic photonic computing for AI.

Frank Brückerhoff-Plückelmann

Wolfram H. P. Pernice

Nat. Comput. Sci., May, 2025

On Hardening DNNs against Noisy Computations.

CoRR, January, 2025

GraphMatch: Subgraph Query Processing on Steroids.

Proc. ACM Manag. Data, 2025

Variance-Aware Noisy Training: Hardening DNNs Against Unstable Analog Computations.

Proceedings of the Machine Learning and Knowledge Discovery in Databases. Research Track, 2025

2024

GraphScale: Scalable Processing on FPGAs for HBM and Large Graphs.

ACM Trans. Reconfigurable Technol. Syst., June, 2024

Resource-Efficient Neural Networks for Embedded Systems.

Sebastian Tschiatschek

Franz Pernkopf

Zoubin Ghahramani

J. Mach. Learn. Res., 2024

Function Space Diversity for Uncertainty Prediction via Repulsive Last-Layer Ensembles.

CoRR, 2024

DeepHYDRA: Resource-Efficient Time-Series Anomaly Detection in Dynamically-Configured Systems.

CoRR, 2024

GraphMatch: Subgraph Query Processing on FPGAs.

CoRR, 2024

Walking Noise: On Layer-Specific Robustness of Neural Architectures Against Noisy Computations and Associated Characteristic Learning Dynamics.

Hendrik Borras

Proceedings of the Machine Learning and Knowledge Discovery in Databases. Research Track, 2024

Less Memory Means Smaller GPUs: Backpropagation with Compressed Activations.

Daniel Barley

Proceedings of the Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2024

DeepHYDRA: A Hybrid Deep Learning and DBSCAN-Based Approach to Time-Series Anomaly Detection in Dynamically-Configured Systems.

Proceedings of the 38th ACM International Conference on Supercomputing, 2024

CLAIRE-ROP: Rapid Partitioning-based Deformable Image Registration on Multi-GPU Accelerator.

Vahdaneh Kiani

Oliver Jäkel

Proceedings of the 2024 8th International Conference on Medical and Health Informatics, 2024

2023

Non-relational Databases on FPGAs: Survey, Design Decisions, Challenges.

ACM Comput. Surv., November, 2023

Compressing the Backward Pass of Large-Scale Neural Architectures by Structured Activation Pruning.

Daniel Barley

CoRR, 2023

On Performance Analysis of Graphcore IPUs: Analyzing Squared and Skewed Matrix Multiplication.

CoRR, 2023

Characterization of data compression across CPU platforms and accelerators.

Laura Promberger

Rainer Schwemmer

Concurr. Comput. Pract. Exp., 2023

Reducing Memory Requirements for the IPU using Butterfly Factorizations.

S. Kazem Shekofteh

Christian Alles

Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

On the Non-associativity of Analog Computations.

Lisa Kuhn

Proceedings of the Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023

Implications of Noise in Resistive Memory on Deep Neural Networks for Image Classification.

Yannick Emonds

Kai Xi

Proceedings of the Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023

Implementation Techniques for SPMD Kernels on CPUs.

Proceedings of the 2023 International Workshop on OpenCL, 2023

CUDAsap: Statically-Determined Execution Statistics as Alternative to Execution-Based Profiling.

Yannick Emonds

Lorenz Braun

Proceedings of the 23rd IEEE/ACM International Symposium on Cluster, 2023

2022

Joint Program and Layout Transformations to Enable Convolutional Operators on Specialized Hardware Based on Constraint Programming.

Dennis Rieber

Axel Acosta

ACM Trans. Archit. Code Optim., 2022

Walking Noise: Understanding Implications of Noisy Computations on Classification Tasks.

Hendrik Borras

CoRR, 2022

Towards Hardware-Specific Automatic Compression of Neural Networks.

Torben Krieger

CoRR, 2022

HW-Aware Initialization of DNN Auto-Tuning to Improve Exploration Time and Robustness.

CoRR, 2022

Compiler-aided nd-range parallel-for implementations on CPU in hipSYCL.

Proceedings of the IWOCL'22: International Workshop on OpenCL, Bristol, United Kingdom, May 10, 2022

GraphScale: Scalable Bandwidth-Efficient Graph Processing on FPGAs.

Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

PipeJSON: Parsing JSON at Line Speed on FPGAs.

Proceedings of the International Conference on Management of Data, 2022

2021

A Simple Model for Portable and Fast Prediction of Execution Time and Power Consumption of GPU Kernels.

ACM Trans. Archit. Code Optim., 2021

Scheduling of Graph Queries: Controlling Intra- and Inter-query Parallelism for a High System Throughput.

Matthias Hauck

Ismail Oukid

CoRR, 2021

The Programming of Deep Learning Accelerators as a Constraint Satisfaction Problem.

Dennis Rieber

Axel Acosta

CoRR, 2021

Understanding Cache Boundness of ML Operators on ARM Processors.

CoRR, 2021

Demystifying memory access patterns of FPGA-based graph processing accelerators.

Proceedings of GRADES-NDA '21: 4th ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA), 2021

Towards Addressing Noise and Static Variations of Analog Computations Using Efficient Retraining.

Proceedings of the Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021

Exploring Memory Access Patterns for Graph Processing Accelerators.

Proceedings of the Datenbanksysteme für Business, 2021

2020

cCUDA: Effective Co-Scheduling of Concurrent Kernels on GPUs.

IEEE Trans. Parallel Distributed Syst., 2020

Resource-Efficient Speech Mask Estimation for Multi-Channel Speech Enhancement.

CoRR, 2020

Resource-Efficient Neural Networks for Embedded Systems.

Sebastian Tschiatschek

Franz Pernkopf

Zoubin Ghahramani

CoRR, 2020

On the Difficulty of Designing Processor Arrays for Deep Neural Networks.

Kevin Stehle

Günther Schindler

Proceedings of the IoT Streams for Data-Driven Predictive Maintenance and IoT, Edge, and Mobile for Embedded Machine Learning, 2020

Search Space Complexity of Iteration Domain Based Instruction Embedding for Deep Learning Accelerators.

Dennis Rieber

Proceedings of the IoT Streams for Data-Driven Predictive Maintenance and IoT, Edge, and Mobile for Embedded Machine Learning, 2020

Parameterized Structured Pruning for Deep Neural Networks.

Proceedings of the Machine Learning, Optimization, and Data Science, 2020

On Resource-Efficient Bayesian Network Classifiers and Deep Neural Networks.

Proceedings of the 25th International Conference on Pattern Recognition, 2020

Assessing the Overhead of Offloading Compression Tasks.

Laura Promberger

Rainer Schwemmer

Proceedings of the ICPP Workshops '20, Edmonton, AB, Canada, August 17-20, 2020

Automated Partitioning of Data-Parallel Kernels using Polyhedral Compilation.

Alexander Matz

Johannes Doerfert

Proceedings of the ICPP Workshops '20, Edmonton, AB, Canada, August 17-20, 2020

On Network Locality in MPI-Based HPC Applications.

Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

Towards Real-Time Single-Channel Singing-Voice Separation with Pruned Multi-Scaled Densenets.

Markus Huber

Günther Schindler

Christian Schörkhuber

Wolfgang Roth

Franz Pernkopf

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Metric Selection for GPU Kernel Classification.

ACM Trans. Archit. Code Optim., 2019

On link width scaling for energy-proportional direct interconnection networks.

Steffen Lammel

Concurr. Comput. Pract. Exp., 2019

Constructing virtual 5-dimensional tori out of lower-dimensional network cards.

Concurr. Comput. Pract. Exp., 2019

CUDA Flux: A Lightweight Instruction Profiler for CUDA Applications.

Lorenz Braun

Proceedings of the 2019 IEEE/ACM Performance Modeling, 2019

Training Discrete-Valued Neural Networks with Sign Activations Using Weight Distributions.

Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2019

Software-Based Buffering of Associative Operations on Random Memory Addresses.

Matthias Hauck

Marcus Paradies

Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Effects of Congestion Management on Energy Saving Techniques in Interconnection Networks.

Jesús Escudero-Sahuquillo

Pedro Yébenes

Pedro Javier García

Proceedings of the 5th International Workshop on High-Performance Interconnection Networks in the ExaScale and Big-Data Era, 2019

Quantifying the NUMA Behavior of Partitioned GPGPU Applications.

Alexander Matz

Proceedings of the 12th Workshop on General Purpose Processing Using GPUs, 2019

2018

Efficient and Robust Machine Learning for Real-World Systems.

Sebastian Tschiatschek

Robert Peharz

Matthew Mattina

Zoubin Ghahramani

CoRR, 2018

Heterogeneous and unconventional cluster architectures and applications.

Federico Silla

Concurr. Comput. Pract. Exp., 2018

Towards Efficient Forward Propagation on Resource-Constrained Systems.

Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2018

Resource Efficient Deep Eigenvector Beamforming.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Evaluating Energy-Saving Strategies on Torus, K-Ary N-Tree, and Dragonfly.

Armin Schoffer

Proceedings of the 4th IEEE International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era, 2018

Buffer Provisioning for Large-Scale Data-Acquisition Systems.

Proceedings of the 12th ACM International Conference on Distributed and Event-based Systems, 2018

2017

InfiniBand Verbs on GPU: a case study of controlling an InfiniBand network device from the GPU.

Int. J. High Perform. Comput. Appl., 2017

An Overview of MPI Characteristics of Exascale Proxy Applications.

Proceedings of the High Performance Computing - 32nd International Conference, 2017

Relaxations for High-Performance Message Passing on Massively Parallel SIMT Processors.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Modeling and Validating Time, Buffering, and Utilization of a Large-Scale, Real-Time Data Acquisition System.

Proceedings of the 2017 International Conference on High Performance Computing & Simulation, 2017

Early Experiences with Saving Energy in Direct Interconnection Networks.

Steffen Lammel

Proceedings of the 3rd IEEE International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era, 2017

A Case Study on Implementing Virtual 5D Torus Networks Using Network Components of Lower Dimensionality.

Proceedings of the 3rd IEEE International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era, 2017

Can Modern Graph Processing Engines Run Concurrent Queries Efficiently?

Matthias Hauck

Marcus Paradies

Proceedings of the Fifth International Workshop on Graph Data-management Experiences & Systems, 2017

Linking Application Description with Efficient SIMD Code Generation for Low-Precision Signed-Integer GEMM.

Günther Schindler

Manfred Mücke

Proceedings of the Euro-Par 2017: Parallel Processing Workshops, 2017

2016

Optimizing the data-collection time of a large-scale data-acquisition system through a simulation framework.

J. Supercomput., 2016

Analyzing GPU-controlled communication with dynamic parallelism in terms of performance and energy.

Parallel Comput., 2016

Heterogeneous cluster architectures and applications.

Federico Silla

Concurr. Comput. Pract. Exp., 2016

SONAR: Automated Communication Characterization for HPC Applications.

Steffen Lammel

Proceedings of the High Performance Computing, 2016

Exploring Time and Energy for Complex Accesses to a Hybrid Memory Cube.

Juri Schmidt

Proceedings of the Second International Symposium on Memory Systems, 2016

Optimizing communication for a 2D-partitioned scalable BFS.

Proceedings of the 2016 IEEE High Performance Extreme Computing Conference, 2016

Analyzing the Energy (Dis-) Proportionality of Scalable Interconnection Networks.

Proceedings of the 2nd IEEE International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era HiPINEB@HPCA 2016, 2016

2015

On the design of a new dynamic credit-based end-to-end flow control mechanism for HPC clusters.

Parallel Comput., 2015

Analyzing communication models for distributed thread-collaborative processors in terms of energy and time.

Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software, 2015

Highspeed Graph Processing Exploiting Main-Memory Column Stores.

Proceedings of the Euro-Par 2015: Parallel Processing Workshops, 2015

Modeling a Large Data-Acquisition Network in a Simulation Framework.

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2014

Special issue on unconventional cluster architectures and applications.

Federico Silla

Clust. Comput., 2014

Energy-efficient stencil computations on distributed GPUs using dynamic parallelism and GPU-controlled communication.

Proceedings of the 2nd International Workshop on Energy Efficient Supercomputing, 2014

Infiniband-Verbs on GPU: A Case Study of Controlling an Infiniband Network Device from the GPU.

Franz-Josef Pfreundt

Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Analyzing Put/Get APIs for Thread-Collaborative Processors.

Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014

Energy-Efficient Collective Reduce and Allreduce Operations on Distributed GPUs.

Proceedings of the 14th IEEE/ACM International Symposium on Cluster, 2014

2013

Data Movement Options in Accelerated Clusters.

Proceedings of the Euro-Par 2013: Parallel Processing Workshops, 2013

Oncilla: A GAS runtime for efficient resource allocation and data movement in accelerated clusters.

Jeffrey S. Young

Se Hoon Shon

Sudhakar Yalamanchili

Alex Merritt

Karsten Schwan

Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

GGAS: Global GPU address spaces for efficient communication in heterogeneous clusters.

Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

On Achieving High Message Rates.

Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013

2012

A new degree of freedom for memory allocation in clusters.

Clust. Comput., 2012

A New End-to-End Flow-Control Mechanism for High Performance Computing Clusters.

Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

2011

Network Interfaces.

Proceedings of the Encyclopedia of Parallel Computing, 2011

MEMSCALE™: A Scalable Environment for Databases.

Proceedings of the 13th IEEE International Conference on High Performance Computing & Communication, 2011

Unleash Your Memory-Constrained Applications: A 32-Node Non-coherent Distributed-Memory Prototype Cluster.

Proceedings of the 13th IEEE International Conference on High Performance Computing & Communication, 2011

Highly scalable barriers for future high-performance computing clusters.

Proceedings of the 18th International Conference on High Performance Computing, 2011

MEMSCALE: in-cluster-memory databases.

Proceedings of the 20th ACM Conference on Information and Knowledge Management, 2011

2010

Efficient hardware support for the Partitioned Global Address Space.

Heiner Litz

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Getting Rid of Coherency Overhead for Memory-Hungry Applications.

Proceedings of the 2010 IEEE International Conference on Cluster Computing, 2010

2009

An FPGA-Based Custom High Performance Interconnection Network.

Proceedings of the ReConFig'09: 2009 International Conference on Reconfigurable Computing and FPGAs, 2009

Efficient Virtualization of High-Performance Network Interfaces.

Heiner Litz

Proceedings of the Eighth International Conference on Networks, 2009

An FPGA based verification platform for HyperTransport 3.x.

Proceedings of the 19th International Conference on Field Programmable Logic and Applications, 2009

A HyperTransport 3 Physical Layer Interface for FPGAs.

Heiner Litz

Proceedings of the Reconfigurable Computing: Architectures, 2009

2008

VELO: A Novel Communication Engine for Ultra-Low Latency Message Transfers.

Proceedings of the 2008 International Conference on Parallel Processing, 2008

2007

Architectural improvements of interconnection network interfaces.

PhD thesis, 2007

2005

Swordfish: A Simulator for High-Performance Networks.

Mondrian Nüssle