Torsten Hoefler

Orcid: 0000-0002-1333-9797

Affiliations:

ETH Zürich

According to our database¹, Torsten Hoefler authored at least 515 papers between 2005 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Awards

ACM Fellow

ACM Fellow 2022, "For foundational contributions to High-Performance Computing and the application of HPC techniques to machine learning".

Bibliography

2026

Can AI Weather Models Predict Beyond Two Weeks? A Quantitative Benchmark and Analysis of Long Rollouts.

…, …, …, Torsten Hoefler, Sebastian Schemm, …, Siddhartha Mishra

CoRR, May, 2026

Confounder Detection via Treatment Intent: A New Observational Study Design.

…, Patrik Okanovic, Torsten Hoefler, Elias Bareinboim

CoRR, May, 2026

Large Language Model Selection with Limited Annotations.

Yavuz Durmazkeser, Patrik Okanovic, …, Torsten Hoefler, Nezihe Merve Gürel

CoRR, May, 2026

Grid Games: The Power of Multiple Grids for Quantizing Large Language Models.

Vage Egiazarian, Erik Schultheis, Andrei Panferov, …, Torsten Hoefler, …

CoRR, May, 2026

ADELIA: Automatic Differentiation for Efficient Laplace Inference Approximations.

…, Lisa Gaedke-Merzhäuser, Alexandros Nikolaos Ziogas, Vincent Maillou, Alexandru Calotoiu, …, …, Mathieu Luisier, Torsten Hoefler

CoRR, May, 2026

Resilient AI Supercomputer Networking using MRC and SRv6.

CoRR, May, 2026

Earth System Foundation Model (ESFM): A unified framework for heterogeneous data integration and forecasting.

…, …, …, …, …, …, Leonardo Trentini, …, …, Torsten Hoefler, Siddhartha Mishra, Sebastian Schemm, …, Mathieu Salzmann

CoRR, May, 2026

SFT-then-RL Outperforms Mixed-Policy Methods for LLM Reasoning.

…, …, Torsten Hoefler, …, Valentina Pyatkin

CoRR, April, 2026

An Engineering Journey Training Large Language Models at Scale on Alps: The Apertus Experience.

CoRR, April, 2026

Process Reward Agents for Steering Knowledge-Intensive Reasoning.

…, …, …, Torsten Hoefler, …

CoRR, April, 2026

Demystifying Higher-Order Graph Neural Networks.

…, Florian Scheidl, Lukas Gianinazzi, Grzegorz Kwasniewski, Shachar Klaiman, Jürgen Müller, Torsten Hoefler

IEEE Trans. Pattern Anal. Mach. Intell., March, 2026

Network Design for Wafer-Scale Systems with Wafer-on-Wafer Hybrid Bonding.

…, …, …, …, Torsten Hoefler

CoRR, March, 2026

Cost-Effective Empirical Performance Modeling.

…, Benedikt Naumann, Alexandru Calotoiu, Sebastian Rinke, Thorsten Reimann, Torsten Hoefler, …

IEEE Trans. Parallel Distributed Syst., February, 2026

Scaling Laws of Global Weather Models.

…, …, Alexandru Calotoiu, Torsten Hoefler

CoRR, February, 2026

Spritz: Path-Aware Load Balancing in Low-Diameter Networks.

…, …, …, Ahmad Ghalayini, …, Torsten Hoefler

CoRR, February, 2026

GraphSeek: Next-Generation Graph Analytics with LLMs.

…, Lukasz Jarmocik, …, Shachar Klaiman, …, Robert Gerstenberger, Jürgen Müller, …, Hubert Niewiadomski, Torsten Hoefler

CoRR, February, 2026

MLIR-Forge: A Modular Framework for Language Smiths.

…, …, …, Alexandru Calotoiu, Torsten Hoefler

CoRR, January, 2026

In-Network Collective Operations: Game Changer or Challenge for AI Workloads?

Torsten Hoefler, Mikhail Khalilov, …, Surendra Anubolu, …, …, …, …, Keith D. Underwood, Adrian M. Caulfield, …, Amirreza Rastegari

Computer, January, 2026

Flowcut Switching: High-Performance Adaptive Routing With In-Order Delivery Guarantees.

…, Daniele De Sensi, Salvatore Di Girolamo, Abdulla Bataineh, …, …, Torsten Hoefler

IEEE Trans. Netw., 2026

Zeppelin: Balancing Variable-length Workloads in Data Parallel Large Model Training.

…, …, …, …, …, …, …, …, …, Torsten Hoefler

Proceedings of the 21st European Conference on Computer Systems, 2026

REPS: Recycled Entropy Packet Spraying for Adaptive Load Balancing and Failure Mitigation.

…, …, Ahmad Ghalayini, Michael Papamichael, Mohammad Dohadwala, Lukas Gianinazzi, Mikhail Khalilov, Elias Achermann, Daniele De Sensi, Torsten Hoefler

Proceedings of the 21st European Conference on Computer Systems, 2026

Apertus: Democratizing Open and Compliant LLMs for Global Language Environments.

Alejandro Hernández-Cano, Alexander Hägele, Allen Hao Huang, Angelika Romanou, Antoni-Joan Solergibert i Llaquet, …, Bettina Messmer, …, Eduard Frank Durech, …, Juan Garcia Giraldo, Mete Ismayilzada, …, …, …, …, …, …, Badr AlKhamissi, Ines Altemir Marinas, Mohammad Hossein Amani, Matin Ansaripour, …, …, …, Nicholas John Browning, …, Maximilian Böther, …, Camille Challier, Clément Charmillot, …, Jan Milan Deriu, …, …, Daniil Dzenhaliou, …, …, …, …, …, María Grandury, …, Alexander Miserlis Hoyle, …, …, Andrei Kucharavy, Anastasiia Kucherenko, Frederike Lübeck, …, Theofilos Ioannis Manitaras, Andreas Marfurt, …, …, Henrique Mendonça, Fawzi Roberto Mohamed, Syrielle Montariol, …, Sven Najem-Meyer, …, …, Matteo Pagliardini, …, Andrei Panferov, …, Marco Passerini, …, Auguste Poiroux, Kaustubh Ponkshe, …, …, …, Jakhongir Saydaliev, Mukhammadali Sayfiddinov, Marian Schneider, Stefano Schuppli, Marco Scialanga, …, …, …, …, Alexander Sternfeld, Ayush Kumar Tarun, …, …, …, …, …, …, …, Caglar Gulcehre, David Rosenthal, …, Florian Tramèr, Joost VandeVondele, …, …, Thomas C. Schulthess, Torsten Hoefler, Antoine Bosselut, …, …

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

2025

Demystifying Chains, Trees, and Graphs of Thoughts.

…, …, …, Robert Gerstenberger, …, …, …, …, Grzegorz Kwasniewski, Jürgen Müller, Lukas Gianinazzi, …, Hubert Niewiadomski, …, …, Torsten Hoefler

IEEE Trans. Pattern Anal. Mach. Intell., December, 2025

Practical Challenges in Executing Shor's Algorithm on Existing Quantum Platforms.

…, Julian Jang-Jaccard, Vincent Lenders, …, Torsten Hoefler, Cornelius Hempel

CoRR, December, 2025

Design in Tiles: Automating GEMM Deployment on Tile-Based Many-PE Accelerators.

…, …, …, Alexandru Calotoiu, Torsten Hoefler, …

CoRR, December, 2025

WUSH: Near-Optimal Adaptive Transforms for LLM Quantization.

…, Vage Egiazarian, Torsten Hoefler, …

CoRR, December, 2025

SPADA: A Spatial Dataflow Architecture Programming Language.

Lukas Gianinazzi, …, Torsten Hoefler

CoRR, November, 2025

Inductive Loop Analysis for Practical HPC Application Optimization.

…, …, …, Torsten Hoefler

CoRR, November, 2025

VEIL: Reading Control Flow Graphs Like Code.

…, …, Torsten Hoefler

CoRR, November, 2025

Epidemiology of Large Language Models: A Benchmark for Observational Distribution Knowledge.

…, Patrik Okanovic, Torsten Hoefler, Elias Bareinboim

CoRR, November, 2025

Error bounded compression for weather and climate applications.

…, …, Florian Scheidl, …, Michael Armand Sprenger, Sebastian Schemm, Torsten Hoefler

CoRR, October, 2025

sNVMe-oF: Secure and Efficient Disaggregated Storage.

…, …, …, …, …, Torsten Hoefler

CoRR, October, 2025

Uno: A One-Stop Solution for Inter- and Intra-Datacenter Congestion Control and Reliable Connectivity.

…, …, …, Ahmad Ghalayini, …, …, …, …, …, Konstantin Taranov, Mahmoud Elhaddad, Daniele De Sensi, Soudeh Ghorbani, Torsten Hoefler

CoRR, October, 2025

Active Model Selection for Large Language Models.

Yavuz Durmazkeser, Patrik Okanovic, …, Torsten Hoefler, Nezihe Merve Gürel

CoRR, October, 2025

Cppless: Single-Source and High-Performance Serverless Programming in C++.

…, …, Alexandru Calotoiu, Torsten Hoefler

ACM Trans. Archit. Code Optim., September, 2025

Beyond Outliers: A Study of Optimizers Under Quantization.

Georgios Vlassis, …, Alexandra Volkova, Torsten Hoefler, …

CoRR, September, 2025

Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization.

Vage Egiazarian, Roberto L. Castro, Denis Kuznedelev, Andrei Panferov, …, …, Alexandre Noll Marques, …, …, Torsten Hoefler, …

CoRR, September, 2025

AI Factories: It's time to rethink the Cloud-HPC divide.

Pedro García López, Daniel Barcelona Pons, …, Torsten Hoefler, Eduardo Quiñones, Maciej Malawski, Peter Pietzutch, Alberto P. Martí, Thomas Ohlson Timoudas, Aleksander Slominski

CoRR, September, 2025

Psychologically Enhanced AI Agents.

…, Shriram Chandran, Robert Gerstenberger, …, …, Sebastian Hermann Martschat, …, …, Hubert Niewiadomski, …, Jürgen Müller, Torsten Hoefler

CoRR, September, 2025

XaaS Containers: Performance-Portable Representation With Source and IR Containers.

…, …, …, Valérie Hayot-Sasson, Alberto Madonna, …, …, …, Torsten Hoefler

Dataset, September, 2025

Ab-initio Quantum Transport with the GW Approximation, 42,240 Atoms, and Sustained Exascale Performance.

…, Alexander Maeder, Vincent Maillou, …, …, Grzegorz Kwasniewski, Leonard Deuschle, Torsten Hoefler, Alexandros Nikolaos Ziogas, Mathieu Luisier

CoRR, August, 2025

Ultra Ethernet's Design Principles and Architectural Innovations.

Torsten Hoefler, …, …, Keith D. Underwood, Cedell Alexander, …, …, Adrian M. Caulfield, …, …, …, …, Eugene Opsasnick, …, …, …

CoRR, August, 2025

Fast Graph Vector Search via Hardware Acceleration and Delayed-Synchronization Traversal.

…, …, Torsten Hoefler, …

Proc. VLDB Endow., July, 2025

Benchmarking Filtered Approximate Nearest Neighbor Search Algorithms on Transformer-based Embedding Vectors.

…, …, …, …, Torsten Hoefler

CoRR, July, 2025

RailX: A Flexible, Scalable, and Low-Cost Network Architecture for Hyper-Scale LLM Training Systems.

…, …, …, …, …, …, …, Torsten Hoefler

CoRR, July, 2025

The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm.

…, Torsten Hoefler, …

CoRR, July, 2025

BLaST: High Performance Inference and Pretraining using BLock Sparse Transformers.

Patrik Okanovic, Sameer Deshmukh, Grzegorz Kwasniewski, Kentaro Katayama, …, …, Torsten Hoefler

CoRR, July, 2025

Higher-Order Graph Databases.

…, Shriram Chandran, …, …, …, Robert Gerstenberger, …, Jürgen Müller, Torsten Hoefler

CoRR, June, 2025

Finetuning a Weather Foundation Model with Lightweight Decoders for Unseen Physical Processes.

…, …, …, Torsten Hoefler, Siddhartha Mishra, Sebastian Schemm

CoRR, June, 2025

Cppless: Single-Source and High-Performance Serverless Programming in C++.

…, …, Alexandru Calotoiu, Torsten Hoefler

Dataset, June, 2025

Productivity, Portability, Performance, and Reproducibility: Data-Centric Python.

Alexandros Nikolaos Ziogas, …, …, Alexandru Calotoiu, Tiziano De Matteis, Johannes de Fine Licht, …, Torsten Hoefler

IEEE Trans. Parallel Distributed Syst., May, 2025

FoldedHexaTorus: An Inter-Chiplet Interconnect Topology for Chiplet-based Systems using Organic and Glass Substrates.

…, …, Torsten Hoefler

CoRR, April, 2025

Denoising Application Performance Models with Noise-Resilient Priors.

Gustavo de Morais, Alexander Geiß, Alexandru Calotoiu, …, …, Torsten Hoefler, …, …

CoRR, April, 2025

Affordable AI Assistants with Knowledge Graph of Thoughts.

…, Lorenzo Paleari, Jia Hao Andrea Jiang, Robert Gerstenberger, …, …, …, …, …, Jón Gunnar Hannesson, Grzegorz Kwasniewski, …, Hubert Niewiadomski, Torsten Hoefler

CoRR, April, 2025

Iterating Pointers: Enabling Static Analysis for Loop-based Pointers.

…, Alexandru Calotoiu, Torsten Hoefler

ACM Trans. Archit. Code Optim., March, 2025

Leveraging Graph Analysis to Pinpoint Root Causes of Scalability Issues for Parallel Applications.

…, …, …, …, …, Torsten Hoefler, …, …, …

IEEE Trans. Parallel Distributed Syst., February, 2025

PlaceIT: Placement-based Inter-Chiplet Interconnect Topologies.

…, Benigna Bruggmann, …, …, Torsten Hoefler

CoRR, February, 2025

Replication Package for "SeBS-Flow: Benchmarking Serverless Cloud Function Workflows".

…, …, Alexandru Calotoiu, Laurin Brandner, …, Torsten Hoefler

Dataset, February, 2025

Reasoning Language Models: A Blueprint.

…, …, …, …, Afonso Claudino Catarino, Robert Gerstenberger, …, …, …, …, …, …, Grzegorz Kwasniewski, Jürgen Müller, …, Hannes Eberhard, Hubert Niewiadomski, Torsten Hoefler

CoRR, January, 2025

HALO: Hadamard-Assisted Lower-Precision Optimization for LLMs.

…, …, …, Roberto L. Castro, Torsten Hoefler, …

CoRR, January, 2025

Atlas-independent brain connectome analysis at voxel-level granularity: graph convolutional networks for etiology classification in newborns.

…, Lukas Gianinazzi, Sepp Kollmorgen, Cornelia F. Hagmann, Patrice Grehten, …, Giancarlo Natalucci, …, …, …, Torsten Hoefler, …

NeuroImage, 2025

CrossPipe: Towards Optimal Pipeline Schedules for Cross-Datacenter Training.

…, …, …, Torsten Hoefler

Proceedings of the 2025 USENIX Annual Technical Conference, 2025

SeBS 2.0: Keeping up with the Clouds.

…, Alexandru Calotoiu, Torsten Hoefler

Proceedings of the 3rd Workshop on SErverless Systems, Applications and MEthodologies, 2025

Ab-initio Quantum Transport with the GW Approximation, 42,240 Atoms, and Sustained Exascale Performance.

…, Alexander Maeder, Vincent Maillou, …, …, Grzegorz Kwasniewski, Leonard Deuschle, Torsten Hoefler, Alexandros Nikolaos Ziogas, Mathieu Luisier

Proceedings of the International Conference for High Performance Computing, 2025

ATLAHS: An Application-centric Network Simulator Toolchain for AI, HPC, and Distributed Storage.

…, …, …, Pasquale Jordan, …, Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2025

Bine Trees: Enhancing Collective Operations by Optimizing Communication Locality.

Daniele De Sensi, Saverio Pasqualoni, Lorenzo Piarulli, …, …, Matteo Turisini, …, Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2025

C.A.T.S.: Memory and Control Flow Tracing for Whole-Program Performance Analysis.

…, …, Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2025

Computing the Full Earth System at 1km Resolution.

Proceedings of the International Conference for High Performance Computing, 2025

SDR-RDMA: Software-Defined Reliability Architecture for Planetary Scale RDMA Communication.

Mikhail Khalilov, …, …, …, …, Nicola Mazzoletti, Peter-Jan Gootzen, Salvatore Di Girolamo, …, …, …, …, Sreevatsa Anantharamu, …, Konstantin Taranov, …, …, Mahmoud Elhaddad, Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2025

Core Hours and Carbon Credits: Incentivizing Sustainability in HPC.

…, Maxime Gonthier, Valérie Hayot-Sasson, …, …, Raul Castro Fernandez, Torsten Hoefler, …, …

Proceedings of the International Conference for High Performance Computing, 2025

PerfDojo: Automated ML Library Generation for Heterogeneous Architectures.

…, …, Gioele Gottardo, …, …, …, …, Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2025

XaaS Containers: Performance-Portable Representation With Source and IR Containers.

…, …, …, Valérie Hayot-Sasson, Alberto Madonna, …, …, …, Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2025

Uno: A One-Stop Solution for Inter- and Intra-Data Center Congestion Control and Reliable Connectivity.

…, …, …, Ahmad Ghalayini, …, …, …, …, …, Konstantin Taranov, Mahmoud Elhaddad, Daniele De Sensi, Soudeh Ghorbani, Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2025

MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models.

…, Roberto L. Castro, …, Torsten Hoefler, …

Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2025

HALO: Hadamard-Assisted Lower-Precision Optimization for LLMs.

…, …, …, Roberto L. Castro, Torsten Hoefler, …

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Energy-Optimal and Low-Depth Algorithmic Primitives for Spatial Dataflow Architectures.

Lukas Gianinazzi, …, …, …, …, Piotr Luczynski, Torsten Hoefler

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2025

Confidential LLM Inference: Performance and Cost Across CPU and GPU TEEs.

…, …, …, Torsten Hoefler

Proceedings of the IEEE International Symposium on Workload Characterization, 2025

EDAN: Towards Understanding Memory Parallelism and Latency Sensitivity in HPC.

…, Mikhail Khalilov, Lukas Gianinazzi, …, …, …, …, Robert W. Wisniewski, Torsten Hoefler

Proceedings of the 39th ACM International Conference on Supercomputing, 2025

FPsPIN: An FPGA-Based Open-Hardware Research Platform for Processing in the Network.

…, …, Torsten Hoefler

Proceedings of the IEEE Symposium on High-Performance Interconnects, 2025

Demystifying NCCL: An In-Depth Analysis of GPU Communication Protocols and Algorithms.

…, …, …, Sylvain Jeaugey, Cedell Alexander, …, …, Jeff R. Hammond, Torsten Hoefler

Proceedings of the IEEE Symposium on High-Performance Interconnects, 2025

SeBS-Flow: Benchmarking Serverless Cloud Function Workflows.

…, …, Alexandru Calotoiu, Laurin Brandner, …, Torsten Hoefler

Proceedings of the Twentieth European Conference on Computer Systems, 2025

Evolving HPC services to enable ML workloads on HPE Cray EX.

Stefano Schuppli, …, Henrique Mendonça, Nina Mujkanovic, …, Dino Conciatore, …, …, …, Joost VandeVondele, Maxime Martinasso, Thomas C. Schulthess, Torsten Hoefler

Proceedings of the Cray User Group, 2025

DaCe AD: Unifying High-Performance Automatic Differentiation for Machine Learning and Scientific Computing.

…, Alexandru Calotoiu, …, Torsten Hoefler

Proceedings of the IEEE International Conference on Cluster Computing, 2025

A Priori Loop Nest Normalization: Automatic Loop Scheduling in Complex Applications.

…, …, …, Alexandru Calotoiu, …, Torsten Hoefler

Proceedings of the 23rd ACM/IEEE International Symposium on Code Generation and Optimization, 2025

RapidChiplet: A Toolchain for Rapid Design Space Exploration of Inter-Chiplet Interconnects.

…, Benigna Bruggmann, …, …, …, Torsten Hoefler

Proceedings of the 22nd ACM International Conference on Computing Frontiers, 2025

All models are wrong, some are useful: Model Selection with Limited Labels.

Patrik Okanovic, …, …, Torsten Hoefler, …, Nezihe Merve Gürel

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2025

2024

HammingMesh: A Network Topology for Large-Scale Deep Learning.

Torsten Hoefler, …, Daniele De Sensi, Salvatore Di Girolamo, …, …, …, …, …

Commun. ACM, December, 2024

MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models.

…, Roberto L. Castro, …, Torsten Hoefler, …

Dataset, November, 2024

AutoDDL: Automatic Distributed Deep Learning With Near-Optimal Bandwidth Cost.

…, …, …, …, Torsten Hoefler

IEEE Trans. Parallel Distributed Syst., August, 2024

Parallel and Distributed Graph Neural Networks: An In-Depth Concurrency Analysis.

…, Torsten Hoefler

IEEE Trans. Pattern Anal. Mach. Intell., May, 2024

Canary: Congestion-aware in-network allreduce using dynamic trees.

Daniele De Sensi, Edgar Costa Molero, Salvatore Di Girolamo, Laurent Vanbever, Torsten Hoefler

Future Gener. Comput. Syst., March, 2024

Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries.

…, Robert Gerstenberger, …, …, Michal Podstawski, Claude Barthels, …, Torsten Hoefler

ACM Comput. Surv., February, 2024

A High-Performance, Energy-Efficient Modular DMA Engine Architecture.

…, Michael Rogenmoser, …, …, Alessandro Ottaviano, …, Torsten Hoefler, …

IEEE Trans. Computers, January, 2024

Chameleon: a Heterogeneous and Disaggregated Accelerator System for Retrieval-Augmented Language Models.

…, …, …, Torsten Hoefler, …

Proc. VLDB Endow., 2024

Digital twins of Earth and the computing challenge of human interaction.

…, Torsten Hoefler, …, Wilco Hazeleger

Nat. Comput. Sci., 2024

RED-SEA Project: Towards a new-generation European interconnect.

María Engracia Gómez, Julio Sahuquillo, Andrea Biagioni, …, …, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Michele Martinelli, Pier Stanislao Paolucci, Elena Pastorelli, Francesco Simula, Matteo Turisini, …, Roberto Ammendola, Carlotta Chiarini, …, Fabrizio Capuani, Adrián Castelló, …, Eugenio Stabile, Enrique S. Quintana-Ortí, Pascale Bernier-Bruna, …, Pierre-Axel Lagadec, Gregoire Pichon, …, Manolis Katevenis, Sokratis Bartzis, Orestis Mousouros, Pantelis Xirouchakis, Vangelis Mageiropoulos, Michalis Gianioudis, …, Aggelos Ioannou, Nikos Kallimanis, Miguel Sánchez de la Rosa, Gabriel Gomez-Lopez, Francisco Alfaro-Cortés, Jesús Escudero-Sahuquillo, Pedro Javier García, Francisco J. Quiles, José L. Sánchez, Gaetan De Gassowski, Matthieu Hautreaux, Stephane Mathieu, …, …, …, Torsten Hoefler, …, …, Giuseppe Piero Brandino, Francesco De Giorgi, …, Iakovos Mavroidis, Yannis Papaefstathiou, Nikolaos Tampouratzis, Benjamin Kalisch, Ulrich Krackhardt, Mondrian Nuessle, Wolfgang Frings, Dominik Gottwald, Felime Guimaraes, …, …, …, …, …, …, Jennifer Lopez Barillao, …, …

Microprocess. Microsystems, 2024

XaaS: Acceleration as a Service to Enable Productive High-Performance Cloud Computing.

Torsten Hoefler, …, …, …, …, Manish Parashar, …, Matthias Troyer, Thomas C. Schulthess, …, Jack J. Dongarra

Comput. Sci. Eng., 2024

EfQAT: An Efficient Framework for Quantization-Aware Training.

…, …, Torsten Hoefler, Evangelos Eleftheriou, …

CoRR, 2024

Fortify Your Foundations: Practical Privacy and Security for Foundation Model Deployments In The Cloud.

…, Anjo Vahldiek-Oberwagner, Marcin Spoczynski, Scott Constable, …, Torsten Hoefler

CoRR, 2024

Hardware Acceleration for Knowledge Graph Processing: Challenges & Recent Developments.

…, Robert Gerstenberger, …, Pournima Sonawane, Juan Gómez-Luna, Raghavendra Kanakagiri, …, …, Torsten Hoefler, …, …

CoRR, 2024

Understanding Data Movement in Tightly Coupled Heterogeneous Systems: A Case Study with the Grace Hopper Superchip.

…, Mikhail Khalilov, …, Giridhar Chukkapalli, Thomas C. Schulthess, Torsten Hoefler

CoRR, 2024

REPS: Recycling Entropies for Packet Spraying to Adaptively Explore Paths and Mitigate Failures.

…, …, Ahmad Ghalayini, Mohammad Dohadwala, Michael Papamichael, Daniele De Sensi, Torsten Hoefler

CoRR, 2024

Demystifying Higher-Order Graph Neural Networks.

…, Florian Scheidl, Lukas Gianinazzi, Shachar Klaiman, Jürgen Müller, Torsten Hoefler

CoRR, 2024

Accelerating Graph-based Vector Search via Delayed-Synchronization Traversal.

…, …, Torsten Hoefler, …

CoRR, 2024

Multi-Head RAG: Solving Multi-Aspect Problems with LLMs.

…, …, …, Robert Gerstenberger, Lucas Weitzendorf, …, …, …, …, Jürgen Müller, Hubert Niewiadomski, …, Michal Podstawski, Torsten Hoefler

CoRR, 2024

CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks.

…, Lorenzo Paleari, …, …, Robert Gerstenberger, …, …, Hubert Niewiadomski, Torsten Hoefler

CoRR, 2024

Towards Specialized Supercomputers for Climate Sciences: Computational Requirements of the Icosahedral Nonhydrostatic Weather and Climate Model.

Torsten Hoefler, Alexandru Calotoiu, Anurag Dipankar, Thomas C. Schulthess, Xavier Lapillonne, …

CoRR, 2024

SpComm3D: A Framework for Enabling Sparse Communication in 3D Sparse Kernels.

…, Torsten Hoefler

CoRR, 2024

SMaRTT-REPS: Sender-based Marked Rapidly-adapting Trimmed & Timed Transport with Recycled Entropies.

…, …, Daniele De Sensi, …, …, …, …, …, …, Ahmad Ghalayini, Daniel S. F. Alves, Michael Papamichael, Adrian M. Caulfield, Torsten Hoefler

CoRR, 2024

QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs.

…, Amirkeivan Mohtashami, Maximilian L. Croci, …, …, …, Torsten Hoefler, …

CoRR, 2024

Topologies of Reasoning: Demystifying Chains, Trees, and Graphs of Thoughts.

…, …, …, Robert Gerstenberger, …, …, …, Grzegorz Kwasniewski, Jürgen Müller, Lukas Gianinazzi, …, Hubert Niewiadomski, …, Torsten Hoefler

CoRR, 2024

Cppless: Productive and Performant Serverless Programming in C++.

…, …, Alexandru Calotoiu, Torsten Hoefler

CoRR, 2024

OSMOSIS: Enabling Multi-Tenancy in Datacenter SmartNICs.

Mikhail Khalilov, …, …, Alessandro Vezzu, …, Salvatore Di Girolamo, …, Daniele De Sensi, …, Torsten Hoefler

Proceedings of the 2024 USENIX Annual Technical Conference, 2024

PolarStar: Expanding the Horizon of Diameter-3 Networks.

Kartik Lakhotia, …, …, …, …, Torsten Hoefler, Fabrizio Petrini

Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, 2024

LLAMP: Assessing Network Latency Tolerance of HPC Applications with Linear Programming.

…, …, …, …, …, …, Robert W. Wisniewski, Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2024

Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects.

Daniele De Sensi, Lorenzo Pichetti, …, Tiziano De Matteis, …, …, Matteo Turisini, Daniele Cesarini, …, Animesh Trivedi, …, …, Salvatore Di Girolamo, Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2024

High Performance Unstructured SpMM Computation Using Tensor Cores.

Patrik Okanovic, Grzegorz Kwasniewski, Paolo Sylos Labini, …, …, Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2024

Network-Offloaded Bandwidth-Optimal Broadcast and Allgather for Distributed AI.

Mikhail Khalilov, Salvatore Di Girolamo, …, …, …, Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2024

Offloaded MPI message matching: an optimistic approach.

Jerónimo S. García, Salvatore Di Girolamo, …, J. J. Vegas Olmos, …, Torsten Hoefler, …

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

Arrow Matrix Decomposition: A Novel Approach for Communication-Efficient Sparse Matrix Multiplication.

Lukas Gianinazzi, Alexandros Nikolaos Ziogas, …, Piotr Luczynski, Saleh Ashkboosh, Florian Scheidl, …, …, …, …, …, Torsten Hoefler

Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

Swing: Short-cutting Rings for Higher Bandwidth Allreduce.

Daniele De Sensi, …, …, Torsten Hoefler

Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

A High-Performance Design, Implementation, Deployment, and Evaluation of The Slim Fly Network.

…, …, Daniele De Sensi, …, …, …, …, Marek Konieczny, Kartik Lakhotia, …, …, Fabrizio Petrini, Torsten Hoefler

Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs.

…, Amirkeivan Mohtashami, Maximilian L. Croci, …, Pashmina Cameron, …, …, Torsten Hoefler, …

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Software Resource Disaggregation for HPC with Serverless Computing.

…, …, …, Alexandru Calotoiu, Torsten Hoefler

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Low-Depth Spatial Tree Algorithms.

…, …, …, Lukas Gianinazzi, Torsten Hoefler, Piotr Luczynski

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

DiffDA: a Diffusion model for weather-scale Data Assimilation.

…, Lukas Gianinazzi, …, Peter D. Düben, Torsten Hoefler

Proceedings of the Forty-first International Conference on Machine Learning, 2024

SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression.

…, Ruslan Svirschevski, Vage Egiazarian, Denis Kuznedelev, …, …, Alexander Borzunov, Torsten Hoefler, …

Proceedings of the Twelfth International Conference on Learning Representations, 2024

SliceGPT: Compress Large Language Models by Deleting Rows and Columns.

…, Maximilian L. Croci, Marcelo Gennari Do Nascimento, Torsten Hoefler, …

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Extending RISC-V for Efficient Overflow Recovery in Mixed-Precision Computations.

Luca Bertaccini, …, Torsten Hoefler, …

Proceedings of the 42nd IEEE International Conference on Computer Design, 2024

Near-Optimal Wafer-Scale Reduce.

Piotr Luczynski, Lukas Gianinazzi, …, Leighton Wilson, Daniele De Sensi, Torsten Hoefler

Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 2024

FaaSKeeper: Learning from Building Serverless Services with ZooKeeper as an Example.

…, Alexandru Calotoiu, …, Konstantin Taranov, Torsten Hoefler

Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 2024

QUIK: Towards End-to-end 4-Bit Inference on Generative Large Language Models.

[DOI]

,

,

,

,

,

,

Torsten Hoefler

,

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

LRSCwait: Enabling Scalable and Efficient Synchronization in Manycore Systems Through Polling-Free and Retry-Free Operation.

[DOI]

,

Marc Gantenbein

,

Alessandro Ottaviano

,

Torsten Hoefler

,

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry" Benchmark.

[DOI]

,

Torsten Hoefler

,

Proceedings of the Conference on Parsimony and Learning, 2024

Process-as-a-Service: Unifying Elastic and Stateful Clouds with Serverless Processes.

[DOI]

,

Alexandru Calotoiu

,

,

Roman Böhringer

,

,

Torsten Hoefler

Proceedings of the 2024 ACM Symposium on Cloud Computing, 2024

Graph of Thoughts: Solving Elaborate Problems with Large Language Models.

[DOI]

,

,

,

Robert Gerstenberger

,

Michal Podstawski

,

Lukas Gianinazzi

,

,

,

Hubert Niewiadomski

,

,

Torsten Hoefler

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Sparse Stream Semantic Registers: A Lightweight ISA Extension Accelerating General Sparse Linear Algebra.

[DOI]

,

,

,

Torsten Hoefler

,

IEEE Trans. Parallel Distributed Syst., December, 2023

Performance Measurement Dataset of the HPC Benchmarks FASTEST, Kripke, LULESH, MiniFE, Quicksilver, and RELeARN for Scalability Studies with Extra-P.

[DOI]

,

Alexandru Calotoiu

,

Sebastian Rinke

,

Thorsten Reimann

,

Torsten Hoefler

,

Dataset, November, 2023

Myths and legends in high-performance computing.

[DOI]

Satoshi Matsuoka

,

,

,

Aleksandr Drozd

,

Torsten Hoefler

Int. J. High Perform. Comput. Appl., July, 2023

Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and Systems.

[DOI]

,

,

Vasiliki Kalavri

,

Michael Kapralov

,

Torsten Hoefler

IEEE Trans. Parallel Distributed Syst., June, 2023

GNN Scaling 0.1 Software Artifact.

[DOI]

,

,

Robert Gerstenberger

,

Paolo Sylos Labini

,

Alexandros Nikolaos Ziogas

,

,

Lukas Gianinazzi

,

Florian Scheidl

,

,

,

,

Grzegorz Kwasniewski

,

Raghavendra Kanakagiri

,

,

,

,

,

Torsten Hoefler

Dataset, June, 2023

GDI-RMA 0.1 Software Artifact.

[DOI]

,

Robert Gerstenberger

,

,

Michal Podstawski

,

Jürgen Müller

,

,

,

George Mitenkov

,

Marek T. Michalewicz

,

Torsten Hoefler

Dataset, June, 2023

Disentangling Hype from Practicality: On Realistically Achieving Quantum Advantage.

[DOI]

Torsten Hoefler

,

,

Matthias Troyer

Commun. ACM, May, 2023

Arrow Matrix Decompositions.

[DOI]

Lukas Gianinazzi

,

Alexandros Nikolaos Ziogas

,

Piotr Luczynski

,

Saleh Ashkboos

,

,

Florian Scheidl

,

,

,

,

,

Torsten Hoefler

Dataset, April, 2023

Earth Virtualization Engines: A Technical Perspective.

[DOI]

Comput. Sci. Eng., 2023

RapidChiplet: A Toolchain for Rapid Design Space Exploration of Chiplet Architectures.

[DOI]

,

Benigna Bruggmann

,

,

,

Torsten Hoefler

CoRR, 2023

Towards End-to-end 4-Bit Inference on Generative Large Language Models.

[DOI]

,

,

,

,

,

,

Torsten Hoefler

,

CoRR, 2023

Cached Operator Reordering: A Unified View for Fast GNN Training.

[DOI]

,

,

,

,

,

,

Torsten Hoefler

CoRR, 2023

High-Performance Graph Databases That Are Portable, Programmable, and Scale to Hundreds of Thousands of Cores.

[DOI]

,

Robert Gerstenberger

,

,

Michal Podstawski

,

Jürgen Müller

,

,

,

George Mitenkov

,

Wojciech Chlapek

,

Marek T. Michalewicz

,

Torsten Hoefler

CoRR, 2023

ASDL: A Unified Interface for Gradient Preconditioning in PyTorch.

[DOI]

,

Satoki Ishikawa

,

,

,

Torsten Hoefler

CoRR, 2023

STen: Productive and Efficient Sparsity in PyTorch.

[DOI]

,

,

,

,

Torsten Hoefler

CoRR, 2023

Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization.

[DOI]

,

,

,

Alexandru Calotoiu

,

Torsten Hoefler

CoRR, 2023

PolarStar: Expanding the Scalability Horizon of Diameter-3 Networks.

[DOI]

Kartik Lakhotia

,

,

,

,

,

Torsten Hoefler

,

Fabrizio Petrini

CoRR, 2023

Datacenter Ethernet and RDMA: Issues at Hyperscale.

[DOI]

Torsten Hoefler

,

,

Keith D. Underwood

,

,

,

Vahid Tabatabaee

,

,

Surendra Anubolu

,

,

,

,

CoRR, 2023

Approximate Reversible Circuits for NISQ-Era Quantum Computers.

[DOI]

,

,

Torsten Hoefler

CoRR, 2023

AutoDDL: Automatic Distributed Deep Learning with Asymptotically Optimal Communication.

[DOI]

,

,

,

,

Torsten Hoefler

CoRR, 2023

A Theory of I/O-Efficient Sparse Neural Network Inference.

[DOI]

,

,

Torsten Hoefler

CoRR, 2023

Data Center Ethernet and Remote Direct Memory Access: Issues at Hyperscale.

[DOI]

Torsten Hoefler

,

,

Keith D. Underwood

,

Robert Alverson

,

,

Vahid Tabatabaee

,

,

Surendra Anubolu

,

,

,

,

Computer, 2023

SAGE: Software-based Attestation for GPU Execution.

[DOI]

,

Benjamin Rothenberger

,

,

,

Torsten Hoefler

,

Proceedings of the 2023 USENIX Annual Technical Conference, 2023

In-network Allreduce with Multiple Spanning Trees on PolarFly.

[DOI]

Kartik Lakhotia

,

,

,

,

Torsten Hoefler

,

Fabrizio Petrini

Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, 2023

Noise in the Clouds: Influence of Network Performance Variability on Application Scalability.

[DOI]

Daniele De Sensi

,

Tiziano De Matteis

,

Konstantin Taranov

,

Salvatore Di Girolamo

,

,

Torsten Hoefler

Abstract Proceedings of the 2023 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2023

FuzzyFlow: Leveraging Dataflow To Find and Squash Program Optimization Bugs.

[DOI]

,

,

,

Alexandru Calotoiu

,

Alexandros Nikolaos Ziogas

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2023

Co-design Hardware and Algorithm for Vector Search.

[DOI]

,

,

,

Johannes de Fine Licht

,

,

,

Cédric Renggli

,

,

Theodoros Rekatsinas

,

Torsten Hoefler

,

Proceedings of the International Conference for High Performance Computing, 2023

HEAR: Homomorphically Encrypted Allreduce.

[DOI]

,

Mikhail Khalilov

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2023

VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores.

[DOI]

Roberto L. Castro

,

,

,

,

Basilio B. Fraguela

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2023

High-Performance and Programmable Attentional Graph Neural Networks with Global Tensor Formulations.

[DOI]

,

,

Robert Gerstenberger

,

Paolo Sylos Labini

,

Alexandros Nikolaos Ziogas

,

,

Lukas Gianinazzi

,

Florian Scheidl

,

,

,

,

Grzegorz Kwasniewski

,

Raghavendra Kanakagiri

,

,

,

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2023

The Graph Database Interface: Scaling Online Transactional and Analytical Graph Workloads to Hundreds of Thousands of Cores.

[DOI]

,

Robert Gerstenberger

,

,

Michal Podstawski

,

,

,

George Mitenkov

,

Wojciech Chlapek

,

Marek T. Michalewicz

,

Hubert Niewiadomski

,

Jürgen Müller

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2023

A Reference Implementation for a Quantum Message Passing Interface.

[DOI]

,

,

Samuel A. Stein

,

,

,

Martin Roetteler

,

Torsten Hoefler

,

Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2023

PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices.

[DOI]

,

,

Torsten Hoefler

Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

HOT: Higher-Order Dynamic Graph Representation Learning With Efficient Transformers.

[DOI]

,

Afonso Claudino Catarino

,

Lukas Gianinazzi

,

,

,

Hubert Niewiadomski

,

Torsten Hoefler

Proceedings of the Learning on Graphs Conference, 27-30 November 2023, Virtual Event, 2023

rFaaS: Enabling High Performance Serverless with RDMA and Leases.

[DOI]

,

Konstantin Taranov

,

Alexandru Calotoiu

,

Torsten Hoefler

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Performance Embeddings: A Similarity-Based Transfer Tuning Approach to Performance Optimization.

[DOI]

,

,

,

Alexandru Calotoiu

,

Torsten Hoefler

Proceedings of the 37th International Conference on Supercomputing, 2023

FMI: Fast and Cheap Message Passing for Serverless Functions.

[DOI]

,

Roman Böhringer

,

Alexandru Calotoiu

,

Torsten Hoefler

Proceedings of the 37th International Conference on Supercomputing, 2023

Compressing multidimensional weather and climate data into neural networks.

[DOI]

,

Torsten Hoefler

Proceedings of the Eleventh International Conference on Learning Representations, 2023

OPTQ: Accurate Quantization for Generative Pre-trained Transformers.

[DOI]

,

,

Torsten Hoefler

,

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Differentiable Transportation Pruning.

[DOI]

,

Jan C. van Gemert

,

Torsten Hoefler

,

,

Evangelos Eleftheriou

,

Bram-Ernst Verhoef

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Streaming Task Graph Scheduling for Dataflow Architectures.

[DOI]

Tiziano De Matteis

,

Lukas Gianinazzi

,

Johannes de Fine Licht

,

Torsten Hoefler

Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, 2023

HexaMesh: Scaling to Hundreds of Chiplets with an Optimized Chiplet Arrangement.

[DOI]

,

,

Matheus A. Cavalcante

,

,

,

Torsten Hoefler

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Sparse Hamming Graph: A Customizable Network-on-Chip Topology.

[DOI]

,

,

Matheus A. Cavalcante

,

,

,

Torsten Hoefler

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Maximum Flows in Parametric Graph Templates.

[DOI]

,

Lukas Gianinazzi

,

Torsten Hoefler

,

Proceedings of the Algorithms and Complexity - 13th International Conference, 2023

Bridging Control-Centric and Data-Centric Optimization.

[DOI]

,

,

Alexandru Calotoiu

,

Torsten Hoefler

Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, 2023

User-guided Page Merging for Memory Deduplication in Serverless Systems.

[DOI]

,

,

,

Alexandru Calotoiu

,

Torsten Hoefler

Proceedings of the IEEE International Conference on Big Data, 2023

2022

QIRO: A Static Single Assignment-based Quantum Program Representation for Optimization.

[DOI]

,

,

Vadym Kliuchnikov

,

Torsten Hoefler

ACM Trans. Quantum Comput., September, 2022

Work-Stealing Prefix Scan: Addressing Load Imbalance in Large-Scale Image Registration.

[DOI]

,

,

Torsten Hoefler

,

Paolo Bientinesi

,

Benjamin Berkels

IEEE Trans. Parallel Distributed Syst., 2022

Python FPGA Programming with Data-Centric Multi-Level Design.

[DOI]

Johannes de Fine Licht

,

Tiziano De Matteis

,

,

,

,

,

Carl-Johannes Johnsen

,

Torsten Hoefler

CoRR, 2022

Efficient RDMA Communication Protocols.

[DOI]

Konstantin Taranov

,

,

Torsten Hoefler

CoRR, 2022

Assessing requirements to scale to practical quantum advantage.

[DOI]

Michael E. Beverland

,

,

Matthias Troyer

,

Krysta M. Svore

,

Torsten Hoefler

,

Vadym Kliuchnikov

,

,

,

Aarthi Sundaram

,

Alexander Vaschillo

CoRR, 2022

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers.

[DOI]

,

,

Torsten Hoefler

,

CoRR, 2022

ENS-10: A Dataset For Post-Processing Ensemble Weather Forecast.

[DOI]

,

,

,

,

,

Lukas Gianinazzi

,

,

Torsten Hoefler

CoRR, 2022

Deinsum: Practically I/O Optimal Multilinear Algebra.

[DOI]

Alexandros Nikolaos Ziogas

,

Grzegorz Kwasniewski

,

,

,

Torsten Hoefler

CoRR, 2022

The spatial computer: A model for energy-efficient parallel computation.

[DOI]

Lukas Gianinazzi

,

,

,

,

Piotr Luczynski

,

Torsten Hoefler

CoRR, 2022

FaasKeeper: a Blueprint for Serverless Services.

[DOI]

,

Alexandru Calotoiu

,

Konstantin Taranov

,

Torsten Hoefler

CoRR, 2022

The Convergence of Hyperscale Data Center and High-Performance Computing Networks.

[DOI]

Torsten Hoefler

,

,

Computer, 2022

Benchmarking Data Science: 12 Ways to Lie With Statistics and Performance on Parallel Computers.

[DOI]

Torsten Hoefler

Computer, 2022

The Red-Blue Pebble Game on Trees and DAGs with Large Input.

[DOI]

,

Torsten Hoefler

Proceedings of the Structural Information and Communication Complexity, 2022

KafkaDirect: Zero-copy Data Access for Apache Kafka over RDMA Networks.

[DOI]

Konstantin Taranov

,

,

Virendra J. Marathe

,

Torsten Hoefler

Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

Deinsum: Practically I/O Optimal Multi-Linear Algebra.

[DOI]

Alexandros Nikolaos Ziogas

,

Grzegorz Kwasniewski

,

,

,

Torsten Hoefler

Proceedings of the SC22: International Conference for High Performance Computing, 2022

Boosting Performance Optimization with Interactive Data Movement Visualization.

[DOI]

,

,

Torsten Hoefler

Proceedings of the SC22: International Conference for High Performance Computing, 2022

Efficient Quantized Sparse Matrix Operations on Tensor Cores.

[DOI]

,

,

Torsten Hoefler

Proceedings of the SC22: International Conference for High Performance Computing, 2022

PolarFly: A Cost-Effective and Flexible Low-Diameter Topology.

[DOI]

Kartik Lakhotia

,

,

,

,

,

Torsten Hoefler

,

Fabrizio Petrini

Proceedings of the SC22: International Conference for High Performance Computing, 2022

HammingMesh: A Network Topology for Large-Scale Deep Learning.

[DOI]

Torsten Hoefler

,

,

Daniele De Sensi

,

Salvatore Di Girolamo

,

,

,

,

,

,

Proceedings of the SC22: International Conference for High Performance Computing, 2022

Building Blocks for Network-Accelerated Distributed File Systems.

[DOI]

Salvatore Di Girolamo

,

Daniele De Sensi

,

Konstantin Taranov

,

Milos Malesevic

,

,

,

Severin Kistler

,

Torsten Hoefler

Proceedings of the SC22: International Conference for High Performance Computing, 2022

ProbGraph: High-Performance and High-Accuracy Graph Mining with Probabilistic Set Representations.

[DOI]

,

Cesare Miglioli

,

Paolo Sylos Labini

,

,

,

Raghavendra Kanakagiri

,

,

,

Michal Podstawski

,

Grzegorz Kwasniewski

,

,

,

,

Torsten Hoefler

Proceedings of the SC22: International Conference for High Performance Computing, 2022

Productive Performance Engineering for Weather and Climate Modeling with Python.

[DOI]

,

,

Florian Deconinck

,

,

,

,

,

,

Jeremy McGibbon

,

,

,

,

Thomas C. Schulthess

,

Torsten Hoefler

Proceedings of the SC22: International Conference for High Performance Computing, 2022

Near-optimal sparse allreduce for distributed deep learning.

[DOI]

,

Torsten Hoefler

Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022

Spatial Mixture-of-Experts.

[DOI]

,

Torsten Hoefler

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

ENS-10: A Dataset For Post-Processing Ensemble Weather Forecasts.

[DOI]

,

,

,

,

,

Lukas Gianinazzi

,

,

Torsten Hoefler

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Neural Graph Databases.

[DOI]

,

,

Florian Scheidl

,

,

,

Michal Podstawski

,

,

Torsten Hoefler

Proceedings of the Learning on Graphs Conference, 2022

Motif Prediction with Graph Neural Networks.

[DOI]

,

,

Cesare Miglioli

,

,

Grzegorz Kwasniewski

,

,

Raghavendra Kanakagiri

,

,

Lukas Gianinazzi

,

,

Torsten Hoefler

Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022

Asynchronous Distributed-Memory Triangle Counting and LCC with RMA Caching.

[DOI]

András Strausz

,

,

Salvatore Di Girolamo

,

,

Torsten Hoefler

Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

I/O-Optimal Cache-Oblivious Sparse Matrix-Sparse Matrix Multiplication.

[DOI]

,

,

Torsten Hoefler

Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Metamorphic Fuzzing of C++ Libraries.

[DOI]

,

Alastair F. Donaldson

,

,

Torsten Hoefler

Proceedings of the 15th IEEE Conference on Software Testing, Verification and Validation, 2022

Performance-detective: automatic deduction of cheap and accurate performance models.

[DOI]

,

,

Alexandru Calotoiu

,

,

,

,

,

Torsten Hoefler

Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

A data-centric optimization framework for machine learning.

[DOI]

,

,

,

,

,

Torsten Hoefler

Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

Lifting C semantics for dataflow optimization.

[DOI]

Alexandru Calotoiu

,

,

Grzegorz Kwasniewski

,

Johannes de Fine Licht

,

,

,

Torsten Hoefler

Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

Neural Parameter Allocation Search.

[DOI]

Bryan A. Plummer

,

,

,

Torsten Hoefler

,

Proceedings of the Tenth International Conference on Learning Representations, 2022

Temporal Vectorization: A Compiler Approach to Automatic Multi-Pumping.

[DOI]

Carl-Johannes Johnsen

,

Tiziano De Matteis

,

,

Johannes de Fine Licht

,

Torsten Hoefler

Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, 2022

Fast Arbitrary Precision Floating Point on FPGA.

[DOI]

Johannes de Fine Licht

,

Christopher A. Pattison

,

Alexandros Nikolaos Ziogas

,

David Simmons-Duffin

,

Torsten Hoefler

Proceedings of the 30th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2022

Accelerating Data Serialization/Deserialization Protocols with In-Network Compute.

[DOI]

,

Salvatore Di Girolamo

,

Torsten Hoefler

Proceedings of the IEEE/ACM International Workshop on Exascale MPI, 2022

RED-SEA: Network Solution for Exascale Architectures.

[DOI]

Andrea Biagioni

,

,

Ottorino Frezza

,

Francesca Lo Cicero

,

Alessandro Lonardo

,

Michele Martinelli

,

Pier Stanislao Paolucci

,

Elena Pastorelli

,

Francesco Simula

,

Matteo Turisini

,

,

Roberto Ammendola

,

Pascale Bernier-Bruna

,

,

,

,

Pierre-Axel Lagadec

,

Gregoire Pichon

,

,

Gaetan De Gassowski

,

Matthieu Hautreaux

,

Stephane Mathieu

,

,

,

,

Torsten Hoefler

,

,

,

Giuseppe Piero Brandino

,

Francesco De Giorgi

,

,

Iakovos Mavroidis

,

Yannis Papaefstathiou

,

Nikolaos Tampouratzis

,

Benjamin Kalisch

,

Ulrich Krackhardt

,

Mondrian Nuessle

,

Pantelis Xirouchakis

,

Vangelis Mageiropoulos

,

Michalis Gianioudis

,

,

Aggelos Ioannou

,

Nikos Kallimanis

,

,

Manolis Katevenis

,

Wolfgang Frings

,

Dominik Gottwald

,

Felime Guimaraes

,

,

,

,

,

,

,

Jennifer Lopez Barillao

,

,

,

Francisco J. Alfaro

,

Jesús Escudero-Sahuquillo

,

Pedro Javier García

,

Francisco J. Quiles

,

José L. Sánchez

,

Adrián Castelló

,

,

María Engracia Gómez

,

Enrique S. Quintana-Ortí

,

Julio Sahuquillo

,

Eugenio Stabile

Proceedings of the 25th Euromicro Conference on Digital System Design, 2022

Circuits for Measurement Based Quantum State Preparation.

[DOI]

,

Torsten Hoefler

Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

A RDMA Interface for Ultra-Fast Ultrasound Data-Streaming over an Optical Link.

[DOI]

Andrea Cossettini

,

Konstantin Taranov

,

,

,

Torsten Hoefler

,

Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

NeVerMore: Exploiting RDMA Mistakes in NVMe-oF Storage Applications.

[DOI]

Konstantin Taranov

,

Benjamin Rothenberger

,

Daniele De Sensi

,

,

Torsten Hoefler

Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, 2022

2021

Transformations of High-Level Synthesis Codes for High-Performance Computing.

[DOI]

Johannes de Fine Licht

,

,

Simon Meierhans

,

Torsten Hoefler

IEEE Trans. Parallel Distributed Syst., 2021

Breaking (Global) Barriers in Parallel Stochastic Optimization With Wait-Avoiding Group Averaging.

[DOI]

,

,

Giorgi Nadiradze

,

Salvatore Di Girolamo

,

,

,

Torsten Hoefler

IEEE Trans. Parallel Distributed Syst., 2021

High-Performance Routing With Multipathing and Path Diversity in Ethernet and HPC Networks.

[DOI]

,

,

Marcel Schneider

,

Marek Konieczny

,

Salvatore Di Girolamo

,

,

,

Torsten Hoefler

IEEE Trans. Parallel Distributed Syst., 2021

Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads.

[DOI]

,

,

Torsten Hoefler

,

IEEE Trans. Computers, 2021

Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores.

[DOI]

,

,

Torsten Hoefler

,

IEEE Trans. Computers, 2021

Domain-Specific Multi-Level IR Rewriting for GPU: The Open Earth Compiler for GPU-accelerated Climate Simulation.

[DOI]

,

Christoph Müller

,

Oleksandr Zinenko

,

,

,

,

,

Torsten Hoefler

,

ACM Trans. Archit. Code Optim., 2021

Communication Lower Bounds of Bilinear Algorithms for Symmetric Tensor Contractions.

[DOI]

Edgar Solomonik

,

,

Torsten Hoefler

SIAM J. Sci. Comput., 2021

GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra.

[DOI]

,

Zur Vonarburg-Shmaria

,

Yannick Schaffner

,

Leonardo Schwarz

,

Grzegorz Kwasniewski

,

Lukas Gianinazzi

,

,

,

Tobias Holenstein

,

Sebastian Leisinger

,

Peter Tatkowski

,

,

,

,

Philipp Lindenberger

,

Marek Konieczny

,

,

Torsten Hoefler

Proc. VLDB Endow., 2021

FPL: fast Presburger arithmetic through transprecision.

[DOI]

Arjun Pitchanathan

,

Christian Ulmann

,

,

Torsten Hoefler

,

Proc. ACM Program. Lang., 2021

The digital revolution of Earth-system science.

[DOI]

,

Peter D. Düben

,

Torsten Hoefler

,

,

Thomas C. Schulthess

,

Nat. Comput. Sci., 2021

Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks.

[DOI]

Torsten Hoefler

,

,

,

,

Alexandra Peste

J. Mach. Learn. Res., 2021

RFaaS: RDMA-Enabled FaaS Platform for Serverless High-Performance Computing.

[DOI]

,

Konstantin Taranov

,

Alexandru Calotoiu

,

Torsten Hoefler

CoRR, 2021

Learning Combinatorial Node Labeling Algorithms.

[DOI]

Lukas Gianinazzi

,

Maximilian Fries

,

,

,

,

Torsten Hoefler

CoRR, 2021

Towards Million-Server Network Simulations on Just a Laptop.

[DOI]

,

Marcel Schneider

,

Salvatore Di Girolamo

,

,

Torsten Hoefler

CoRR, 2021

SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems.

[DOI]

,

Raghavendra Kanakagiri

,

Grzegorz Kwasniewski

,

Rachata Ausavarungnirun

,

,

Konstantinos Kanellopoulos

,

,

Zur Vonarburg-Shmaria

,

Lukas Gianinazzi

,

,

Juan Gómez-Luna

,

,

Lukas Kapp-Schwoerer

,

Salvatore Di Girolamo

,

Marek Konieczny

,

,

Torsten Hoefler

CoRR, 2021

GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra.

[DOI]

,

Zur Vonarburg-Shmaria

,

Yannick Schaffner

,

Leonardo Schwarz

,

Grzegorz Kwasniewski

,

Lukas Gianinazzi

,

,

,

Tobias Holenstein

,

Sebastian Leisinger

,

Peter Tatkowski

,

,

,

,

Philipp Lindenberger

,

,

Marek Konieczny

,

,

Torsten Hoefler

CoRR, 2021

Enabling Dataflow Optimization for Quantum Programs.

[DOI]

,

,

Vadym Kliuchnikov

,

Torsten Hoefler

CoRR, 2021

ReDMArk: Bypassing RDMA Security Mechanisms.

[DOI]

Benjamin Rothenberger

,

Konstantin Taranov

,

,

Torsten Hoefler

Proceedings of the 30th USENIX Security Symposium, 2021

Naos: Serialization-free RDMA networking in Java.

[DOI]

Konstantin Taranov

,

,

,

Torsten Hoefler

Proceedings of the 2021 USENIX Annual Technical Conference, 2021

MigrOS: Transparent Live-Migration Support for Containerised RDMA Applications.

[DOI]

,

,

Leo Sahaya Daphne Antony

,

Torsten Hoefler

,

Hermann Härtig

Proceedings of the 2021 USENIX Annual Technical Conference, 2021

Pebbles, Graphs, and a Pinch of Combinatorics: Towards Tight I/O Lower Bounds for Statically Analyzable Programs.

[DOI]

Grzegorz Kwasniewski

,

,

Lukas Gianinazzi

,

Alexandru Calotoiu

,

,

Alexandros Nikolaos Ziogas

,

,

Torsten Hoefler

Proceedings of the SPAA '21: 33rd ACM Symposium on Parallelism in Algorithms and Architectures, 2021

Parallel Algorithms for Finding Large Cliques in Sparse Graphs.

[DOI]

Lukas Gianinazzi

,

,

Yannick Schaffner

,

Torsten Hoefler

Proceedings of the SPAA '21: 33rd ACM Symposium on Parallelism in Algorithms and Architectures, 2021

CoRM: Compactable Remote Memory over RDMA.

[DOI]

Konstantin Taranov

,

Salvatore Di Girolamo

,

Torsten Hoefler

Proceedings of the SIGMOD '21: International Conference on Management of Data, 2021

Productivity, portability, performance: data-centric Python.

[DOI]

Alexandros Nikolaos Ziogas

,

,

,

Alexandru Calotoiu

,

Tiziano De Matteis

,

Johannes de Fine Licht

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2021

Flare: flexible in-network allreduce.

[DOI]

Daniele De Sensi

,

Salvatore Di Girolamo

,

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2021

On the parallel I/O optimality of linear algebra kernels: near-optimal matrix factorizations.

[DOI]

Grzegorz Kwasniewski

,

,

,

Alexandros Nikolaos Ziogas

,

Jens Eirik Saethre

,

André Gaillard

,

,

,

Anton Kozhevnikov

,

Joost VandeVondele

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2021

Distributed quantum computing with QMPI.

[DOI]

,

Damian S. Steiger

,

Torsten Hoefler

,

Matthias Troyer

Proceedings of the International Conference for High Performance Computing, 2021

Clairvoyant prefetching for distributed machine learning I/O.

[DOI]

,

Roman Böhringer

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2021

Chimera: efficiently training large-scale neural networks with bidirectional pipelines.

[DOI]

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2021

On the parallel I/O optimality of linear algebra kernels: near-optimal LU factorization.

[DOI]

Grzegorz Kwasniewski

,

,

Alexandros Nikolaos Ziogas

,

,

,

Torsten Hoefler

Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Extracting clean performance models from tainted programs.

[DOI]

,

Alexandru Calotoiu

,

,

,

,

Torsten Hoefler

Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Data Movement Is All You Need: A Case Study on Optimizing Transformers.

[DOI]

,

,

,

,

Torsten Hoefler

Proceedings of the Fourth Conference on Machine Learning and Systems, 2021

SeBS: a serverless benchmark suite for function-as-a-service computing.

[DOI]

,

Grzegorz Kwasniewski

,

,

Michal Podstawski

,

Torsten Hoefler

Proceedings of the Middleware '21: 22nd International Middleware Conference, Québec City, Canada, December 6, 2021

SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems.

[DOI]

,

Raghavendra Kanakagiri

,

Grzegorz Kwasniewski

,

Rachata Ausavarungnirun

,

,

Konstantinos Kanellopoulos

,

,

Zur Vonarburg-Shmaria

,

Lukas Gianinazzi

,

,

Juan Gómez-Luna

,

Jakub Golinowski

,

,

Lukas Kapp-Schwoerer

,

Salvatore Di Girolamo

,

,

Marek Konieczny

,

,

Torsten Hoefler

Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

A RISC-V in-network accelerator for flexible high-performance low-power packet processing.

[DOI]

Salvatore Di Girolamo

,

,

Alexandru Calotoiu

,

,

,

,

,

Torsten Hoefler

Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

Noise-Resilient Empirical Performance Modeling with Deep Neural Networks.

[DOI]

,

Alexander Geiß

,

Johannes Wehrstein

,

Alexandru Calotoiu

,

Thorsten Reimann

,

Torsten Hoefler

,

Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

NPBench: a benchmarking suite for high-performance NumPy.

[DOI]

Alexandros Nikolaos Ziogas

,

,

,

Torsten Hoefler

Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations.

[DOI]

,

Zacharias V. Fisches

,

,

Torsten Hoefler

,

Michael F. P. O'Boyle

,

Proceedings of the 38th International Conference on Machine Learning, 2021

Indirection Stream Semantic Register Architecture for Efficient Sparse-Dense Linear Algebra.

[DOI]

,

,

,

Torsten Hoefler

,

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

An Efficient Algorithm for Sparse Quantum State Preparation.

[DOI]

,

Torsten Hoefler

Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems.

[DOI]

Johannes de Fine Licht

,

,

Tiziano De Matteis

,

,

,

Torsten Hoefler

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

Hermes: Enabling efficient large-scale simulation in MATSim.

[DOI]

,

,

Joschka Bischoff

,

,

Wolfgang Scherr

,

Torsten Hoefler

,

Proceedings of the 12th International Conference on Ambient Systems, 2021

2020

ExtraPeak: Advanced Automatic Performance Modeling for HPC Applications.

[DOI]

Alexandru Calotoiu

,

,

Torsten Hoefler

,

,

,

Proceedings of the Software for Exascale Computing - SPPEXA 2016-2019, 2020

Substream-Centric Maximum Matchings on FPGA.

[DOI]

,

,

,

Dimitri Stanojevic

,

Johannes de Fine Licht

,

Torsten Hoefler

ACM Trans. Reconfigurable Technol. Syst., 2020

Polyhedral Compilation for Racetrack Memories.

[DOI]

,

,

,

Torsten Hoefler

,

Jerónimo Castrillón

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Dawn: a High-level Domain-Specific Language Compiler Toolchain for Weather and Climate Applications.

[DOI]

,

,

Fabian Thuering

,

Torsten Hoefler

,

Supercomput. Front. Innov., 2020

Special issue: Selected papers from EuroMPI 2019.

[DOI]

Jesper Larsson Träff

,

Torsten Hoefler

Parallel Comput., 2020

Assertion-based optimization of Quantum programs.

[DOI]

,

Torsten Hoefler

,

Matthias Troyer

Proc. ACM Program. Lang., 2020

Fast linear programming through transprecision computing on small and sparse data.

[DOI]

,

Theodoros Theodoridis

,

Maximilian Falkenstein

,

Arjun Pitchanathan

,

,

,

,

Torsten Hoefler

Proc. ACM Program. Lang., 2020

Deep Data Flow Analysis.

[DOI]

,

,

Zacharias V. Fisches

,

,

Torsten Hoefler

,

Michael F. P. O'Boyle

CoRR, 2020

Parametric Graph Templates: Properties and Algorithms.

[DOI]

,

Lukas Gianinazzi

,

Torsten Hoefler

,

CoRR, 2020

PsPIN: A high-performance low-power architecture for flexible in-network compute.

[DOI]

Salvatore Di Girolamo

,

,

Alexandru Calotoiu

,

,

,

,

,

Torsten Hoefler

CoRR, 2020

TardiS: Migrating Containers with RDMA Networks.

[DOI]

,

,

Leo Sahaya Daphne Antony

,

Torsten Hoefler

,

Hermann Härtig

CoRR, 2020

High-Performance Routing with Multipathing and Path Diversity in Supercomputers and Data Centers.

[DOI]

,

,

Marcel Schneider

,

Marek Konieczny

,

Salvatore Di Girolamo

,

,

,

Torsten Hoefler

CoRR, 2020

Shapeshifter Networks: Cross-layer Parameter Sharing for Scalable and Effective Deep Learning.

[DOI]

Bryan A. Plummer

,

,

,

Torsten Hoefler

,

CoRR, 2020

Domain-Specific Multi-Level IR Rewriting for GPU.

[DOI]

,

Christoph Müller

,

Oleksandr Zinenko

,

,

,

,

,

Torsten Hoefler

,

CoRR, 2020

Deep Learning for Post-Processing Ensemble Weather Forecasts.

[DOI]

Peter Grönquist

,

,

,

,

,

,

Torsten Hoefler

CoRR, 2020

Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging.

[DOI]

,

,

,

Salvatore Di Girolamo

,

,

Torsten Hoefler

CoRR, 2020

ProGraML: Graph-based Deep Learning for Program Optimization and Analysis.

[DOI]

,

Zacharias V. Fisches

,

,

Torsten Hoefler

,

CoRR, 2020

Snitch: A 10 kGE Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads.

[DOI]

,

,

Torsten Hoefler

,

CoRR, 2020

sRDMA - Efficient NIC-based Authentication and Encryption for Remote Direct Memory Access.

[DOI]

Konstantin Taranov

,

Benjamin Rothenberger

,

,

Torsten Hoefler

Proceedings of the 2020 USENIX Annual Technical Conference, 2020

Parallel Planar Subgraph Isomorphism and Vertex Connectivity.

[DOI]

Lukas Gianinazzi

,

Torsten Hoefler

Proceedings of the SPAA '20: 32nd ACM Symposium on Parallelism in Algorithms and Architectures, 2020

An in-depth analysis of the Slingshot interconnect.

[DOI]

Daniele De Sensi

,

Salvatore Di Girolamo

,

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2020

fBLAS: streaming linear algebra on FPGA.

[DOI]

Tiziano De Matteis

,

Johannes de Fine Licht

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2020

ScalAna: automating scaling loss detection with graph analysis.

[DOI]

,

,

,

,

Torsten Hoefler

,

,

Proceedings of the International Conference for High Performance Computing, 2020

Empirical Modeling of Spatially Diverging Performance.

[DOI]

Alexandru Calotoiu

,

Markus Geisenhofer

,

,

,

,

Torsten Hoefler

,

Martin Oberlack

,

Proceedings of the IEEE/ACM International Workshop on HPC User Support Tools and Workshop on Programming and Performance Visualization Tools, 2020

FatPaths: routing in supercomputers and data centers when shortest paths fall short.

[DOI]

,

Marcel Schneider

,

Marek Konieczny

,

,

Erik Henriksson

,

Salvatore Di Girolamo

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2020

High-performance parallel graph coloring with strong guarantees on work, depth, and quality.

[DOI]

,

,

,

Zur Vonarburg-Shmaria

,

Lukas Gianinazzi

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2020

Communication and Timing Issues with MPI Virtualization.

[DOI]

,

,

,

Torsten Hoefler

Proceedings of the EuroMPI/USA '20: 27th European MPI Users' Group Meeting, 2020

Taming unbalanced training workloads in deep learning with partial collective operations.

[DOI]

,

,

Salvatore Di Girolamo

,

,

Torsten Hoefler

Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

Identifying scalability bottlenecks for large-scale parallel programs with graph analysis.

[DOI]

,

,

,

Torsten Hoefler

,

,

Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

Learning Cost-Effective Sampling Strategies for Empirical Performance Modeling.

[DOI]

,

Alexandru Calotoiu

,

Sebastian Rinke

,

Thorsten Reimann

,

Torsten Hoefler

,

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

Communication-Efficient Jaccard similarity for High-Performance Distributed Genome Comparisons.

[DOI]

,

Raghavendra Kanakagiri

,

,

Mikhail Karasikov

,

,

Torsten Hoefler

,

Edgar Solomonik

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

Flexible Communication Avoiding Matrix Multiplication on FPGA with High-Level Synthesis.

[DOI]

Johannes de Fine Licht

,

Grzegorz Kwasniewski

,

Torsten Hoefler

Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

ATUNs: Modular and Scalable Support for Atomic Operations in a Shared Memory Multiprocessor.

[DOI]

,

,

,

Torsten Hoefler

,

Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

Augment Your Batch: Improving Generalization Through Instance Repetition.

[DOI]

,

,

,

,

Torsten Hoefler

,

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Engineering Algorithms for Scalability through Continuous Validation of Performance Expectations.

[DOI]

,

,

Alexandru Calotoiu

,

Torsten Hoefler

,

Alexandre Strube

,

IEEE Trans. Parallel Distributed Syst., 2019

Strong consistency is not hard to get: Two-Phase Locking and Two-Phase Commit on Thousands of Cores.

[DOI]

Claude Barthels

,

,

Konstantin Taranov

,

,

Torsten Hoefler

Proc. VLDB Endow., 2019

Demystifying Parallel and Distributed Deep Learning: An In-depth Concurrency Analysis.

[DOI]

,

Torsten Hoefler

ACM Comput. Surv., 2019

Reflecting on the Goal and Baseline for Exascale Computing: A Roadmap Based on Weather and Climate Simulations.

[DOI]

Thomas C. Schulthess

,

,

,

,

Torsten Hoefler

,

Christoph M. Schär

Comput. Sci. Eng., 2019

Practice of Streaming and Dynamic Graphs: Concepts, Models, Systems, and Parallelism.

[DOI]

,

,

Vasiliki Kalavri

,

Michael Kapralov

,

Torsten Hoefler

CoRR, 2019

A Data-Centric Approach to Extreme-Scale Ab initio Dissipative Quantum Transport Simulations.

[DOI]

Alexandros Nikolaos Ziogas

,

,

Guillermo Indalecio Fernández

,

,

Mathieu Luisier

,

Torsten Hoefler

CoRR, 2019

Predicting Weather Uncertainty with Deep Convnets.

[DOI]

Peter Grönquist

,

,

,

,

,

,

Torsten Hoefler

CoRR, 2019

hlslib: Software Engineering for Hardware Design.

[DOI]

Johannes de Fine Licht

,

Torsten Hoefler

CoRR, 2019

Mix & Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency.

[DOI]

,

Berry Weinstein

,

,

,

Torsten Hoefler

,

CoRR, 2019

FatPaths: Routing in Supercomputers, Data Centers, and Clouds with Low-Diameter Networks when Shortest Paths Fall Short.

[DOI]

,

Marcel Schneider

,

,

Marek Konieczny

,

Erik Henriksson

,

Salvatore Di Girolamo

,

,

Torsten Hoefler

CoRR, 2019

Graph Processing on FPGAs: Taxonomy, Survey, Challenges.

[DOI]

,

Dimitri Stanojevic

,

Johannes de Fine Licht

,

,

Torsten Hoefler

CoRR, 2019

Stateful Dataflow Multigraphs: A Data-Centric Model for High-Performance Parallel Programs.

[DOI]

,

Johannes de Fine Licht

,

Alexandros Nikolaos Ziogas

,

,

Torsten Hoefler

CoRR, 2019

Augment your batch: better training with larger batches.

[DOI]

,

,

,

,

Torsten Hoefler

,

CoRR, 2019

Head-of-line blocking avoidance in Slim Fly networks using deadlock-free non-minimal and adaptive routing.

[DOI]

,

Jesús Escudero-Sahuquillo

,

Pedro Javier García

,

Francisco J. Quiles

,

Torsten Hoefler

Concurr. Comput. Pract. Exp., 2019

Optimizing the data movement in quantum transport simulations via data-centric parallel programming.

[DOI]

Alexandros Nikolaos Ziogas

,

,

Guillermo Indalecio Fernández

,

,

Mathieu Luisier

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2019

A data-centric approach to extreme-scale ab initio dissipative quantum transport simulations.

[DOI]

Alexandros Nikolaos Ziogas

,

,

Guillermo Indalecio Fernández

,

,

Mathieu Luisier

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2019

Mitigating network noise on Dragonfly networks through application-aware routing.

[DOI]

Daniele De Sensi

,

Salvatore Di Girolamo

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2019

SparCML: high-performance sparse communication for machine learning.

[DOI]

Cédric Renggli

,

,

Mehdi Aghagolzadeh

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2019

Streaming message interface: high-performance distributed memory programming on reconfigurable hardware.

[DOI]

Tiziano De Matteis

,

Johannes de Fine Licht

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2019

Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication.

[DOI]

Grzegorz Kwasniewski

,

,

,

Joost VandeVondele

,

Raffaele Solcà

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2019

Network-accelerated non-contiguous memory transfers.

[DOI]

Salvatore Di Girolamo

,

Konstantin Taranov

,

,

Michael Schaffner

,

,

,

,

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2019

Slim graph: practical lossy graph compression for approximate graph processing, storage, and analytics.

[DOI]

,

,

Lukas Gianinazzi

,

Robert Gerstenberger

,

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2019

Stateful dataflow multigraphs: a data-centric model for performance portability on heterogeneous architectures.

[DOI]

,

Johannes de Fine Licht

,

Alexandros Nikolaos Ziogas

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2019

Foreword EuroMPI 2019.

[DOI]

Jesper Larsson Träff

,

Torsten Hoefler

Proceedings of the 26th European MPI Users' Group Meeting, 2019

Corrected trees for reliable group communication.

[DOI]

Martin Küttler

,

,

,

Carsten Weinhold

,

Hermann Härtig

,

,

Torsten Hoefler

Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

A fast analytical model of fully associative caches.

[DOI]

,

,

Laurin Brandner

,

Torsten Hoefler

Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2019

Porting the COSMO Weather Model to Manycore CPUs.

[DOI]

,

Stefan Moosbrugger

,

,

,

,

Anton Afanasyev

,

,

,

Thomas C. Schulthess

,

Torsten Hoefler

Proceedings of the Platform for Advanced Scientific Computing Conference, 2019

Invited Talk 2.

[DOI]

Torsten Hoefler

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

SimFS: A Simulation Data Virtualizing File System Interface.

[DOI]

Salvatore Di Girolamo

,

,

Thomas C. Schulthess

,

Torsten Hoefler

Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning.

[DOI]

,

,

,

Alexandros Nikolaos Ziogas

,

,

Torsten Hoefler

Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Using performance models to understand scalable Krylov solver performance at scale for structured grid problems.

[DOI]

,

Torsten Hoefler

,

Proceedings of the ACM International Conference on Supercomputing, 2019

Substream-Centric Maximum Matchings on FPGA.

[DOI]

,

,

,

Johannes de Fine Licht

,

Torsten Hoefler

Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

Embedding Functions Into Reversible Circuits: A Probabilistic Approach to the Number of Lines.

[DOI]

,

Frances Ann Hubis

,

Torsten Hoefler

Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Absinthe: Learning an Analytical Performance Model to Fuse and Tile Stencil Codes in One Shot.

[DOI]

,

,

Torsten Hoefler

Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018

Cache-Oblivious MPI All-to-All Communications Based on Morton Order.

[DOI]

,

,

Torsten Hoefler

IEEE Trans. Parallel Distributed Syst., 2018

Using Hoare logic for quantum circuit optimization.

[DOI]

,

Torsten Hoefler

,

Matthias Troyer

CoRR, 2018

Survey and Taxonomy of Lossless Graph Compression and Space-Efficient Graph Representations.

[DOI]

,

Torsten Hoefler

CoRR, 2018

μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching.

[DOI]

,

,

Torsten Hoefler

,

Satoshi Matsuoka

CoRR, 2018

SparCML: High-Performance Sparse Communication for Machine Learning.

[DOI]

Cédric Renggli

,

,

Torsten Hoefler

CoRR, 2018

Automatic Verification of RMA Programs via Abstraction Extrapolation.

[DOI]

,

Andrei Marian Dan

,

,

Torsten Hoefler

,

Martin T. Vechev

Proceedings of the Verification, Model Checking, and Abstract Interpretation, 2018

ShenTu: processing multi-trillion edge graphs on millions of cores in seconds.

[DOI]

,

,

,

,

,

,

,

Torsten Hoefler

,

,

,

,

Proceedings of the International Conference for High Performance Computing, 2018

Designing scalable FPGA architectures using high-level synthesis.

[DOI]

Johannes de Fine Licht

,

,

Torsten Hoefler

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Communication-avoiding parallel minimum cuts and connected components.

[DOI]

Lukas Gianinazzi

,

,

Alessandro De Palma

,

,

Torsten Hoefler

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Neural Code Comprehension: A Learnable Representation of Code Semantics.

[DOI]

,

Alice Shoshana Jakobovits

,

Torsten Hoefler

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

The Convergence of Sparsified Gradient Methods.

[DOI]

,

Torsten Hoefler

,

Mikael Johansson

,

Nikola Konstantinov

,

,

Cédric Renggli

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Reproducible Floating-Point Aggregation in RDBMSs.

[DOI]

,

,

Torsten Hoefler

,

Proceedings of the 34th IEEE International Conference on Data Engineering, 2018

Fast and strongly-consistent per-item resilience in key-value stores.

[DOI]

Konstantin Taranov

,

,

Torsten Hoefler

Proceedings of the Thirteenth EuroSys Conference, 2018

Accelerating Deep Learning Frameworks with Micro-Batches.

[DOI]

,

,

Torsten Hoefler

,

Satoshi Matsuoka

Proceedings of the IEEE International Conference on Cluster Computing, 2018

Lightweight Requirements Engineering for Exascale Co-design.

[DOI]

Alexandru Calotoiu

,

,

Torsten Hoefler

,

,

Sebastian Rinke

,

Proceedings of the IEEE International Conference on Cluster Computing, 2018

Slim NoC: A Low-Diameter On-Chip Network Topology for High Energy Efficiency and Scalability.

[DOI]

,

Syed Minhaj Hassan

,

Sudhakar Yalamanchili

,

Rachata Ausavarungnirun

,

,

Torsten Hoefler

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

Log(graph): a near-optimal high-performance graph representation.

[DOI]

,

Dimitri Stanojevic

,

,

,

Maurice Hoerold

,

Torsten Hoefler

Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017

Trends in Data Locality Abstractions for HPC Systems.

[DOI]

IEEE Trans. Parallel Distributed Syst., 2017

Distributed Join Algorithms on Thousands of Cores.

[DOI]

Claude Barthels

,

,

Torsten Hoefler

,

,

Proc. VLDB Endow., 2017

Designing Databases for Future High-Performance Networks.

[DOI]

Claude Barthels

,

,

Torsten Hoefler

IEEE Data Eng. Bull., 2017

A Communication-Avoiding Parallel Algorithm for the Symmetric Eigenvalue Problem.

[DOI]

Edgar Solomonik

,

,

,

Torsten Hoefler

Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, 2017

Scaling betweenness centrality using communication-efficient sparse matrix multiplication.

[DOI]

Edgar Solomonik

,

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2017

sPIN: high-performance streaming processing in the network.

[DOI]

Torsten Hoefler

,

Salvatore Di Girolamo

,

Konstantin Taranov

,

,

Proceedings of the International Conference for High Performance Computing, 2017

Isoefficiency in Practice: Configuring and Understanding the Performance of Task-based Applications.

[DOI]

,

Alexandru Calotoiu

,

Torsten Hoefler

,

Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

POSTER: Cache-Oblivious MPI All-to-All Communications on Many-Core Architectures.

[DOI]

,

,

Torsten Hoefler

Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Communication-Avoiding Parallel Algorithms for Solving Triangular Systems of Linear Equations.

[DOI]

,

Edgar Solomonik

,

Torsten Hoefler

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

IPDRM Workshop Introduction.

[DOI]

Shuaiwen Leon Song

,

Torsten Hoefler

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Capability Models for Manycore Memory Systems: A Case-Study with Xeon Phi KNL.

[DOI]

,

Torsten Hoefler

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Corrected Gossip Algorithms for Fast Reliable Broadcast on Unreliable Systems.

[DOI]

Torsten Hoefler

,

,

,

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

EMBRACE Keynote.

[DOI]

Torsten Hoefler

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Transparent Caching for RMA Systems.

[DOI]

Salvatore Di Girolamo

,

,

Torsten Hoefler

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

SlimSell: A Vectorizable Graph Representation for Breadth-First Search.

[DOI]

,

Florian Marending

,

Edgar Solomonik

,

Torsten Hoefler

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Model-Driven Choice of Numerical Methods for the Solution of the Linear Advection Equation.

[DOI]

,

,

Torsten Hoefler

,

Thomas C. Schulthess

Proceedings of the International Conference on Computational Science, 2017

AllConcur: Leaderless Concurrent Atomic Broadcast.

[DOI]

,

Torsten Hoefler

,

Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing, 2017

To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations.

[DOI]

,

Michal Podstawski

,

,

Edgar Solomonik

,

Torsten Hoefler

Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing, 2017

An Effective Queuing Scheme to Provide Slim Fly Topologies with HoL Blocking Reduction and Deadlock Freedom for Minimal-Path Routing.

[DOI]

,

Jesús Escudero-Sahuquillo

,

Pedro Javier García

,

Francisco J. Quiles

,

Torsten Hoefler

Proceedings of the 3rd IEEE International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era, 2017

Improving Non-minimal and Adaptive Routing Algorithms in Slim Fly Networks.

[DOI]

,

Jesús Escudero-Sahuquillo

,

Pedro Javier García

,

Francisco J. Quiles

,

Torsten Hoefler

Proceedings of the 25th IEEE Annual Symposium on High-Performance Interconnects, 2017

Fast Networks and Slow Memories: A Mechanism for Mitigating Bandwidth Mismatches.

[DOI]

,

,

,

Keith D. Underwood

,

Torsten Hoefler

Proceedings of the 25th IEEE Annual Symposium on High-Performance Interconnects, 2017

Multi-agent Pathfinding with n Agents on Graphs with n Vertices: Combinatorial Classification and Tight Algorithmic Bounds.

[DOI]

Klaus-Tycho Foerster

,

,

Torsten Hoefler

,

,

,

Roger Wattenhofer

Proceedings of the Algorithms and Complexity - 10th International Conference, 2017

2016

Automatic Performance Modeling of HPC Applications.

[DOI]

,

Christian H. Bischof

,

Alexandru Calotoiu

,

Torsten Hoefler

,

Christian Iwainsky

,

Grzegorz Kwasniewski

,

,

,

Alexandre Strube

,

,

Proceedings of the Software for Exascale Computing - SPPEXA 2013-2015, 2016

Cache Line Aware Algorithm Design for Cache-Coherent Architectures.

[DOI]

,

Torsten Hoefler

IEEE Trans. Parallel Distributed Syst., 2016

On noise and the performance benefit of nonblocking collectives.

[DOI]

Patrick M. Widener

,

,

Kurt B. Ferreira

,

Torsten Hoefler

Int. J. High Perform. Comput. Appl., 2016

Betweenness Centrality is more Parallelizable than Dense Matrix Multiplication.

[DOI]

Edgar Solomonik

,

,

,

Torsten Hoefler

CoRR, 2016

AllConcur: Leaderless Concurrent Atomic Broadcast (Extended Version).

[DOI]

,

Torsten Hoefler

,

CoRR, 2016

Extreme scale plasma turbulence simulations on top supercomputers worldwide.

[DOI]

William M. Tang

,

,

Stéphane Ethier

,

Grzegorz Kwasniewski

,

Torsten Hoefler

,

Khaled Z. Ibrahim

,

,

Samuel Williams

,

,

Carlos Rosales-Fernandez

,

Timothy J. Williams

Proceedings of the International Conference for High Performance Computing, 2016

A PCIe congestion-aware performance model for densely populated accelerator servers.

[DOI]

Maxime Martinasso

,

Grzegorz Kwasniewski

,

,

Thomas C. Schulthess

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2016

dCUDA: hardware supported overlap of computation and communication.

[DOI]

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2016

Scheduling-aware routing for supercomputers.

[DOI]

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2016

Selecting Technical Papers for an Interdisciplinary Conference: The PASC Review Process.

[DOI]

Torsten Hoefler

Proceedings of the Platform for Advanced Scientific Computing Conference, 2016

Modeling and analysis of remote memory access programming.

[DOI]

Andrei Marian Dan

,

,

Torsten Hoefler

,

Martin T. Vechev

Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, 2016

Polly-ACC Transparent compilation to heterogeneous hardware.

[DOI]

,

Torsten Hoefler

Proceedings of the 2016 International Conference on Supercomputing, 2016

SDNsec: Forwarding Accountability for the SDN Data Plane.

[DOI]

Takayuki Sasaki

,

Christos Pappas

,

,

Torsten Hoefler

,

Proceedings of the 25th International Conference on Computer Communication and Networks, 2016

High-Performance Distributed RMA Locks.

[DOI]

,

,

Torsten Hoefler

Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

Routing on the Dependency Graph: A New Approach to Deadlock-Free High-Performance Routing.

[DOI]

,

Torsten Hoefler

,

Satoshi Matsuoka

Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

Ensuring Deadlock-Freedom in Low-Diameter InfiniBand Networks.

[DOI]

,

,

Torsten Hoefler

Proceedings of the 24th IEEE Annual Symposium on High-Performance Interconnects, 2016

Fast Multi-parameter Performance Modeling.

[DOI]

Alexandru Calotoiu

,

David Beckingsale

,

Christopher W. Earl

,

Torsten Hoefler

,

,

,

Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016

2015

Remote Memory Access Programming in MPI-3.

[DOI]

Torsten Hoefler

,

,

,

,

,

,

Keith D. Underwood

ACM Trans. Parallel Comput., 2015

Introduction to the Special Issue on SPAA 2013.

[DOI]

,

Torsten Hoefler

ACM Trans. Parallel Comput., 2015

Sparse Tensor Algebra as a Parallel Programming Model.

[DOI]

Edgar Solomonik

,

Torsten Hoefler

CoRR, 2015

Cost-effective diameter-two topologies: analysis and evaluation.

[DOI]

Georgios Kathareios

,

Cyriel Minkenberg

,

Bogdan Prisacari

,

Germán Rodríguez

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2015

Scientific benchmarking of parallel computing systems: twelve ways to tell the masses when reporting performance results.

[DOI]

Torsten Hoefler

,

Proceedings of the International Conference for High Performance Computing, 2015

HIPS-LSPP Keynotes.

[DOI]

Torsten Hoefler

,

Laxmikant V. Kalé

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Notified Access: Extending Remote Memory Access Programming Models for Producer-Consumer Synchronization.

[DOI]

,

Torsten Hoefler

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Exascaling Your Library: Will Your Implementation Meet Your Expectations?

[DOI]

,

Alexandru Calotoiu

,

Torsten Hoefler

,

Alexandre Strube

,

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

MODESTO: Data-centric Analytic Optimization of Complex Stencil Programs on Heterogeneous Architectures.

[DOI]

,

,

Torsten Hoefler

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Active Access: A Mechanism for High-Performance Distributed Data-Centric Computations.

[DOI]

,

Torsten Hoefler

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Cache Line Aware Optimizations for ccNUMA Systems.

[DOI]

,

Torsten Hoefler

Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015

DARE: High-Performance State Machine Replication on RDMA Networks.

[DOI]

,

Torsten Hoefler

Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015

Accelerating Irregular Computations with Hardware Transactional Memory and Active Messages.

[DOI]

,

Torsten Hoefler

Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015

Distributing the Data Plane for Remote Storage Access.

[DOI]

Torsten Hoefler

,

,

Proceedings of the 15th Workshop on Hot Topics in Operating Systems, 2015

Exploiting Offload Enabled Network Interfaces.

[DOI]

Salvatore Di Girolamo

,

,

Keith D. Underwood

,

Torsten Hoefler

Proceedings of the 23rd IEEE Annual Symposium on High-Performance Interconnects, 2015

Source-Based Path Selection: The Data Plane Perspective.

[DOI]

,

Christos Pappas

,

Cristina Basescu

,

,

Torsten Hoefler

,

Proceedings of the 10th International Conference on Future Internet, 2015

Evaluating the Cost of Atomic Operations on Modern Architectures.

[DOI]

Hermann Schweizer

,

,

Torsten Hoefler

Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

Using Compiler Techniques to Improve Automatic Performance Modeling.

[DOI]

Arnamoy Bhattacharyya

,

Grzegorz Kwasniewski

,

Torsten Hoefler

Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014

Energy, Memory, and Runtime Tradeoffs for Implementing Collective Communication Operations.

[DOI]

Torsten Hoefler

,

Supercomput. Front. Innov., 2014

Application-oriented ping-pong benchmarking: how to assess the real communication overheads.

[DOI]

,

Robert Gerstenberger

,

Torsten Hoefler

Computing, 2014

Improved MPI collectives for MPI processes in shared address spaces.

[DOI]

,

Torsten Hoefler

,

,

Clust. Comput., 2014

Automatic complexity analysis of explicitly parallel programs.

[DOI]

Torsten Hoefler

,

Grzegorz Kwasniewski

Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architectures, 2014

Understanding the Effects of Communication and Coordination on Checkpointing at Scale.

[DOI]

Kurt B. Ferreira

,

Patrick M. Widener

,

,

Dorian C. Arnold

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2014

Fail-in-Place Network Design: Interaction Between Topology, Routing Algorithm and Failures.

[DOI]

,

Torsten Hoefler

,

Satoshi Matsuoka

Proceedings of the International Conference for High Performance Computing, 2014

Slim Fly: A Cost Effective Low-Diameter Network Topology.

[DOI]

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2014

Exploring the effect of noise on the performance benefit of nonblocking allreduce.

[DOI]

Patrick M. Widener

,

Kurt B. Ferreira

,

,

Torsten Hoefler

Proceedings of the 21st European MPI Users' Group Meeting, 2014

Designing Bit-Reproducible Portable High-Performance Applications.

[DOI]

,

,

Torsten Hoefler

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Efficient task placement and routing of nearest neighbor exchanges in dragonfly networks.

[DOI]

Bogdan Prisacari

,

Germán Rodríguez

,

Philip Heidelberger

,

,

Cyriel Minkenberg

,

Torsten Hoefler

Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

Fault tolerance for remote memory access programming models.

[DOI]

,

Torsten Hoefler

Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

Catwalk: A Quick Development Path for Performance Models.

[DOI]

,

Christian H. Bischof

,

Torsten Hoefler

,

,

,

Alexandru Calotoiu

,

Christian Iwainsky

,

Alexandre Strube

,

Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

PEMOGEN: automatic adaptive performance modeling during program runtime.

[DOI]

Arnamoy Bhattacharyya

,

Torsten Hoefler

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013

Fast pattern-specific routing for fat tree networks.

[DOI]

Bogdan Prisacari

,

Germán Rodríguez

,

Cyriel Minkenberg

,

Torsten Hoefler

ACM Trans. Archit. Code Optim., 2013

Operating systems and runtime environments on supercomputers.

[DOI]

Torsten Hoefler

,

Int. J. High Perform. Comput. Appl., 2013

MPI + MPI: a new hybrid approach to parallel programming with MPI plus shared memory.

[DOI]

Torsten Hoefler

,

,

Darius Buntinas

,

,

,

,

,

,

Computing, 2013

Using Simulation to Evaluate the Performance of Resilience Strategies at Scale.

[DOI]

,

,

Kurt B. Ferreira

,

Dorian C. Arnold

,

Torsten Hoefler

,

Patrick M. Widener

Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation, 2013

Enabling highly-scalable remote memory access programming with MPI-3 one sided.

[DOI]

Robert Gerstenberger

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2013

Hybrid MPI: efficient message passing for multi-core systems.

[DOI]

Andrew Friedley

,

Greg Bronevetsky

,

Torsten Hoefler

,

Andrew Lumsdaine

Proceedings of the International Conference for High Performance Computing, 2013

Using automated performance modeling to find scalability bugs in complex codes.

[DOI]

Alexandru Calotoiu

,

Torsten Hoefler

,

,

Proceedings of the International Conference for High Performance Computing, 2013

MPI datatype processing using runtime compilation.

[DOI]

,

Fredrik Kjolstad

,

Torsten Hoefler

Proceedings of the 20th European MPI Users' Group Meeting, 2013

Ownership passing: efficient distributed memory programming on multi-core systems.

[DOI]

Andrew Friedley

,

Torsten Hoefler

,

Greg Bronevetsky

,

Andrew Lumsdaine

,

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013

Compiler Optimizations for Non-contiguous Remote Data Movement.

[DOI]

,

Robert Gerstenberger

,

Torsten Hoefler

Proceedings of the Languages and Compilers for Parallel Computing, 2013

Bandwidth-optimal all-to-all exchanges in fat tree networks.

[DOI]

Bogdan Prisacari

,

Germán Rodríguez

,

Cyriel Minkenberg

,

Torsten Hoefler

Proceedings of the International Conference on Supercomputing, 2013

Protocols for Fully Offloaded Collective Operations on Accelerated Network Adapters.

[DOI]

,

Torsten Hoefler

,

,

Brian W. Barrett

,

Proceedings of the 42nd International Conference on Parallel Processing, 2013

Modeling communication in cache-coherent SMP systems: a case-study with Xeon Phi.

[DOI]

,

Torsten Hoefler

Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013

NUMA-aware shared-memory collective communication for MPI.

[DOI]

,

Torsten Hoefler

,

Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013

Topic 13: High-Performance Networks and Communication - (Introduction).

[DOI]

,

Torsten Hoefler

,

,

Davide Bertozzi

Proceedings of the Euro-Par 2013 Parallel Processing, 2013

2012

Extensions for next-generation parallel programming models.

[DOI]

Torsten Hoefler

Parallel Comput., 2012

Top Picks from Hot Interconnects 2011: Petascale Network Architectures.

[DOI]

Torsten Hoefler

,

Patrick Geoffray

,

Fabrizio Petrini

,

Jesper Larsson Träff

IEEE Micro, 2012

Abstract: Slack-Conscious Lightweight Loop Scheduling for Improving Scalability of Bulk-synchronous MPI Applications.

[DOI]

,

,

Torsten Hoefler

,

Bronis R. de Supinski

,

William D. Gropp

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Optimization principles for collective neighborhood communications.

[DOI]

Torsten Hoefler

,

Proceedings of the SC Conference on High Performance Computing Networking, 2012

Micro-applications for Communication Data Access Patterns and MPI Datatypes.

[DOI]

,

Robert Gerstenberger

,

Torsten Hoefler

Proceedings of the Recent Advances in the Message Passing Interface, 2012

Exact Dependence Analysis for Increased Communication Overlap.

[DOI]

Simone Pellegrini

,

Torsten Hoefler

,

Thomas Fahringer

Proceedings of the Recent Advances in the Message Passing Interface, 2012

Leveraging MPI's One-Sided Communication Interface for Shared-Memory Programming.

[DOI]

Torsten Hoefler

,

,

Darius Buntinas

,

,

Brian W. Barrett

,

,

,

,

Proceedings of the Recent Advances in the Message Passing Interface, 2012

Automatic datatype generation and optimization.

[DOI]

Fredrik Kjolstad

,

Torsten Hoefler

,

Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

Communication-centric optimizations by dynamically detecting collective operations.

[DOI]

Torsten Hoefler

,

Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

Assessing HPC Failure Detectors for MPI Jobs.

[DOI]

,

,

Torsten Hoefler

,

Proceedings of the 20th Euromicro International Conference on Parallel, 2012

On the Effects of CPU Caches on MPI Point-to-Point Communications.

[DOI]

Simone Pellegrini

,

Torsten Hoefler

,

Thomas Fahringer

Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

Productive Parallel Linear Algebra Programming with Unstructured Topology Adaption.

[DOI]

Peter Gottschling

,

Torsten Hoefler

Proceedings of the 12th IEEE/ACM International Symposium on Cluster, 2012

Performance Modeling and Comparative Analysis of the MILC Lattice QCD Application su3_rmd.

[DOI]

,

Steven Gottlieb

,

Torsten Hoefler

Proceedings of the 12th IEEE/ACM International Symposium on Cluster, 2012

Runtime detection and optimization of collective communication patterns.

[DOI]

Torsten Hoefler

,

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011

MPI on Millions of Cores.

[DOI]

,

Darius Buntinas

,

,

,

Torsten Hoefler

,

,

,

,

Jesper Larsson Träff

Parallel Process. Lett., 2011

The scalable process topology interface of MPI 2.2.

[DOI]

Torsten Hoefler

,

Rolf Rabenseifner

,

Hubert Ritzdorf

,

Bronis R. de Supinski

,

,

Jesper Larsson Träff

Concurr. Comput. Pract. Exp., 2011

Methods of creating student cluster competition teams.

[DOI]

Stephen Lien Harrell

,

Preston M. Smith

,

,

Torsten Hoefler

,

Anna A. Labutina

,

Trinity Overmyer

Proceedings of the 2011 TeraGrid Conference - Extreme Digital Discovery, 2011

Performance modeling for systematic performance tuning.

[DOI]

Torsten Hoefler

,

,

,

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

Design and Evaluation of Nonblocking Collective I/O Operations.

[DOI]

Vishwanath Venkatesan

,

Mohamad Chaarawi

,

,

Torsten Hoefler

Proceedings of the Recent Advances in the Message Passing Interface, 2011

Writing Parallel Libraries with MPI - Common Practice, Issues, and Extensions.

[DOI]

Torsten Hoefler

,

Proceedings of the Recent Advances in the Message Passing Interface, 2011

Performance Expectations and Guidelines for MPI Derived Datatypes.

[DOI]

,

Torsten Hoefler

,

,

Jesper Larsson Träff

Proceedings of the Recent Advances in the Message Passing Interface, 2011

Active pebbles: a programming model for highly parallel fine-grained data-driven computations.

[DOI]

Jeremiah Willcock

,

Torsten Hoefler

,

Nicholas Gerard Edmonds

,

Andrew Lumsdaine

Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2011

Kanor - A Declarative Language for Explicit Communication.

[DOI]

,

William E. Byrd

,

Jeremiah Willcock

,

Torsten Hoefler

,

,

Andrew Lumsdaine

Proceedings of the Practical Aspects of Declarative Languages, 2011

HIPS Introduction.

[DOI]

Torsten Hoefler

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Deadlock-Free Oblivious Routing for Arbitrary Topologies.

[DOI]

,

Torsten Hoefler

,

Wolfgang E. Nagel

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Active pebbles: parallel programming for data-driven applications.

[DOI]

Jeremiah Willcock

,

Torsten Hoefler

,

Nicholas Gerard Edmonds

,

Andrew Lumsdaine

Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011

Generic topology mapping strategies for large-scale parallel architectures.

[DOI]

Torsten Hoefler

,

Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011

Kernel-Based Offload of Collective Operations - Implementation, Evaluation and Lessons Learned.

[DOI]

,

,

Torsten Hoefler

,

Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

2010

Accurately measuring overhead, communication time and progression of blocking and nonblocking collective operations at massive scale.

[DOI]

Torsten Hoefler

,

,

Andrew Lumsdaine

Int. J. Parallel Emergent Distributed Syst., 2010

Software and Hardware Techniques for Power-Efficient HPC Networking.

[DOI]

Torsten Hoefler

Comput. Sci. Eng., 2010

Characterizing the Influence of System Noise on Large-Scale Applications by Simulation.

[DOI]

Torsten Hoefler

,

,

Andrew Lumsdaine

Proceedings of the Conference on High Performance Computing Networking, 2010

Toward Performance Models of MPI Implementations for Understanding Application Scaling Issues.

[DOI]

Torsten Hoefler

,

,

,

Jesper Larsson Träff

Proceedings of the Recent Advances in the Message Passing Interface, 2010

Parallel Zero-Copy Algorithms for Fast Fourier Transform and Conjugate Gradient Using MPI Datatypes.

[DOI]

Torsten Hoefler

,

Steven Gottlieb

Proceedings of the Recent Advances in the Message Passing Interface, 2010

Efficient MPI Support for Advanced Hybrid Programming Models.

[DOI]

Torsten Hoefler

,

Greg Bronevetsky

,

,

Bronis R. de Supinski

,

Andrew Lumsdaine

Proceedings of the Recent Advances in the Message Passing Interface, 2010

Scalable communication protocols for dynamic sparse data exchange.

[DOI]

Torsten Hoefler

,

Christian Siebert

,

Andrew Lumsdaine

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

LogGOPSim: simulating large-scale applications in the LogGOPS model.

[DOI]

Torsten Hoefler

,

,

Andrew Lumsdaine

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010

The PERCS High-Performance Interconnect.

[DOI]

L. Baba Arimilli

,

,

,

,

Wolfgang E. Denzel

,

,

Torsten Hoefler

,

,

,

,

,

Ramakrishnan Rajamony

Proceedings of the IEEE 18th Annual Symposium on High Performance Interconnects, 2010

A space-efficient parallel algorithm for computing betweenness centrality in distributed memory.

[DOI]

,

Torsten Hoefler

,

Andrew Lumsdaine

Proceedings of the 2010 International Conference on High Performance Computing, 2010

Bridging Performance Analysis Tools and Analytic Performance Modeling for HPC.

[DOI]

Torsten Hoefler

Proceedings of the Euro-Par 2010 Parallel Processing Workshops, 2010

AM++: a generalized active message framework.

[DOI]

Jeremiah Willcock

,

Torsten Hoefler

,

Nicholas Gerard Edmonds

,

Andrew Lumsdaine

Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009

LogGP in theory and practice - An in-depth analysis of modern interconnection networks and benchmarking methods for collective operations.

[DOI]

Torsten Hoefler

,

,

Andrew Lumsdaine

Simul. Model. Pract. Theory, 2009

The Effect of Network Noise on Large-Scale Collective Communications.

[DOI]

Torsten Hoefler

,

,

Andrew Lumsdaine

Parallel Process. Lett., 2009

Towards Efficient MapReduce Using MPI.

[DOI]

Torsten Hoefler

,

Andrew Lumsdaine

,

Jack J. Dongarra

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2009

Implementation and analysis of nonblocking collective operations on SCI networks.

[DOI]

Christian Kaiser

,

Torsten Hoefler

,

,

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Sparse collective operations for MPI.

[DOI]

Torsten Hoefler

,

Jesper Larsson Träff

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

The impact of network noise at large-scale communication performance.

[DOI]

Torsten Hoefler

,

,

Andrew Lumsdaine

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

A power-aware, application-based performance study of modern commodity cluster interconnection networks.

[DOI]

Torsten Hoefler

,

,

Andrew Lumsdaine

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Group Operation Assembly Language - A Flexible Way to Express Collective Communication.

[DOI]

Torsten Hoefler

,

Christian Siebert

,

Andrew Lumsdaine

Proceedings of the ICPP 2009, 2009

Optimized Routing for Large-Scale InfiniBand Networks.

[DOI]

Torsten Hoefler

,

,

Andrew Lumsdaine

Proceedings of the 17th IEEE Symposium on High Performance Interconnects, 2009

Demand-driven execution of static directed acyclic graphs using task parallelism.

[DOI]

Prabhanjan Kambadur

,

,

Torsten Hoefler

,

Andrew Lumsdaine

Proceedings of the 16th International Conference on High Performance Computing, 2009

2008

Leveraging non-blocking collective communication in high-performance applications.

[DOI]

Torsten Hoefler

,

Peter Gottschling

,

Andrew Lumsdaine

Proceedings of the 20th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2008), 2008

Communication Optimization for Medical Image Reconstruction Algorithms.

[DOI]

Torsten Hoefler

,

Maraike Schellmann

,

Sergei Gorlatch

,

Andrew Lumsdaine

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2008

Sparse Non-blocking Collectives in Quantum Mechanical Calculations.

[DOI]

Torsten Hoefler

,

Florian Lorenzen

,

Andrew Lumsdaine

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2008

Accurately measuring collective operations at massive scale.

[DOI]

Torsten Hoefler

,

,

Andrew Lumsdaine

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Optimizing non-blocking collective operations for InfiniBand.

[DOI]

Torsten Hoefler

,

Andrew Lumsdaine

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Adaptive Routing Strategies for Modern High Performance Networks.

[DOI]

Patrick Geoffray

,

Torsten Hoefler

Proceedings of the 16th Annual IEEE Symposium on High Performance Interconnects (HOTI 2008), 2008

Multistage switches are not crossbars: Effects of static routing in high-performance networks.

[DOI]

Torsten Hoefler

,

,

Andrew Lumsdaine

Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

Message progression in parallel computing - to thread or not to thread?

[DOI]

Torsten Hoefler

,

Andrew Lumsdaine

Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

Overlapping Communication and Computation with High Level Communication Routines.

[DOI]

Torsten Hoefler

,

Andrew Lumsdaine

Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), 2008

An Optimized ZGEMM Implementation for the Cell BE.

[DOI]

,

Torsten Hoefler

,

Simon Wunderlich

,

,

Proceedings of the 9th Workshop on Parallel Systems and Algorithms (PASA) held at the 21st Conference on the Architecture of Computing Systems (ARCS), 2008

2007

Implementation and performance analysis of non-blocking collective operations for MPI.

[DOI]

Torsten Hoefler

,

Andrew Lumsdaine

,

Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

A Case for Standard Non-blocking Collective Operations.

[DOI]

Torsten Hoefler

,

Prabhanjan Kambadur

,

Richard L. Graham

,

Galen M. Shipman

,

Andrew Lumsdaine

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30, 2007

A practically constant-time MPI Broadcast Algorithm for large-scale InfiniBand Clusters with Multicast.

[DOI]

Torsten Hoefler

,

Christian Siebert

,

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Low-Overhead LogGP Parameter Assessment for Modern Interconnection Networks.

[DOI]

Torsten Hoefler

,

,

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Netgauge: A Network Performance Measurement Framework.

[DOI]

Torsten Hoefler

,

,

Andrew Lumsdaine

,

Proceedings of the High Performance Computing and Communications, 2007

2006

Optimizing a Conjugate Gradient Solver with Non-Blocking Collective Operations.

[DOI]

Torsten Hoefler

,

Peter Gottschling

,

,

Andrew Lumsdaine

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006

IRS - A Portable Interface for Reconfigurable Systems.

[DOI]

,

,

Torsten Hoefler

,

,

Proceedings of the Fifth International Conference on Parallel Computing in Electrical Engineering (PARELEC 2006), 2006

Assessing Single-Message and Multi-Node Communication Performance of InfiniBand.

[DOI]

Torsten Hoefler

,

Carsten Viertel

,

,

,

Proceedings of the Fifth International Conference on Parallel Computing in Electrical Engineering (PARELEC 2006), 2006

A Case for Non-blocking Collective Operations.

[DOI]

Torsten Hoefler

,

Jeffrey M. Squyres

,

,

Andrew Lumsdaine

Proceedings of the Frontiers of High Performance Computing and Networking, 2006

LogfP - a model for small messages in InfiniBand.

[DOI]

Torsten Hoefler

,

,

,

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Fast barrier synchronization for InfiniBand™.

[DOI]

Torsten Hoefler

,

,

,

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Analysis of the Memory Registration Process in the Mellanox InfiniBand Software Stack.

[DOI]

,

,

Robert Baumgartl

,

,

Torsten Hoefler

,

Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

Adding Low-Cost Hardware Barrier Support to Small Commodity Clusters.

[DOI]

Torsten Hoefler

,

,

,

Proceedings of the ARCS 2006, 2006

2005

A Practical Approach to the Rating of Barrier Algorithms Using the LogP Model and Open MPI.

[DOI]

Torsten Hoefler

,

Lavinio Cerquetti

,

,

,

Proceedings of the 34th International Conference on Parallel Processing Workshops (ICPP 2005 Workshops), 2005
