Jesper Larsson Träff

CoRR, May, 2026

Two Efficient Message-passing Exclusive Scan Algorithms.

CoRR, April, 2026

Lectures on Parallel Computing.

Lecture Notes in Computer Science 14600, Springer, ISBN: 978-3-031-86578-7, 2026

2025

Optimal Broadcast Schedules in Logarithmic Time with Applications to Broadcast, Reduction, All-Broadcast and All-Reduction.

ACM Trans. Parallel Comput., September, 2025

Communication Round and Computation Efficient Exclusive Prefix-Sums Algorithms (for MPI_Exscan).

CoRR, July, 2025

Optimal, Non-pipelined Reduce-scatter and Allreduce Algorithms with an Application to All-to-all Communication.

ACM Trans. Parallel Comput., 2025

mpisee: Communicator-Centric Profiling of MPI Applications.

Concurr. Comput. Pract. Exp., 2025

2024

Optimal, Non-pipelined Reduce-scatter and Allreduce Algorithms.

CoRR, 2024

Lectures on Parallel Computing.

CoRR, 2024

Optimal Broadcast Schedules in Logarithmic Time with Applications to Broadcast, All-Broadcast, Reduction and All-Reduction.

CoRR, 2024

Modes, Persistence and Orthogonality: Blowing MPI Up.

Ioannis Vardas

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

Improved Parallel Application Performance and Makespan by Colocation and Topology-aware Process Mapping.

Proceedings of the 24th IEEE International Symposium on Cluster, 2024

2023

Realizing multioperations and multiprefixes in Thick Control Flow processors.

Microprocess. Microsystems, April, 2023

Round-optimal n-Block Broadcast Schedules in Logarithmic Time.

CoRR, 2023

Using Mixed-Radix Decomposition to Enumerate Computational Resources of Deeply Hierarchical Architectures.

Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Library Development with MPI: Attributes, Request Objects, Group Communicator Creation, Local Reductions, and Datatypes.

Ioannis Vardas

Proceedings of the 30th European MPI Users' Group Meeting, 2023

Preliminary Performance and Memory Access Scalability Study of Thick Control Flow Processors.

Proceedings of the IEEE Nordic Circuits and Systems Conference, 2023

Exploring Mapping Strategies for Co-allocated HPC Applications.

Proceedings of the Euro-Par 2023: Parallel Processing Workshops - Euro-Par 2023 International Workshops, Limassol, Cyprus, August 28, 2023

Uniform Algorithms for Reduce-scatter and (most) other Collectives for MPI.

Proceedings of the IEEE International Conference on Cluster Computing, 2023

2022

Performance and programmability comparison of the thick control flow architecture and current multicore processors.

J. Supercomput., 2022

(Poly)Logarithmic Time Construction of Round-optimal n-Block Broadcast Schedules for Broadcast and irregular Allgather in MPI.

CoRR, 2022

Brief Announcement: Fast(er) Construction of Round-optimal n-Block Broadcast Schedules.

Proceedings of the SPAA '22: 34th ACM Symposium on Parallelism in Algorithms and Architectures, Philadelphia, PA, USA, July 11, 2022

mpisee: MPI Profiling for Communication and Communicator Structure.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

An Overhead Analysis of MPI Profiling and Tracing Tools.

Proceedings of PERMAVOST@HPDC 2022: 2nd Workshop on Performance EngineeRing, 2022

Fast(er) Construction of Round-optimal n-Block Broadcast Schedules.

Proceedings of the IEEE International Conference on Cluster Computing, 2022

2021

MPI collective communication through a single set of interfaces: A case for orthogonality.

Parallel Comput., 2021

A Doubly-pipelined, Dual-root Reduction-to-all Algorithm and Implementation.

CoRR, 2021

A more pragmatic implementation of the lock-free, ordered, linked list.

Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

2020

Special issue: Selected papers from EuroMPI 2019.

Torsten Hoefler

Parallel Comput., 2020

Better Process Mapping and Sparse Quadratic Assignment.

Konrad von Kirchbach

ACM J. Exp. Algorithmics, 2020

k-ported vs. k-lane Broadcast, Scatter, and Alltoall Algorithms.

Alexander van der Grinten

CoRR, 2020

High-Quality Hierarchical Process Mapping.

Marcelo Fonseca Faraj

Henning Meyerhenke

Proceedings of the 18th International Symposium on Experimental Algorithms, 2020

Collectives and Communicators: A Case for Orthogonality: (Or: How to get rid of MPI neighbor and enhance Cartesian collectives).

Proceedings of the EuroMPI/USA '20: 27th European MPI Users' Group Meeting, 2020

Signature Datatypes for Type Correct Collective Operations, Revisited.

Proceedings of the EuroMPI/USA '20: 27th European MPI Users' Group Meeting, 2020

Optimizing Memory Access in TCF Processors with Compute-Update Operations.

Martti Forsell

Jussi Roivainen

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Decomposing MPI Collectives for Exploiting Multi-lane Communication.

Proceedings of the IEEE International Conference on Cluster Computing, 2020

Efficient Process-to-Node Mapping Algorithms for Stencil Computations.

Proceedings of the IEEE International Conference on Cluster Computing, 2020

2019

On Optimal Trees for Irregular Gather and Scatter Collectives.

IEEE Trans. Parallel Distributed Syst., 2019

Scalable Algorithms for MPI Intergroup Allgather and Allgatherv.

Parallel Comput., 2019

Decomposing Collectives for Exploiting Multi-lane Communication.

CoRR, 2019

More Parallelism in Dijkstra's Single-Source Shortest Path Algorithm.

Michael Kainer

CoRR, 2019

Foreword EuroMPI 2019.

Torsten Hoefler

Proceedings of the 26th European MPI Users' Group Meeting, 2019

Cartesian Collective Communication.

Proceedings of the 48th International Conference on Parallel Processing, 2019

How to Make the Preconditioned Conjugate Gradient Method Resilient Against Multiple Node Failures.

Carlos Pachajoa

Markus Levonyak

Wilfried N. Gansterer

Proceedings of the 48th International Conference on Parallel Processing, 2019

2018

Practical, distributed, low overhead algorithms for irregular gather and scatter collectives.

Parallel Comput., 2018

Supporting concurrent memory access in TCF processor architectures.

Microprocess. Microsystems, 2018

Stamp-it: A more Thread-efficient, Concurrent Memory Reclamation Scheme in the C++ Memory Model.

CoRR, 2018

Parallel Quicksort without Pairwise Element Exchange.

CoRR, 2018

Memory Models for C/C++ Programmers.

CoRR, 2018

Brief Announcement: Stamp-it, a more Thread-efficient, Concurrent Memory Reclamation Scheme in the C++ Memory Model.

Proceedings of the 30th Symposium on Parallelism in Algorithms and Architectures, 2018

Full-Duplex Inter-Group All-to-All Broadcast Algorithms with Optimal Bandwidth.

Proceedings of the 25th European MPI Users' Group Meeting, 2018

Stamp-it, amortized constant-time memory reclamation in comparison to five other schemes.

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Implementation of Multioperations in Thick Control Flow Processors.

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

2017

A new and five older Concurrent Memory Reclamation Schemes in Comparison (Stamp-it).

CoRR, 2017

VieM v1.00 - Vienna Mapping and Sparse Quadratic Assignment User Guide.

CoRR, 2017

Better Process Mapping and Sparse Quadratic Assignment.

Proceedings of the 16th International Symposium on Experimental Algorithms, 2017

Practical, linear-time, fully distributed algorithms for irregular gather and scatter.

Proceedings of the 24th European MPI Users' Group Meeting, 2017

Supporting concurrent memory access in TCF-aware processor architectures.

Proceedings of the IEEE Nordic Circuits and Systems Conference, 2017

Exploiting Common Neighborhoods to Optimize MPI Neighborhood Collectives.

Seyed Hessam Mirsadeghi

Pavan Balaji

Ahmad Afsahi

Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017

2016

Message-Combining Algorithms for Isomorphic, Sparse Collective Communication.

CoRR, 2016

PGMPI: Automatically Verifying Self-Consistent MPI Performance Guidelines.

Felix Donatus Lübbe

CoRR, 2016

Benchmarking Concurrent Priority Queues: Performance of k-LSM and Related Data Structures.

Jakob Gruber

CoRR, 2016

MPI Derived Datatypes: Performance Expectations and Status Quo.

CoRR, 2016

Special issue: Euro-Par 2015.

Christian Lengauer

Luc Bougé

Concurr. Comput. Pract. Exp., 2016

(Mis)managing parallel computing research through EU project funding.

Commun. ACM, 2016

The EPiGRAM Project: Preparing Parallel Programming Models for Exascale.

Proceedings of the High Performance Computing, 2016

Brief Announcement: Benchmarking Concurrent Priority Queues.

Jakob Gruber

Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, 2016

A Library for Advanced Datatype Programming.

Proceedings of the 23rd European MPI Users' Group Meeting, EuroMPI 2016, 2016

On the Expected and Observed Communication Performance with MPI Derived Datatypes.

Proceedings of the 23rd European MPI Users' Group Meeting, EuroMPI 2016, 2016

Polynomial-Time Construction of Optimal MPI Derived Datatype Trees.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Automatic Verification of Self-consistent MPI Performance Guidelines.

Felix Donatus Lübbe

Proceedings of the Euro-Par 2016: Parallel Processing, 2016

2015

The Shortest Path Problem with Edge Information Reuse is NP-Complete.

CoRR, 2015

Polynomial-time Construction of Optimal Tree-structured Communication Data Layout Descriptions.

CoRR, 2015

Isomorphic, Sparse MPI-like Collective Communication Operations for Parallel Stencil Computations.

Proceedings of the 22nd European MPI Users' Group Meeting, 2015

Specification Guideline Violations by MPI_Dims_create.

Felix Donatus Lübbe

Proceedings of the 22nd European MPI Users' Group Meeting, 2015

Efficient, Optimal MPI Datatype Reconstruction for Vector and Index Types.

Martin Kalany

Proceedings of the 22nd European MPI Users' Group Meeting, 2015

The lock-free k-LSM relaxed priority queue.

Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

2014

Perfectly Load-Balanced, Stable, Synchronization-Free Parallel Merge.

Christian Siebert

Parallel Process. Lett., 2014

An improved, easily computable combinatorial lower bound for weighted graph bipartitioning.

CoRR, 2014

Selected Papers from EuroMPI 2012 - 19th European MPI Users' Group Meeting.

Siegfried Benkner

Computing, 2014

Zero-copy, Hierarchical Gather is not possible with MPI Datatypes and Collectives.

Proceedings of the 21st European MPI Users' Group Meeting, 2014

MPI Collectives and Datatypes for Hierarchical All-to-all Communication.

Proceedings of the 21st European MPI Users' Group Meeting, 2014

Optimal MPI Datatype Normalization for Vector and Index-block Types.

Proceedings of the 21st European MPI Users' Group Meeting, 2014

Reproducible MPI Micro-Benchmarking Isn't As Easy As You Think.

Proceedings of the 21st European MPI Users' Group Meeting, 2014

Data structures for task-based priority scheduling.

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Implementing a classic: zero-copy all-to-all communication with MPI datatypes.

Proceedings of the 2014 International Conference on Supercomputing, 2014

2013

Configurable Strategies for Work-stealing.

CoRR, 2013

A Note on (Parallel) Depth- and Breadth-First Search by Arc Elimination.

CoRR, 2013

Perfectly load-balanced, optimal, stable, parallel merge.

Christian Siebert

CoRR, 2013

On the State and Importance of Reproducible Experimental Research in Parallel Computing.

CoRR, 2013

Work-stealing with configurable scheduling strategies.

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013

The Pheet Task-Scheduling Framework on the Intel® Xeon Phi Coprocessor and other Multicore Architectures.

Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

2012

Alternative, uniformly expressive and more scalable interfaces for collective communication in MPI.

Parallel Comput., 2012

Top Picks from Hot Interconnects 2011: Petascale Network Architectures.

IEEE Micro, 2012

Simplified, stable parallel merging.

CoRR, 2012

Poster: Leveraging PEPPHER Technology for Performance Portable Supercomputing.

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: Leveraging PEPPHER Technology for Performance Portable Supercomputing.

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

mpicroscope: Towards an MPI Benchmark Tool for Performance Guideline Verification.

Proceedings of the Recent Advances in the Message Passing Interface, 2012

Efficient MPI Implementation of a Parallel, Stable Merge Algorithm.

Christian Siebert

Proceedings of the Recent Advances in the Message Passing Interface, 2012

Programmability and performance portability aspects of heterogeneous multi-/manycore systems.

Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

2011

Broadcast.
