Maciej Besta

CoRR, April, 2025

Affordable AI Assistants with Knowledge Graph of Thoughts.

CoRR, April, 2025

PlaceIT: Placement-based Inter-Chiplet Interconnect Topologies.

CoRR, February, 2025

Reasoning Language Models: A Blueprint.

Afonso Claudino Catarino

CoRR, January, 2025

Energy-Optimal and Low-Depth Algorithmic Primitives for Spatial Dataflow Architectures.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2025

RapidChiplet: A Toolchain for Rapid Design Space Exploration of Inter-Chiplet Interconnects.

Proceedings of the 22nd ACM International Conference on Computing Frontiers, 2025

2024

Parallel and Distributed Graph Neural Networks: An In-Depth Concurrency Analysis.

IEEE Trans. Pattern Anal. Mach. Intell., May, 2024

Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries.

ACM Comput. Surv., February, 2024

Hardware Acceleration for Knowledge Graph Processing: Challenges & Recent Developments.

CoRR, 2024

Demystifying Higher-Order Graph Neural Networks.

CoRR, 2024

Multi-Head RAG: Solving Multi-Aspect Problems with LLMs.

CoRR, 2024

CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks.

CoRR, 2024

Topologies of Reasoning: Demystifying Chains, Trees, and Graphs of Thoughts.

CoRR, 2024

PolarStar: Expanding the Horizon of Diameter-3 Networks.

Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, 2024

High Performance Unstructured SpMM Computation Using Tensor Cores.

Proceedings of the International Conference for High Performance Computing, 2024

Arrow Matrix Decomposition: A Novel Approach for Communication-Efficient Sparse Matrix Multiplication.

Lukas Gianinazzi

Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

A High-Performance Design, Implementation, Deployment, and Evaluation of The Slim Fly Network.

Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

Low-Depth Spatial Tree Algorithms.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Graph of Thoughts: Solving Elaborate Problems with Large Language Models.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and Systems.

IEEE Trans. Parallel Distributed Syst., June, 2023

GNN Scaling 0.1 Software Artifact.

Dataset, June, 2023

GDI-RMA 0.1 Software Artifact.

Dataset, June, 2023

Arrow Matrix Decompositions.

Lukas Gianinazzi

Dataset, April, 2023

RapidChiplet: A Toolchain for Rapid Design Space Exploration of Chiplet Architectures.

CoRR, 2023

Cached Operator Reordering: A Unified View for Fast GNN Training.

CoRR, 2023

High-Performance Graph Databases That Are Portable, Programmable, and Scale to Hundreds of Thousands of Cores.

CoRR, 2023

PolarStar: Expanding the Scalability Horizon of Diameter-3 Networks.

CoRR, 2023

In-network Allreduce with Multiple Spanning Trees on PolarFly.

Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, 2023

High-Performance and Programmable Attentional Graph Neural Networks with Global Tensor Formulations.

Proceedings of the International Conference for High Performance Computing, 2023

The Graph Database Interface: Scaling Online Transactional and Analytical Graph Workloads to Hundreds of Thousands of Cores.

Proceedings of the International Conference for High Performance Computing, 2023

HOT: Higher-Order Dynamic Graph Representation Learning With Efficient Transformers.

Afonso Claudino Catarino

Proceedings of the Learning on Graphs Conference, 27-30 November 2023, Virtual Event, 2023

HexaMesh: Scaling to Hundreds of Chiplets with an Optimized Chiplet Arrangement.

Patrick Iff

Matheus A. Cavalcante

Tim Fischer

Luca Benini

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Sparse Hamming Graph: A Customizable Network-on-Chip Topology.

Patrick Iff

Matheus A. Cavalcante

Tim Fischer

Luca Benini

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

2022

PolarFly: A Cost-Effective and Flexible Low-Diameter Topology.

Proceedings of the SC22: International Conference for High Performance Computing, 2022

Building Blocks for Network-Accelerated Distributed File Systems.

Proceedings of the SC22: International Conference for High Performance Computing, 2022

ProbGraph: High-Performance and High-Accuracy Graph Mining with Probabilistic Set Representations.

Proceedings of the SC22: International Conference for High Performance Computing, 2022

Neural Graph Databases.

Proceedings of the Learning on Graphs Conference, 2022

Motif Prediction with Graph Neural Networks.

Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022

Asynchronous Distributed-Memory Triangle Counting and LCC with RMA Caching.

András Strausz

Flavio Vella

Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

I/O-Optimal Cache-Oblivious Sparse Matrix-Sparse Matrix Multiplication.

Niels Gleinig

Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

2021

Enabling High-Performance Large-Scale Irregular Computations.

PhD thesis, 2021

Transformations of High-Level Synthesis Codes for High-Performance Computing.

Simon Meierhans

IEEE Trans. Parallel Distributed Syst., 2021

High-Performance Routing With Multipathing and Path Diversity in Ethernet and HPC Networks.

Timo Schneider

IEEE Trans. Parallel Distributed Syst., 2021

GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra.

Proc. VLDB Endow., 2021

Learning Combinatorial Node Labeling Algorithms.

CoRR, 2021

Towards Million-Server Network Simulations on Just a Laptop.

Marcel Schneider

CoRR, 2021

SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems.

Konstantinos Kanellopoulos

Rachata Ausavarungnirun

Jakub Beránek

Kacper Janda

Marek Konieczny

Onur Mutlu

CoRR, 2021

GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra.

CoRR, 2021

The future is big graphs: a community view on graph processing systems.

Commun. ACM, 2021

Pebbles, Graphs, and a Pinch of Combinatorics: Towards Tight I/O Lower Bounds for Statically Analyzable Programs.

Proceedings of the SPAA '21: 33rd ACM Symposium on Parallelism in Algorithms and Architectures, 2021

Parallel Algorithms for Finding Large Cliques in Sparse Graphs.

Proceedings of the SPAA '21: 33rd ACM Symposium on Parallelism in Algorithms and Architectures, 2021

On the parallel I/O optimality of linear algebra kernels: near-optimal matrix factorizations.

Marko Kabic

Proceedings of the International Conference for High Performance Computing, 2021

On the parallel I/O optimality of linear algebra kernels: near-optimal LU factorization.

Timo Schneider

Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

SeBS: a serverless benchmark suite for function-as-a-service computing.

Proceedings of the Middleware '21: 22nd International Middleware Conference, Québec City, Canada, December 6, 2021

SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems.

Konstantinos Kanellopoulos

Rachata Ausavarungnirun

Jakub Beránek

Kacper Janda

Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

2020

Substream-Centric Maximum Matchings on FPGA.

ACM Trans. Reconfigurable Technol. Syst., 2020

High-Performance Routing with Multipathing and Path Diversity in Supercomputers and Data Centers.

Timo Schneider

CoRR, 2020

FatPaths: routing in supercomputers and data centers when shortest paths fall short.

Proceedings of the International Conference for High Performance Computing, 2020

High-performance parallel graph coloring with strong guarantees on work, depth, and quality.

Armon Carigiet

Kacper Janda

Lukas Gianinazzi

Proceedings of the International Conference for High Performance Computing, 2020

Communication-Efficient Jaccard similarity for High-Performance Distributed Genome Comparisons.

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

2019

Practice of Streaming and Dynamic Graphs: Concepts, Models, Systems, and Parallelism.

CoRR, 2019

FatPaths: Routing in Supercomputers, Data Centers, and Clouds with Low-Diameter Networks when Shortest Paths Fall Short.

CoRR, 2019

Graph Processing on FPGAs: Taxonomy, Survey, Challenges.

Dimitri Stanojevic

CoRR, 2019

Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication.

Proceedings of the International Conference for High Performance Computing, 2019

Network-accelerated non-contiguous memory transfers.

Proceedings of the International Conference for High Performance Computing, 2019

Slim graph: practical lossy graph compression for approximate graph processing, storage, and analytics.

Proceedings of the International Conference for High Performance Computing, 2019

A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning.

Simon Huber

Daniel Peter

Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Substream-Centric Maximum Matchings on FPGA.

Marc Fischer

Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

2018

Survey and Taxonomy of Lossless Graph Compression and Space-Efficient Graph Representations.

CoRR, 2018

Communication-avoiding parallel minimum cuts and connected components.

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Slim NoC: A Low-Diameter On-Chip Network Topology for High Energy Efficiency and Scalability.

Syed Minhaj Hassan

Sudhakar Yalamanchili

Rachata Ausavarungnirun

Onur Mutlu

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

Log(graph): a near-optimal high-performance graph representation.

Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017

Scaling betweenness centrality using communication-efficient sparse matrix multiplication.

Proceedings of the International Conference for High Performance Computing, 2017

SlimSell: A Vectorizable Graph Representation for Breadth-First Search.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations.

Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing, 2017

2016

Betweenness Centrality is more Parallelizable than Dense Matrix Multiplication.

CoRR, 2016

High-Performance Distributed RMA Locks.

Patrick Schmid

Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

2015

Active Access: A Mechanism for High-Performance Distributed Data-Centric Computations.

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Accelerating Irregular Computations with Hardware Transactional Memory and Active Messages.

Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015

Evaluating the Cost of Atomic Operations on Modern Architectures.

Hermann Schweizer

Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014

Slim Fly: A Cost Effective Low-Diameter Network Topology.

Proceedings of the International Conference for High Performance Computing, 2014

Fault tolerance for remote memory access programming models.

Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

2013

Enabling highly-scalable remote memory access programming with MPI-3 one sided.

Robert Gerstenberger