Tal Ben-Nun

Jose Manuel Monsalve Diaz

Jacob Hegna

William S. Moses

Mircea Trofin

Johannes Doerfert

J. Data-centric Mach. Learn. Res., 2024

Lion Cub: Minimizing Communication Overhead in Distributed Lion.

CoRR, 2024

Autonomous Execution for Multi-GPU Systems: Compiler Support.

Javid Baydamirli

Didem Unat

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

Arrow Matrix Decomposition: A Novel Approach for Communication-Efficient Sparse Matrix Multiplication.

Lukas Gianinazzi

Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

Low-Depth Spatial Tree Algorithms.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

2023

Arrow Matrix Decompositions.

Lukas Gianinazzi

Dataset, April, 2023

Performance on HPC Platforms Is Possible Without C++.

Anshu Dubey

Bradford L. Chamberlain

Bronis R. de Supinski

Damian W. I. Rouson

Comput. Sci. Eng., 2023

Cached Operator Reordering: A Unified View for Fast GNN Training.

CoRR, 2023

STen: Productive and Efficient Sparsity in PyTorch.

CoRR, 2023

Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization.

CoRR, 2023

A Theory of I/O-Efficient Sparse Neural Network Inference.

Niels Gleinig

CoRR, 2023

FuzzyFlow: Leveraging Dataflow To Find and Squash Program Optimization Bugs.

Proceedings of the International Conference for High Performance Computing, 2023

VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores.

Proceedings of the International Conference for High Performance Computing, 2023

Performance Embeddings: A Similarity-Based Transfer Tuning Approach to Performance Optimization.

Proceedings of the 37th International Conference on Supercomputing, 2023

Maximum Flows in Parametric Graph Templates.

Proceedings of the Algorithms and Complexity - 13th International Conference, 2023

Bridging Control-Centric and Data-Centric Optimization.

Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, 2023

2022

Python FPGA Programming with Data-Centric Multi-Level Design.

Carl-Johannes Johnsen

CoRR, 2022

ENS-10: A Dataset For Post-Processing Ensemble Weather Forecast.

CoRR, 2022

Deinsum: Practically I/O Optimal Multilinear Algebra.

CoRR, 2022

The spatial computer: A model for energy-efficient parallel computation.

CoRR, 2022

Deinsum: Practically I/O Optimal Multi-Linear Algebra.

Proceedings of the SC22: International Conference for High Performance Computing, 2022

Boosting Performance Optimization with Interactive Data Movement Visualization.

Philipp Schaad

Proceedings of the SC22: International Conference for High Performance Computing, 2022

Productive Performance Engineering for Weather and Climate Modeling with Python.

Proceedings of the SC22: International Conference for High Performance Computing, 2022

ENS-10: A Dataset For Post-Processing Ensemble Weather Forecasts.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

A data-centric optimization framework for machine learning.

Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

Lifting C semantics for dataflow optimization.

Alexandru Calotoiu

Grzegorz Kwasniewski

Philipp Schaad

Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

Temporal Vectorization: A Compiler Approach to Automatic Multi-Pumping.

Carl-Johannes Johnsen

Tiziano De Matteis

Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, 2022

2021

Breaking (Global) Barriers in Parallel Stochastic Optimization With Wait-Avoiding Group Averaging.

Shigang Li

Giorgi Nadiradze

Salvatore Di Girolamo

Nikoli Dryden

Dan Alistarh

IEEE Trans. Parallel Distributed Syst., 2021

Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks.

J. Mach. Learn. Res., 2021

Learning Combinatorial Node Labeling Algorithms.

CoRR, 2021

Pebbles, Graphs, and a Pinch of Combinatorics: Towards Tight I/O Lower Bounds for Statically Analyzable Programs.

Proceedings of the SPAA '21: 33rd ACM Symposium on Parallelism in Algorithms and Architectures, 2021

Productivity, portability, performance: data-centric Python.

Luca Lavarini

Proceedings of the International Conference for High Performance Computing, 2021

On the parallel I/O optimality of linear algebra kernels: near-optimal matrix factorizations.

Grzegorz Kwasniewski

Marko Kabic

Proceedings of the International Conference for High Performance Computing, 2021

Clairvoyant prefetching for distributed machine learning I/O.

Proceedings of the International Conference for High Performance Computing, 2021

On the parallel I/O optimality of linear algebra kernels: near-optimal LU factorization.

Grzegorz Kwasniewski

Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Data Movement Is All You Need: A Case Study on Optimizing Transformers.

Proceedings of the Fourth Conference on Machine Learning and Systems, 2021

NPBench: a benchmarking suite for high-performance NumPy.

Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations.

Michael F. P. O'Boyle

Hugh Leather

Proceedings of the 38th International Conference on Machine Learning, 2021

StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems.

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

2020

Substream-Centric Maximum Matchings on FPGA.

ACM Trans. Reconfigurable Technol. Syst., 2020

Groute: Asynchronous Multi-GPU Programming Model with Applications to Large-scale Graph Processing.

ACM Trans. Parallel Comput., 2020

Deep Data Flow Analysis.

Michael F. P. O'Boyle

CoRR, 2020

Parametric Graph Templates: Properties and Algorithms.

CoRR, 2020

Deep Learning for Post-Processing Ensemble Weather Forecasts.

CoRR, 2020

Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging.

Shigang Li

Dan Alistarh

Salvatore Di Girolamo

Nikoli Dryden

CoRR, 2020

ProGraML: Graph-based Deep Learning for Program Optimization and Analysis.

CoRR, 2020

Workflows are the New Applications: Challenges in Performance, Portability, and Productivity.

Proceedings of the IEEE/ACM International Workshop on Performance, 2020

Taming unbalanced training workloads in deep learning with partial collective operations.

Shigang Li

Salvatore Di Girolamo

Dan Alistarh

Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

Augment Your Batch: Improving Generalization Through Instance Repetition.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Demystifying Parallel and Distributed Deep Learning: An In-depth Concurrency Analysis.

ACM Comput. Surv., 2019

A Data-Centric Approach to Extreme-Scale Ab initio Dissipative Quantum Transport Simulations.

Guillermo Indalecio Fernández

Mathieu Luisier

CoRR, 2019

Predicting Weather Uncertainty with Deep Convnets.

CoRR, 2019

Mix & Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency.

CoRR, 2019

Graph Processing on FPGAs: Taxonomy, Survey, Challenges.

Dimitri Stanojevic

CoRR, 2019

Stateful Dataflow Multigraphs: A Data-Centric Model for High-Performance Parallel Programs.

CoRR, 2019

Augment your batch: better training with larger batches.

CoRR, 2019

Optimizing the data movement in quantum transport simulations via data-centric parallel programming.

Guillermo Indalecio Fernández

Mathieu Luisier

Proceedings of the International Conference for High Performance Computing, 2019

A data-centric approach to extreme-scale ab initio dissipative quantum transport simulations.

Guillermo Indalecio Fernández

Mathieu Luisier

Proceedings of the International Conference for High Performance Computing, 2019

Stateful dataflow multigraphs: a data-centric model for performance portability on heterogeneous architectures.

Proceedings of the International Conference for High Performance Computing, 2019

A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning.

Simon Huber

Daniel Peter

Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Substream-Centric Maximum Matchings on FPGA.

Marc Fischer

Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

2018

μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching.

CoRR, 2018

Neural Code Comprehension: A Learnable Representation of Code Semantics.

Alice Shoshana Jakobovits

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Optimizing Parallel Graph Connectivity Computation via Subgraph Sampling.

Michael Sutton

Amnon Barak

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Accelerating Deep Learning Frameworks with Micro-Batches.

Proceedings of the IEEE International Conference on Cluster Computing, 2018

2017

Groute: An Asynchronous Multi-GPU Programming Model for Irregular Computations.

Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Big data causing big (TLB) problems: taming random memory accesses on the GPU.

Proceedings of the 13th International Workshop on Data Management on New Hardware, 2017

2016

FFMK: A Fast and Fault-Tolerant Microkernel-Based System for Exascale Computing.

Proceedings of the Software for Exascale Computing - SPPEXA 2013-2015, 2016

Memory-Oriented Programming: A Data-Centric Programming Model for Systems with Multiple Parallel Accelerators (additional title page in Hebrew: Memory-Oriented Programming: A Programming Model for Systems with Multiple Parallel Accelerators).

PhD thesis, 2016

Spline-based parallel nonlinear optimization of function sequences.
