Dominik Grewe

Norman Alexander Rink

CoRR, August, 2025

PartIR: Composing SPMD Partitioning Strategies for Machine Learning.

Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2025

2024

PartIR: Composing SPMD Partitioning Strategies for Machine Learning.

CoRR, 2024

2022

Automatic Discovery of Composite SPMD Partitioning Strategies in PartIR.

CoRR, 2022

Synthesizing Optimal Parallelism Placement and Reduction Strategies on Hierarchical Systems for Deep Learning.

Proceedings of the Fifth Conference on Machine Learning and Systems, 2022

2021

Automap: Towards Ergonomic Automated Parallelism for ML Models.

Norman Alexander Rink

Vinod Nair

Dan Belov

CoRR, 2021

2019

TF-Replicator: Distributed Machine Learning for Researchers.

Sergio Gomez Colmenarejo

Aedan Pope

Fabio Viola

Dan Belov

CoRR, 2019

2018

Parallel WaveNet: Fast High-Fidelity Speech Synthesis.

Proceedings of the 35th International Conference on Machine Learning, 2018

2016

Mastering the game of Go with deep neural networks and tree search.

Nature, 2016

2014

Mapping parallel programs to heterogeneous multi-core systems.

PhD thesis, 2014

Automatic and Portable Mapping of Data Parallel Programs to OpenCL for GPU-Based Heterogeneous Systems.

ACM Trans. Archit. Code Optim., 2014

NOVA: A Functional Language for Data Parallelism.

Proceedings of ARRAY'14: ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, 2014

2013

OpenCL Task Partitioning in the Presence of GPU Contention.

Proceedings of the Languages and Compilers for Parallel Computing, 2013

Portable mapping of data parallel programs to OpenCL for heterogeneous systems.

Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

Prius: a runtime for hybrid computing.

Proceedings of the First International Workshop on Code Optimisation for Multi and Many Cores, 2013

Input-aware auto-tuning for directive-based GPU programming.

Alberto Magni

Nick Johnson

Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, 2013

2011

A workload-aware mapping approach for data-parallel programs.

Proceedings of the High Performance Embedded Architectures and Compilers, 2011

A Static Task Partitioning Approach for Heterogeneous Systems Using OpenCL.

Proceedings of the Compiler Construction - 20th International Conference, 2011

Automatically generating and tuning GPU code for sparse matrix-vector multiplication from a high-level representation.
