Artur Podobas

Kentaro Sano

IEEE Access, 2020

Automatic Particle Trajectory Classification in Plasma Simulations.

Stefano Markidis

Ivy Bo Peng

Itthinat Jongsuebchoke

Gabriel Bengtsson

Pawel Andrzej Herman

Proceedings of the 6th IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, 2020

sputniPIC: An Implicit Particle-in-Cell Code for Multi-GPU Systems.

Proceedings of the 32nd IEEE International Symposium on Computer Architecture and High Performance Computing, 2020

OpenMP Device Offloading to FPGAs Using the Nymble Infrastructure.

Proceedings of the OpenMP: Portable Multi-Level Parallelism on Modern Systems, 2020

Extending High-Level Synthesis with High-Performance Computing Performance Visualization.

Proceedings of the IEEE International Conference on Cluster Computing, 2020

tf-Darshan: Understanding Fine-grained I/O Performance in Machine Learning Workloads.

Proceedings of the IEEE International Conference on Cluster Computing, 2020

A Template-based Framework for Exploring Coarse-Grained Reconfigurable Architectures.

Kentaro Sano

Proceedings of the 31st IEEE International Conference on Application-specific Systems, 2020

2019

Learning Neural Representations for Predicting GPU Performance.

Proceedings of the High Performance Computing - 34th International Conference, 2019

Double-Precision FPUs in High-Performance Computing: An Embarrassment of Riches?

Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Scaling Performance for N-Body Stream Computation with a Ring of FPGAs.

Proceedings of the 10th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, 2019

2018

MACC: An OpenACC Transpiler for Automatic Multi-GPU Use.

Proceedings of the Supercomputing Frontiers - 4th Asian Conference, 2018

High-Performance High-Order Stencil Computation on FPGAs Using OpenCL.

Hamid Reza Zohouri

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Hardware Implementation of POSITs and Their Application in FPGAs.

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Combined Spatial and Temporal Blocking for High-Performance Stencil Computation on FPGAs Using OpenCL.

Hamid Reza Zohouri

Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

Predicting Performance Using Collaborative Filtering.

Proceedings of the IEEE International Conference on Cluster Computing, 2018

2017

Designing and accelerating spiking neural networks using OpenCL for FPGAs.

Proceedings of the International Conference on Field Programmable Technology, 2017

Evaluating high-level design strategies on FPGAs for high-performance computing.

Proceedings of the 27th International Conference on Field Programmable Logic and Applications, 2017

2016

Empowering OpenMP with automatically generated hardware.

Proceedings of the International Conference on Embedded Computer Systems: Architectures, 2016

Grain graphs: OpenMP performance analysis made easy.

Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

Towards Unifying OpenMP Under the Task-Parallel Paradigm - Implementation and Performance of the taskloop Construct.

Sven Karlsson

Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

2015

Improving Performance and Quality-of-Service through the Task-Parallel Model: Optimizations and Future Directions for OpenMP.

PhD thesis, 2015

A comparative performance study of common and popular task-centric programming frameworks.

Karl-Filip Faxén

Concurr. Comput. Pract. Exp., 2015

Using Transactional Memory to Avoid Blocking in OpenMP Synchronization Directives - Don't Wait, Speculate!

Lars F. Bonnichsen

Proceedings of the OpenMP: Heterogenous Execution and Data Movements, 2015

2014

Accelerating Parallel Computations with OpenMP-Driven System-on-Chip Generation for FPGAs.

Proceedings of the IEEE 8th International Symposium on Embedded Multicore/Manycore SoCs, 2014

TurboBŁYSK: Scheduling for Improved Data-Driven Task Performance with Fast Dependency Resolution.

Vladimir Vlassov

Proceedings of the Using and Improving OpenMP for Devices, Tasks, and More, 2014

2012

Exploring Heterogeneous Scheduling Using the Task-Centric Programming Model.