Márcio Machado Pereira

Lucas Alvarenga

Gustavo Leite

CoRR, November, 2025

Checkpointing fine-tuning for accelerating seismic applications in GPUs.

Thiago Maltempi

Sandro Rigo

Int. J. High Perform. Comput. Appl., 2025

Scalable OpenMP Remote Offloading via Asynchronous MPI and Coroutine-Driven Communication.

Jhonatan Cléto

Guilherme Valarini

Proceedings of the Euro-Par 2025: Parallel Processing, 2025

2024

ConvBench: A Comprehensive Benchmark for 2D Convolution Primitive Evaluation.

Lucas Alvarenga

Rafael Souza

CoRR, 2024

DeepWave: A Software Stack for Parallelizing Deep Learning Models Used in Geophysics.

Allan Pinto

Gustavo Leite

Sandro Rigo

Pedro Henrique Di Francia Rosso

Proceedings of the 36th IEEE International Symposium on Computer Architecture and High Performance Computing, 2024

Integrating Multi-FPGA Acceleration to OpenMP Distributed Computing.

Lucian Petrica

Nusrat Jahan Lisa

Advancing OpenMP for Future Accelerators, 2024

Combining Compression and Prefetching to Improve Checkpointing for Inverse Seismic Problems in GPUs.

Thiago Maltempi

Sandro Rigo

Jessé Costa

Proceedings of the Euro-Par 2024: Parallel Processing, 2024

2023

Advancing Direct Convolution Using Convolution Slicing Optimization and ISA Extensions.

Rafael Cardoso Fernandes Sousa

João P. L. de Carvalho

José E. Moreira

ACM Trans. Archit. Code Optim., December, 2023

Tensor slicing and optimization for multicore NPUs.

J. Parallel Distributed Comput., May, 2023

2022

Implementing the Broadcast Operation in a Distributed Task-based Runtime.

Rodrigo Ceccato

Alan Souza

Proceedings of the International Symposium on Computer Architecture and High Performance Computing Workshops, 2022

An OpenMP-only Linear Algebra Library for Distributed Architectures.

Alan Souza

Proceedings of the International Symposium on Computer Architecture and High Performance Computing Workshops, 2022

The OpenMP Cluster Programming Model.

Pedro Henrique Di Francia Rosso

Emilio Francesquini

Guilherme Valarini

Gustavo Leite

Workshop Proceedings of the 51st International Conference on Parallel Processing, 2022

Improving Convolution via Cache Hierarchy Tiling and Reduced Packing.

João P. L. de Carvalho

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021

Enabling OpenMP Task Parallelism on Multi-FPGAs.

Ramon Nepomuceno

Renan Sterle

Guilherme Valarini

Proceedings of the 29th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021

2020

OmpTracing: Easy Profiling of OpenMP Programs.

Vitoria Pinho

Proceedings of the 32nd IEEE International Symposium on Computer Architecture and High Performance Computing, 2020

2019

Data-flow analysis and optimization for data coherence in heterogeneous architectures.

J. Parallel Distributed Comput., 2019

2018

DOACROSS Parallelization Based on Component Annotation and Loop-Carried Probability.

Luis Mattos

Divino Cesar S. Lucas

Juan Salamanca

João P. L. de Carvalho

Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018

Automatic Offloading of Cluster Accelerators.

Ciro Ceissler

Ramon Nepomuceno

Gleison Souza Diniz Mendonca

Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

2017

DawnCC: Automatic Annotation for Data Parallelism and Offloading.

Breno Campos Ferreira Guimarães

Péricles Alves

ACM Trans. Archit. Code Optim., 2017

Automatic Scan Parallelization in OpenMP.

Maicol Zegarra

Xavier Martorell

Rafael Cardoso Fernandes Sousa

Proceedings of the 2017 International Symposium on Computer Architecture and High Performance Computing Workshops, 2017

Data Coherence Analysis and Optimization for Heterogeneous Computing.

Proceedings of the 29th International Symposium on Computer Architecture and High Performance Computing, 2017

Compiling and Optimizing OpenMP 4.X Programs to OpenCL and SPIR.

Scaling OpenMP for Exascale Performance and Portability, 2017

2016

Study of hardware transactional memory characteristics and serialization policies on Haswell.

Matthew Gaudet

Gleison Souza Diniz Mendonca

Parallel Comput., 2016

Automatic Insertion of Copy Annotation in Data-Parallel Programs.

Breno Campos Ferreira Guimarães

Péricles Rafael Oliveira Alves

Proceedings of the 28th International Symposium on Computer Architecture and High Performance Computing, 2016

2015

Técnicas de escalonamento e serialização para memórias transacionais [Scheduling and serialization techniques for transactional memories].

PhD thesis, 2015

2014

Multi-dimensional Evaluation of Haswell's Transactional Memory Performance.

Matthew Gaudet

Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Measuring Effective Work to Reward Success in Dynamic Transaction Scheduling.

Proceedings of the 43rd International Conference on Parallel Processing, 2014

2013

Transaction scheduling using conflict avoidance and Contention Intensity.

Alexandro Baldassin

Luiz Eduardo Buzato

Proceedings of the 20th Annual International Conference on High Performance Computing, 2013

1989

A Linguagem de Programação CHILL [The CHILL Programming Language].
