João P. L. de Carvalho

Rafael Cardoso Fernandes Sousa

Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction, 2024

2023

Advancing Direct Convolution Using Convolution Slicing Optimization and ISA Extensions.

Victor Ferrari

Márcio Machado Pereira

José E. Moreira

ACM Trans. Archit. Code Optim., December, 2023

Fast matrix multiplication via compiler-only layered data reorganization and intrinsic lowering.

Braedy Kuzma

Ivan Korostelev

Softw. Pract. Exp., September, 2023

YaConv: Convolution with Low Cache Footprint.

Ivan Korostelev

José E. Moreira

ACM Trans. Archit. Code Optim., March, 2023

On the impact of mode transition on phased transactional memory performance.

Catalina Munoz Morales

Bruno C. Honorio

J. Parallel Distributed Comput., March, 2023

DASS: Dynamic Adaptive Sub-Target Specialization.

Tyler Gobran

Christopher Barton

Proceedings of the International Symposium on Computer Architecture and High Performance Computing Workshops, 2023

To Pack or Not to Pack: A Generalized Packing Analysis and Transformation.

Caio Salvador Rohwedder

Nathan Henderson

Yufei Chen

Rouzbeh Paktinatkeleshteri

Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, 2023

Efficient Auto-Vectorization for Control-flow Dependent Loops through Data Permutation.

Ehsan Amiri

Proceedings of the 33rd Annual International Conference on Computer Science and Software Engineering, 2023

Stub Folding: Retaining Type Specialization to Increase the Efficiency of Highly Polymorphic Inline Caches.

Nathan Henderson

Iain Ireland

Matthew Gaudet

Proceedings of the 33rd Annual International Conference on Computer Science and Software Engineering, 2023

2022

Vectorizing divergent control flow with active-lane consolidation on long-vector architectures.

Wyatt Praharenka

David Pankratz

Ehsan Amiri

J. Supercomput., 2022

Using Barrier Elision to Improve Transactional Code Generation.

Catalina Munoz Morales

ACM Trans. Archit. Code Optim., 2022

Compiling for the IBM Matrix Engine for Enterprise Workloads.

José E. Moreira

IEEE Micro, 2022

Improving Convolution via Cache Hierarchy Tiling and Reduced Packing.

Victor Ferrari

Rafael C. F. Sousa

Márcio Machado Pereira

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021

KernelFaRer: Replacing Native-Code Idioms with High-Performance Library Calls.

ACM Trans. Archit. Code Optim., 2021

Pooling Acceleration in the DaVinci Architecture Using Im2col and Col2im Instructions.

Caio S. Rohwedder

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

Accelerating Graph Applications Using Phased Transactional Memory.

Catalina Munoz Morales

Rafael Murari

Proceedings of the Euro-Par 2021: Parallel Processing, 2021

2020

An efficient parallel implementation for training supervised optimum-path forest classifiers.

Aldo Culquicondor

César Castelo-Fernández

João Paulo Papa

Neurocomputing, 2020

Acceleration Opportunities in Linear Algebra Applications via Idiom Recognition.

Braedy Kuzma

Proceedings of the Companion of the 2020 ACM/SPEC International Conference on Performance Engineering, 2020

Using OpenMP to Detect and Speculate Dynamic DOALL Loops.

Munir S. Skaf

Proceedings of the OpenMP: Portable Multi-Level Parallelism on Modern Systems, 2020

Improving Transactional Code Generation via Variable Annotation and Barrier Elision.

Bruno C. Honorio

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

NV-PhTM: An Efficient Phase-Based Transactional System for Non-volatile Memory.

Rafael Murari

Proceedings of the Euro-Par 2020: Parallel Processing, 2020

2019

The Case for Phase-Based Transactional Memory.

IEEE Trans. Parallel Distributed Syst., 2019

2018

On the Efficiency of Transactional Code Generation: A GCC Case Study.

Alexandro José Baldassin

Proceedings of the Symposium on High Performance Computing Systems, 2018

DOACROSS Parallelization Based on Component Annotation and Loop-Carried Probability.

Luis Mattos

Divino Cesar S. Lucas

Juan Salamanca

Márcio Machado Pereira

Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018

2017

Revisiting phased transactional memory.

Proceedings of the International Conference on Supercomputing, 2017

2012

Energy-Performance Tradeoffs in Software Transactional Memory.
