João P. L. de Carvalho

Orcid: 0000-0002-3476-184X

According to our database, João P. L. de Carvalho authored at least 26 papers between 2012 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Region-Based Data Layout via Data Reuse Analysis.
Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction, 2024

2023
Advancing Direct Convolution Using Convolution Slicing Optimization and ISA Extensions.
ACM Trans. Archit. Code Optim., December, 2023

Fast matrix multiplication via compiler-only layered data reorganization and intrinsic lowering.
Softw. Pract. Exp., September, 2023

YaConv: Convolution with Low Cache Footprint.
ACM Trans. Archit. Code Optim., March, 2023

On the impact of mode transition on phased transactional memory performance.
J. Parallel Distributed Comput., March, 2023

DASS: Dynamic Adaptive Sub-Target Specialization.
Proceedings of the International Symposium on Computer Architecture and High Performance Computing Workshops, 2023

To Pack or Not to Pack: A Generalized Packing Analysis and Transformation.
Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, 2023

Efficient Auto-Vectorization for Control-flow Dependent Loops through Data Permutation.
Proceedings of the 33rd Annual International Conference on Computer Science and Software Engineering, 2023

Stub Folding: Retaining Type Specialization to Increase the Efficiency of Highly Polymorphic Inline Caches.
Proceedings of the 33rd Annual International Conference on Computer Science and Software Engineering, 2023

2022
Vectorizing divergent control flow with active-lane consolidation on long-vector architectures.
J. Supercomput., 2022

Using Barrier Elision to Improve Transactional Code Generation.
ACM Trans. Archit. Code Optim., 2022

Compiling for the IBM Matrix Engine for Enterprise Workloads.
IEEE Micro, 2022

Improving Convolution via Cache Hierarchy Tiling and Reduced Packing.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021
KernelFaRer: Replacing Native-Code Idioms with High-Performance Library Calls.
ACM Trans. Archit. Code Optim., 2021

Pooling Acceleration in the DaVinci Architecture Using Im2col and Col2im Instructions.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

Accelerating Graph Applications Using Phased Transactional Memory.
Proceedings of the Euro-Par 2021: Parallel Processing, 2021

2020
An efficient parallel implementation for training supervised optimum-path forest classifiers.
Neurocomputing, 2020

Acceleration Opportunities in Linear Algebra Applications via Idiom Recognition.
Proceedings of the Companion of the 2020 ACM/SPEC International Conference on Performance Engineering, 2020

Using OpenMP to Detect and Speculate Dynamic DOALL Loops.
Proceedings of the OpenMP: Portable Multi-Level Parallelism on Modern Systems, 2020

Improving Transactional Code Generation via Variable Annotation and Barrier Elision.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

NV-PhTM: An Efficient Phase-Based Transactional System for Non-volatile Memory.
Proceedings of the Euro-Par 2020: Parallel Processing, 2020

2019
The Case for Phase-Based Transactional Memory.
IEEE Trans. Parallel Distributed Syst., 2019

2018
On the Efficiency of Transactional Code Generation: A GCC Case Study.
Proceedings of the Symposium on High Performance Computing Systems, 2018

DOACROSS Parallelization Based on Component Annotation and Loop-Carried Probability.
Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018

2017
Revisiting phased transactional memory.
Proceedings of the International Conference on Supercomputing, 2017

2012
Energy-Performance Tradeoffs in Software Transactional Memory.
Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

