Pedro Valero-Lara

Orcid: 0000-0002-1479-4310

According to our database, Pedro Valero-Lara authored at least 62 papers between 2011 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
sKokkos: Enabling Kokkos with Transparent Device Selection on Heterogeneous Systems using OpenACC.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2024

2023
Comparing Llama-2 and GPT-3 LLMs for HPC kernels generation.
CoRR, 2023

Moment Representation of Regularized Lattice Boltzmann Methods on NVIDIA and AMD GPUs.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Mixed-Precision S/DGEMM Using the TF32 and TF64 Frameworks on Low-Precision AI Tensor Cores.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

MatRIS: Multi-level Math Library Abstraction for Heterogeneity and Performance Portability using IRIS Runtime.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Julia as a unifying end-to-end workflow language on the Frontier exascale system.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Tiling Framework for Heterogeneous Computing of Matrix based Tiled Algorithms.
Proceedings of the 2nd International Workshop on Extreme Heterogeneity Solutions, 2023

A MultiGPU Performance-Portable Solution for Array Programming Based on Kokkos.
Proceedings of the 9th ACM SIGPLAN International Workshop on Libraries, 2023

(AsHES) 2023 Keynote Speaker Agnostic Programing: "Less is More".
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Evaluating performance and portability of high-level programming models: Julia, Python/Numba, and Kokkos on exascale nodes.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Evaluation of OpenAI Codex for HPC Parallel Programming Models Kernel Generation.
Proceedings of the 52nd International Conference on Parallel Processing Workshops, 2023

IRIS-DMEM: Efficient Memory Management for Heterogeneous Computing.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2023

2022
Propagation Pattern for Moment Representation of the Lattice Boltzmann Method.
IEEE Trans. Parallel Distributed Syst., 2022

cuConv: CUDA implementation of convolution for CNN inference.
Clust. Comput., 2022

KokkACC: Enhancing Kokkos with OpenACC.
Proceedings of the 9th Workshop on Accelerator Programming Using Directives, 2022

LaRIS: Targeting Portability and Productivity for LAPACK Codes on Extreme Heterogeneous Systems by Using IRIS.
Proceedings of the IEEE/ACM Redefining Scalability for Diversely Heterogeneous Architectures Workshop, 2022

SparseLU, A Novel Algorithm and Math Library for Sparse LU Factorization.
Proceedings of the 12th IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms, 2022

IRIS-BLAS: Towards a Performance Portable and Heterogeneous BLAS Library.
Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022

A Portable and Heterogeneous LU Factorization on IRIS.
Proceedings of the Euro-Par 2022: Parallel Processing Workshops, 2022

2021
Static Graphs for Coding Productivity in OpenACC.
Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021

OpenMP Target Task: Tasking and Target Offloading on Heterogeneous Systems.
Proceedings of the Euro-Par 2021: Parallel Processing Workshops, 2021

2020
sLASs: A fully automatic auto-tuned linear algebra library based on OpenMP extensions implemented in OmpSs (LASs Library).
J. Parallel Distributed Comput., 2020

Towards an Auto-Tuned and Task-Based SpMV (LASs Library).
Proceedings of the OpenMP: Portable Multi-Level Parallelism on Modern Systems, 2020

2019
MPI+OpenMP tasking scalability for multi-morphology simulations of the human brain.
Parallel Comput., 2019

A Fast Solver for Large Tridiagonal Systems on Multi-Core Processors (Lass Library).
IEEE Access, 2019

Performance Evaluation of cuDNN Convolution Algorithms on NVIDIA Volta GPUs.
IEEE Access, 2019

BLAS-3 Optimized by OmpSs Regions (LASs Library).
Proceedings of the 27th Euromicro International Conference on Parallel, 2019

Tasking in Accelerators: Performance Evaluation.
Proceedings of the 20th International Conference on Parallel and Distributed Computing, 2019

Accelerating Conjugate Gradient using OmpSs.
Proceedings of the 20th International Conference on Parallel and Distributed Computing, 2019

2018
cuThomasBatch and cuThomasVBatch, CUDA Routines to compute batch of tridiagonal systems on NVIDIA GPUs.
Concurr. Comput. Pract. Exp., 2018

MPI+OpenMP Tasking Scalability for the Simulation of the Human Brain: Human Brain Project.
Proceedings of the 25th European MPI Users' Group Meeting, 2018

Variable Batched DGEMM.
Proceedings of the 26th Euromicro International Conference on Parallel, 2018

2017
Introduction to the Special Issue on High Performance Computing Solutions for Complex Problems.
Scalable Comput. Pract. Exp., 2017

Towards HPC-Embedded. Case Study: Kalray and Message-Passing on NoC.
Scalable Comput. Pract. Exp., 2017

Heterogeneous CPU+GPU approaches for mesh refinement over Lattice-Boltzmann simulations.
Concurr. Comput. Pract. Exp., 2017

Reducing memory requirements for large size LBM simulations on GPUs.
Concurr. Comput. Pract. Exp., 2017

NVIDIA GPUs Scalability to Solve Multiple (Batch) Tridiagonal Systems Implementation of cuThomasBatch.
Proceedings of the Parallel Processing and Applied Mathematics, 2017

Heuristics for ROSA's LTS Searching.
Proceedings of the Advances in Computational Intelligence, 2017

cuHinesBatch: Solving Multiple Hines systems on GPUs Human Brain Project.
Proceedings of the International Conference on Computational Science, 2017

The Design and Performance of Batched BLAS on Modern High-Performance Computing Systems.
Proceedings of the International Conference on Computational Science, 2017

2016
Introduction to the Special Issue on High Performance Computing Solutions for Complex Problems.
Scalable Comput. Pract. Exp., 2016

Many-Task Computing on Many-Core Architectures.
Scalable Comput. Pract. Exp., 2016

Leveraging the Performance of LBM-HPC for Large Sizes on GPUs Using Ghost Cells.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2016

2015
Accelerating fluid-solid simulations (Lattice-Boltzmann & Immersed-Boundary) on heterogeneous architectures.
J. Comput. Sci., 2015

A Non-uniform Staggered Cartesian Grid Approach for Lattice-boltzmann Method.
Proceedings of the International Conference on Computational Science, 2015

Multi-domain Grid Refinement for Lattice-Boltzmann Simulations on Heterogeneous Platforms.
Proceedings of the 18th IEEE International Conference on Computational Science and Engineering, 2015

LBM-HPC - An Open-Source Tool for Fluid Simulations. Case Study: Unified Parallel C (UPC-PGAS).
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2014
Accelerating solid-fluid interaction based on the immersed boundary method on multicore and GPU architectures.
J. Supercomput., 2014

Fast finite difference Poisson solvers on heterogeneous architectures.
Comput. Phys. Commun., 2014

hLCS. A Hybrid GPGPU Approach for Solving Multiple Short and Unbalanced LCS Problems.
Proceedings of the Computational Science and Its Applications - ICCSA 2014 - 14th International Conference, Guimarães, Portugal, June 30, 2014

Accelerating Solid-fluid Interaction using Lattice-boltzmann and Immersed Boundary Coupled Simulations on Heterogeneous Platforms.
Proceedings of the International Conference on Computational Science, 2014

Multi-GPU acceleration of DARTEL (early detection of Alzheimer).
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

2013
A GPU approach for accelerating 3D deformable registration (DARTEL) on brain biomedical images.
Proceedings of the 20th European MPI Users' Group Meeting, 2013

GPU Powered ROSA Analyzer.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

Analysis in performance and new model for multiple kernels executions on many-core architectures.
Proceedings of the IEEE 12th International Conference on Cognitive Informatics and Cognitive Computing, 2013

2012
Block Tridiagonal Solvers on Heterogeneous Architectures.
Proceedings of the 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, 2012

MRF Satellite Image Classification on GPU.
Proceedings of the 41st International Conference on Parallel Processing Workshops, 2012

Improving the Performance for the Range Search on Metric Spaces Using a Multi-GPU Platform.
Proceedings of the Database and Expert Systems Applications, 2012

2011
A GPU-based implementation of the MRF algorithm in ITK package.
J. Supercomput., 2011

Similarity search implementations for multi-core and many-core processors.
Proceedings of the 2011 International Conference on High Performance Computing & Simulation, 2011

Towards a More Efficient Use of GPUs.
Proceedings of the International Conference on Computational Science and Its Applications, 2011

A GPU-Based Implementation for Range Queries on Spaghettis Data Structure.
Proceedings of the Computational Science and Its Applications - ICCSA 2011, 2011

