Basilio B. Fraguela

Orcid: 0000-0002-3438-5960

Affiliations:
  • University of A Coruña, Spain


According to our database, Basilio B. Fraguela authored at least 96 papers between 1995 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores.
Proceedings of the International Conference for High Performance Computing, 2023

2022
A highly optimized skeleton for unbalanced and deep divide-and-conquer algorithms on multi-core clusters.
J. Supercomput., 2022

The New UPC++ DepSpawn High Performance Library for Data-Flow Computing with Hybrid Parallelism.
Proceedings of the Computational Science - ICCS 2022, 2022

Probing the Efficacy of Hardware-Aware Weight Pruning to Optimize the SpMM Routine on Ampere GPUs.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021
A software cache autotuning strategy for dataflow computing with UPC++ DepSpawn.
Comput. Math. Methods, November, 2021

High-performance dataflow computing in hybrid memory systems with UPC++ DepSpawn.
J. Supercomput., 2021

A Parallel Skeleton for Divide-and-conquer Unbalanced and Deep Problems.
Int. J. Parallel Program., 2021

ScalaParBiBit: scaling the binary biclustering in distributed-memory systems.
Clust. Comput., 2021

2020
An automatic optimizer for heterogeneous devices.
Future Gener. Comput. Syst., 2020

Reusing Trained Layers of Convolutional Neural Networks to Shorten Hyperparameters Tuning Time.
CoRR, 2020

A Hybrid Approach for Tracking Individual Players in Broadcast Match Videos.
CoRR, 2020

2019
Easy Dataflow Programming in Clusters with UPC++ DepSpawn.
IEEE Trans. Parallel Distributed Syst., 2019

Portable and efficient FFT and DCT algorithms with the Heterogeneous Butterfly Processing Library.
J. Parallel Distributed Comput., 2019

Enhanced global optimization methods applied to complex fisheries stock assessment models.
Appl. Soft Comput., 2019

A Fast Solver for Large Tridiagonal Systems on Multi-Core Processors (Lass Library).
IEEE Access, 2019

2018
Heterogeneous distributed computing based on high-level abstractions.
Concurr. Comput. Pract. Exp., 2018

Guiding the Optimization of Parallel Codes on Multicores Using an Analytical Cache Model.
Proceedings of the Computational Science - ICCS 2018, 2018

2017
Accelerating the HyperLogLog Cardinality Estimation Algorithm.
Sci. Program., 2017

High productivity multi-device exploitation with the Heterogeneous Programming Library.
J. Parallel Distributed Comput., 2017

A portable and adaptable fault tolerance solution for heterogeneous applications.
J. Parallel Distributed Comput., 2017

Facilitating the development of stencil applications using the Heterogeneous Programming Library.
Concurr. Comput. Pract. Exp., 2017

A general and efficient divide-and-conquer algorithm framework for multi-core clusters.
Clust. Comput., 2017

A Comparison of Task Parallel Frameworks based on Implicit Dependencies in Multi-core Environments.
Proceedings of the 50th Hawaii International Conference on System Sciences, 2017

2016
Writing a performance-portable matrix multiplication.
Parallel Comput., 2016

Novel parallelization of simulated annealing and Hooke & Jeeves search algorithms for multicore systems with application to complex fisheries stock assessment models.
J. Comput. Sci., 2016

Towards a High Level Approach for the Programming of Heterogeneous Clusters.
Proceedings of the 45th International Conference on Parallel Processing Workshops, 2016

GPU Accelerated Molecular Docking Simulation with Genetic Algorithms.
Proceedings of the Applications of Evolutionary Computation - 19th European Conference, 2016

2015
Developing adaptive multi-device applications with the Heterogeneous Programming Library.
J. Supercomput., 2015

On Processing Extreme Data.
Scalable Comput. Pract. Exp., 2015

Automatic Generation of Optimized OpenCL Codes Using OCLoptimizer.
Comput. J., 2015

Enhancing and Evaluating the Configuration Capability of a Skeleton for Irregular Computations.
Proceedings of the 23rd Euromicro International Conference on Parallel, 2015

Improving OpenCL Programmability with the Heterogeneous Programming Library.
Proceedings of the International Conference on Computational Science, 2015

2014
Address independent estimation of the boundaries of cache performance.
Microprocess. Microsystems, 2014

An Algorithm Template for Domain-Based Parallel Irregular Algorithms.
Int. J. Parallel Program., 2014

A fine-grained thread-aware management policy for shared caches.
Concurr. Comput. Pract. Exp., 2014

Writing Self-adaptive Codes for Heterogeneous Systems.
Proceedings of the Euro-Par 2014 Parallel Processing, 2014

2013
Numerical simulation of pollutant transport in a shallow-water system on the Cell heterogeneous processor.
J. Supercomput., 2013

Virtually split cache: An efficient mechanism to distribute instructions and data.
ACM Trans. Archit. Code Optim., 2013

A framework for argument-based task synchronization with automatic detection of dependencies.
Parallel Comput., 2013

Accurate prediction of the behavior of multithreaded applications in shared caches.
Parallel Comput., 2013

Exploiting heterogeneous parallelism with the Heterogeneous Programming Library.
J. Parallel Distributed Comput., 2013

Parallelization of shallow water simulations on current multi-threaded systems.
Int. J. High Perform. Comput. Appl., 2013

A multi-GPU shallow-water simulation with transport of contaminants.
Concurr. Comput. Pract. Exp., 2013

Graphics processing unit computing and exploitation of hardware accelerators.
Concurr. Comput. Pract. Exp., 2013

OCLoptimizer: An Iterative Optimization Tool for OpenCL.
Proceedings of the International Conference on Computational Science, 2013

2012
Static analysis of the worst-case memory performance for irregular codes with indirections.
ACM Trans. Archit. Code Optim., 2012

Optimization techniques for efficient HTA programs.
Parallel Comput., 2012

Special issue editorial: Exploitation of hardware accelerators.
Microprocess. Microsystems, 2012

Special issue editorial: Accelerators for high-performance computing.
J. Parallel Distributed Comput., 2012

Automatic mapping of parallel applications on multicore architectures using the Servet benchmark suite.
Comput. Electr. Eng., 2012

Using an Analytical Model of Shared Caches for Selecting the Optimal Parallelization Scheme.
Proceedings of the 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, 2012

A Portable High-Productivity Approach to Program Heterogeneous Systems.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Adaptive Set-Granular Cooperative Caching.
Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

2011
An efficient parallel set container for multicore architectures.
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011

Simulation of pollutant transport in shallow water on a CUDA architecture.
Proceedings of the 2011 International Conference on High Performance Computing & Simulation, 2011

2010
Address-Independent Estimation of the Worst-case Memory Performance.
IEEE Trans. Ind. Informatics, 2010

Servet: A benchmark suite for autotuning on multicore clusters.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

A Generic Algorithm Template for Divide-and-Conquer in Multicore Systems.
Proceedings of the 12th IEEE International Conference on High Performance Computing and Communications, 2010

Reducing capacity and conflict misses using Set Saturation Levels.
Proceedings of the 2010 International Conference on High Performance Computing, 2010

Streaming-Oriented Parallelization of Domain-Independent Irregular Kernels.
Proceedings of the Euro-Par 2010 Parallel Processing Workshops, 2010

2009
Writing productive stencil codes with overlapped tiling.
Concurr. Comput. Pract. Exp., 2009

Static Prediction of Worst-Case Data Cache Performance in the Absence of Base Address Information.
Proceedings of the 15th IEEE Real-Time and Embedded Technology and Applications Symposium, 2009

Performance Evaluation of MPI, UPC and OpenMP on Multicore Architectures.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2009

Task-Parallel versus Data-Parallel Library-Based Programming in Multicore Systems.
Proceedings of the 17th Euromicro International Conference on Parallel, 2009

Adaptive line placement with the set balancing cache.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Performance Evaluation of Unified Parallel C Collective Communications.
Proceedings of the 11th IEEE International Conference on High Performance Computing and Communications, 2009

Automatic Tuning of Discrete Fourier Transforms Driven by Analytical Modeling.
Proceedings of the PACT 2009, 2009

2008
Design Issues in Parallel Array Languages for Shared Memory.
Proceedings of the Embedded Computer Systems: Architectures, 2008

Programming with tiles.
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

2007
Precise automatable analytical modeling of the cache behavior of codes with indirections.
ACM Trans. Archit. Code Optim., 2007

Special Issue: Current Trends in Compilers for Parallel Computers.
Concurr. Comput. Pract. Exp., 2007

Automated and accurate cache behavior analysis for codes with irregular access patterns.
Concurr. Comput. Pract. Exp., 2007

2006
Analytical modeling of codes with arbitrary data-dependent conditional structures.
J. Syst. Archit., 2006

Programming for parallelism and locality with hierarchically tiled arrays.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2006

Design and Use of htalib - A Library for Hierarchically Tiled Arrays.
Proceedings of the Languages and Compilers for Parallel Computing, 2006

Cache Behavior Modelling for Codes Involving Banded Matrices.
Proceedings of the Languages and Compilers for Parallel Computing, 2006

Hierarchically tiled arrays for parallelism and locality.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

2005
Optimal Tile Size Selection Guided by Analytical Models.
Proceedings of the Parallel Computing: Current & Future Issues of High-End Computing, 2005

2004
A compiler tool to predict memory hierarchy performance of scientific codes.
Parallel Comput., 2004

The Hierarchically Tiled Arrays programming approach.
Proceedings of the 7th Workshop on languages, 2004

Implementation of Parallel Numerical Algorithms Using Hierarchically Tiled Arrays.
Proceedings of the Languages and Compilers for High Performance Computing, 2004

Modeling the Cache Behavior of Codes with Arbitrary Data-Dependent Conditional Structures.
Proceedings of the Advances in Computer Systems Architecture, 9th Asia-Pacific Conference, 2004

2003
Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance.
IEEE Trans. Computers, 2003

Cache Behavior Modeling of Codes with Data-Dependent Conditionals.
Proceedings of the Software and Compilers for Embedded Systems, 7th International Workshop, 2003

Programming the FlexRAM parallel intelligent memory system.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2003

Programming for Locality and Parallelism with Hierarchically Tiled Arrays.
Proceedings of the Languages and Compilers for Parallel Computing, 2003

1999
Memory Hierarchy Performance Prediction for Blocked Sparse Algorithms.
Parallel Process. Lett., 1999

Direct mapped cache performance modeling for sparse matrix operations.
Proceedings of the Seventh Euromicro Workshop on Parallel and Distributed Processing. PDP'99, 1999

Set Associative Cache Behavior Optimization.
Proceedings of the Euro-Par '99 Parallel Processing, 5th International Euro-Par Conference, Toulouse, France, August 31, 1999

Automatic Analytical Modeling for the Estimation of Cache Misses.
Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, 1999

1998
Modeling Set Associative Caches Behavior for Irregular Computations.
Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems, 1998

Cache Misses Prediction for High Performance Sparse Algorithms.
Proceedings of the Euro-Par '98 Parallel Processing, 1998

Cache Probabilistic Modeling for Basic Sparse Algebra Kernels Involving Matrices with a Non Uniform Distribution.
Proceedings of the 24th EUROMICRO '98 Conference, 1998

1996
Evaluation of vectorization/parallelization techniques: application to nonparametric curve estimation.
Stat. Comput., 1996

Parallel Sparse Modified Gram-Schmidt QR Decomposition.
Proceedings of the High-Performance Computing and Networking, 1996

1995
Extending CAML Light to Perform Distributed Computation.
Proceedings of the 1995 Joint Conference on Declarative Programming, 1995

