We stand with Ukraine

Adrián Castelló

Orcid: 0000-0002-8576-8451

Affiliations:

Universitat Politècnica de València, Spain
Universitat Jaume I de Castello, Spain (former)

According to our database¹, Adrián Castelló authored at least 66 papers between 2014 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

Sparse matrix-vector product on RISC-V processors with SIMD units.

[DOI]

Andrés E. Tomás, Héctor Martínez, Sandra Catalán, Patricia Siwinska, Adrián Castelló, …, Enrique S. Quintana-Ortí

Computing, May, 2026

Enhancing transformer performance and portability through auto-tuning frameworks.

[DOI]

Patricia Siwinska, …, Adrián Castelló, Pedro Alonso-Jordá, Enrique S. Quintana-Ortí

J. Supercomput., March, 2026

Enabling RISC-V Vector Code Generation in MLIR through Custom xDSL Lowerings.

[DOI]

…, Héctor Martínez, Adrián Castelló

CoRR, March, 2026

The cambrian explosion of mixed-precision matrix multiplication for quantized deep learning inference.

[DOI]

Héctor Martínez, Adrián Castelló, Francisco D. Igual, Enrique S. Quintana-Ortí

Future Gener. Comput. Syst., 2026

Migration of Ginkgo's Jacobi-Preconditioned CG Solver to Vector RISC-V.

[DOI]

Patricia Siwinska, Héctor Martínez Pérez, Adrián Castelló

Proceedings of the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops, 2026

2025

Latency-Critical Quantized Inference With Transformer Decoders on ARM and RISC-V CPUs.

[DOI]

Héctor Martínez, Sandra Catalán, Adrián Castelló, José I. Mestre, Enrique S. Quintana-Ortí

IEEE Internet Things J., July, 2025

Experience-guided, mixed-precision matrix multiplication with apache TVM for ARM processors.

[DOI]

Adrián Castelló, Héctor Martínez, Sandra Catalán, Francisco D. Igual, Enrique S. Quintana-Ortí

J. Supercomput., January, 2025

Characterization of quantized inference with transformer encoders on low power CPUs.

[DOI]

Héctor Martínez, Sandra Catalán, Adrián Castelló, Enrique S. Quintana-Ortí

Int. J. High Perform. Comput. Appl., 2025

Generation of Mixed-Precision Kernels for Quantized Transformer Encoders with Exo.

[DOI]

Adrián Castelló, Héctor Martínez, Francisco D. Igual, Enrique S. Quintana-Ortí

Proceedings of the High Performance Computing, 2025

Evaluation of RVV-Enabled COTS Platforms with Matrix Multiplication and Exo.

[DOI]

Adrián Castelló, Héctor Martínez, Sandra Catalán, Francisco D. Igual, Enrique S. Quintana-Ortí

Proceedings of the High Performance Computing, 2025

Portable, High Performance Matrix Multiplication Micro-Kernels for RISC-V with ExO.

[DOI]

Adrián Castelló, Héctor Martínez, Sandra Catalán, …, …, …, Francisco D. Igual, Enrique S. Quintana-Ortí

Proceedings of the 33rd Euromicro International Conference on Parallel, 2025

2024

Communication-Avoiding Fusion of GEMM-Based Convolutions for Deep Learning in the RISC-V GAP8 MCU.

[DOI]

Cristián Ramírez, Adrián Castelló, Héctor Martínez, Enrique S. Quintana-Ortí

IEEE Internet Things J., November, 2024

Automatic generation of ARM NEON micro-kernels for matrix multiplication.

[DOI]

Guillermo Alaejos, Héctor Martínez, Adrián Castelló, …, Francisco D. Igual, Pedro Alonso-Jordá, Enrique S. Quintana-Ortí

J. Supercomput., July, 2024

Parallel GEMM-based convolution for deep learning on multicore RISC-V processors.

[DOI]

Cristián Ramírez, Adrián Castelló, Héctor Martínez, Enrique S. Quintana-Ortí

J. Supercomput., June, 2024

Algorithm 1039: Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM.

[DOI]

Guillermo Alaejos, Adrián Castelló, Pedro Alonso-Jordá, Francisco D. Igual, Héctor Martínez, Enrique S. Quintana-Ortí

ACM Trans. Math. Softw., March, 2024

RED-SEA Project: Towards a new-generation European interconnect.

[DOI]

María Engracia Gómez, Julio Sahuquillo, Andrea Biagioni, …, …, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Michele Martinelli, Pier Stanislao Paolucci, Elena Pastorelli, Francesco Simula, Matteo Turisini, …, Roberto Ammendola, Carlotta Chiarini, …, Fabrizio Capuani, Adrián Castelló, …, Eugenio Stabile, Enrique S. Quintana-Ortí, Pascale Bernier-Bruna, …, Pierre-Axel Lagadec, Gregoire Pichon, …, Manolis Katevenis, Sokratis Bartzis, Orestis Mousouros, Pantelis Xirouchakis, Vangelis Mageiropoulos, Michalis Gianioudis, …, Aggelos Ioannou, Nikos Kallimanis, Miguel Sánchez de la Rosa, Gabriel Gomez-Lopez, Francisco Alfaro-Cortés, Jesús Escudero-Sahuquillo, Pedro Javier García, Francisco J. Quiles, José L. Sánchez, Gaetan De Gassowski, Matthieu Hautreaux, Stephane Mathieu, …, …, …, Torsten Hoefler, …, …, Giuseppe Piero Brandino, Francesco De Giorgi, …, Iakovos Mavroidis, Yannis Papaefstathiou, Nikolaos Tampouratzis, Benjamin Kalisch, Ulrich Krackhardt, Mondrian Nuessle, Wolfgang Frings, Dominik Gottwald, Felime Guimaraes, …, …, …, …, …, …, Jennifer Lopez Barillao, …, …

Microprocess. Microsystems, 2024

Parallel GEMM-based convolutions for deep learning on multicore ARM and RISC-V architectures.

[DOI]

Héctor Martínez, Sandra Catalán, Adrián Castelló, Enrique S. Quintana-Ortí

J. Syst. Archit., 2024

Experiences with nested parallelism in task-parallel applications using malleable BLAS on multicore processors.

[DOI]

Rafael Rodríguez-Sánchez, Adrián Castelló, Sandra Catalán, Francisco D. Igual, Enrique S. Quintana-Ortí

Int. J. High Perform. Comput. Appl., 2024

Performance Analysis of BERT on RISC-V Processors with SIMD Units.

[DOI]

Héctor Martínez, Sandra Catalán, …, Francisco D. Igual, Rafael Rodríguez-Sánchez, Adrián Castelló, Enrique S. Quintana-Ortí

Proceedings of the High Performance Computing. ISC High Performance 2024 International Workshops, 2024

Optimization of One-to-Many Communication Primitives for Dragonfly Topologies.

[DOI]

…, Adrián Castelló, María Engracia Gómez, Julio Sahuquillo, Enrique S. Quintana

Proceedings of the 30th IEEE International Conference on Parallel and Distributed Systems, 2024

Inference with Transformer Encoders on ARM and RISC-V Multicore Processors.

[DOI]

Héctor Martínez, Francisco D. Igual, Rafael Rodríguez-Sánchez, Sandra Catalán, Adrián Castelló, Enrique S. Quintana-Ortí

Proceedings of the Euro-Par 2024: Parallel Processing, 2024

One-to-Many Communication Primitives in Dragonfly Networks.

[DOI]

…, Adrián Castelló, María Engracia Gómez, Julio Sahuquillo, Enrique S. Quintana-Ortí

Proceedings of the Euro-Par 2024: Parallel Processing Workshops, 2024

QAttn: Efficient GPU Kernels for mixed-precision Vision Transformers.

[DOI]

…, Adrián Castelló, Florian Scheidegger, A. Cristiano I. Malossi, Enrique S. Quintana-Ortí

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Tackling the Matrix Multiplication Micro-Kernel Generation with Exo.

[DOI]

Adrián Castelló, Julian Bellavita, …, …, Héctor Martínez

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2024

2023

Efficient and portable Winograd convolutions for multi-core processors.

[DOI]

…, Héctor Martínez, Adrián Castelló, Pedro Alonso-Jordá, Enrique S. Quintana-Ortí

J. Supercomput., July, 2023

Performance-energy trade-offs of deep learning convolution algorithms on ARM processors.

[DOI]

…, Sergio Barrachina, Héctor Martínez, Adrián Castelló, Antonio-Manuel Vidal-Maciá, Germán Fabregat, Andrés E. Tomás

J. Supercomput., June, 2023

Micro-kernels for portable and efficient matrix multiplication in deep learning.

[DOI]

Guillermo Alaejos, Adrián Castelló, Héctor Martínez, Pedro Alonso-Jordá, Francisco D. Igual, Enrique S. Quintana-Ortí

J. Supercomput., May, 2023

Analyzing the impact of the MPI allreduce in distributed training of convolutional neural networks.

[DOI]

Adrián Castelló, …, …, Enrique S. Quintana-Ortí, …

Computing, May, 2023

Using machine learning to model the training scalability of convolutional neural networks on clusters of GPUs.

[DOI]

Sergio Barrachina, Adrián Castelló, …, …, José I. Mestre

Computing, May, 2023

Reformulating the direct convolution for high-performance deep learning inference on ARM processors.

[DOI]

Sergio Barrachina, Adrián Castelló, …, …, Héctor Martínez, Enrique S. Quintana-Ortí, Upasana Sridhar, Andrés E. Tomás

J. Syst. Archit., February, 2023

Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM.

[DOI]

Guillermo Alaejos, Adrián Castelló, Pedro Alonso-Jordá, Francisco D. Igual, Héctor Martínez, Enrique S. Quintana-Ortí

CoRR, 2023

Automatic Generation of Micro-kernels for Performance Portability of Matrix Multiplication on RISC-V Vector Processors.

[DOI]

Francisco D. Igual, …, Sandra Catalán, Héctor Martínez, Adrián Castelló, Enrique S. Quintana-Ortí

Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

2022

A BLIS-like matrix multiplication for machine learning in the RISC-V ISA-based GAP8 processor.

[DOI]

Cristián Ramírez, Adrián Castelló, Enrique S. Quintana-Ortí

J. Supercomput., 2022

BestOf: an online implementation selector for the training and inference of deep neural networks.

[DOI]

Sergio Barrachina, Adrián Castelló, …, Andrés E. Tomás

J. Supercomput., 2022

High performance and energy efficient inference for deep learning on multicore ARM processors using general optimization techniques and BLIS.

[DOI]

Adrián Castelló, Sergio Barrachina, …, Enrique S. Quintana-Ortí, …, Andrés E. Tomás

J. Syst. Archit., 2022

Performance Analysis of Matrix Multiplication for Deep Learning on the Edge.

[DOI]

Cristián Ramírez, Adrián Castelló, Héctor Martínez, Enrique S. Quintana-Ortí

Proceedings of the High Performance Computing. ISC High Performance 2022 International Workshops - Hamburg, Germany, May 29, 2022

QR Factorization Using Malleable BLAS on Multicore Processors.

[DOI]

Adrián Castelló, Sandra Catalán, Francisco D. Igual, Enrique S. Quintana-Ortí, Rafael Rodríguez-Sánchez

Proceedings of the High Performance Computing. ISC High Performance 2022 International Workshops - Hamburg, Germany, May 29, 2022

Towards Portable Realizations of Winograd-based Convolution with Vector Intrinsics and OpenMP.

[DOI]

…, Adrián Castelló, Enrique S. Quintana-Ortí

Proceedings of the 30th Euromicro International Conference on Parallel, 2022

Anatomy of the BLIS Family of Algorithms for Matrix Multiplication.

[DOI]

Adrián Castelló, Enrique S. Quintana-Ortí, Francisco D. Igual

Proceedings of the 30th Euromicro International Conference on Parallel, 2022

RED-SEA: Network Solution for Exascale Architectures.

[DOI]

Andrea Biagioni, …, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Michele Martinelli, Pier Stanislao Paolucci, Elena Pastorelli, Francesco Simula, Matteo Turisini, …, Roberto Ammendola, Pascale Bernier-Bruna, …, …, …, Pierre-Axel Lagadec, Gregoire Pichon, …, Gaetan De Gassowski, Matthieu Hautreaux, Stephane Mathieu, …, …, …, Torsten Hoefler, …, …, Giuseppe Piero Brandino, Francesco De Giorgi, …, Iakovos Mavroidis, Yannis Papaefstathiou, Nikolaos Tampouratzis, Benjamin Kalisch, Ulrich Krackhardt, Mondrian Nuessle, Pantelis Xirouchakis, Vangelis Mageiropoulos, Michalis Gianioudis, …, Aggelos Ioannou, Nikos Kallimanis, …, Manolis Katevenis, Wolfgang Frings, Dominik Gottwald, Felime Guimaraes, …, …, …, …, …, …, Jennifer Lopez Barillao, …, …, Francisco J. Alfaro, Jesús Escudero-Sahuquillo, Pedro Javier García, Francisco J. Quiles, José L. Sánchez, Adrián Castelló, …, María Engracia Gómez, Enrique S. Quintana-Ortí, Julio Sahuquillo, Eugenio Stabile

Proceedings of the 25th Euromicro Conference on Digital System Design, 2022

2021

PyDTNN: A user-friendly and extensible framework for distributed deep learning.

[DOI]

Sergio Barrachina, Adrián Castelló, …, …, José I. Mestre

J. Supercomput., 2021

Acoustic Echo Cancellation using Residual U-Nets.

[DOI]

Julio Silva-Rodríguez, …, …, Adrián Castelló, …, …

CoRR, 2021

High performance and energy efficient inference for deep learning on ARM processors.

[DOI]

Adrián Castelló, Sergio Barrachina, …, Enrique S. Quintana-Ortí, …

CoRR, 2021

Accelerating distributed deep neural network training with pipelined MPI allreduce.

[DOI]

Adrián Castelló, Enrique S. Quintana-Ortí, …

Clust. Comput., 2021

Evaluation of MPI Allreduce for Distributed Training of Convolutional Neural Networks.

[DOI]

Adrián Castelló, …, …, José I. Mestre, Enrique S. Quintana-Ortí, …

Proceedings of the 29th Euromicro International Conference on Parallel, 2021

Performance Modeling for Distributed Training of Convolutional Neural Networks.

[DOI]

Adrián Castelló, …, …, José I. Mestre, Enrique S. Quintana-Ortí, …

Proceedings of the 29th Euromicro International Conference on Parallel, 2021

A Flexible Research-Oriented Framework for Distributed Training of Deep Neural Networks.

[DOI]

Sergio Barrachina, Adrián Castelló, …, …, José I. Mestre

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

2020

Analysis of Threading Libraries for High Performance Computing.

[DOI]

Adrián Castelló, Rafael Mayo Gual, …, …, Enrique S. Quintana-Ortí, Antonio J. Peña

IEEE Trans. Computers, 2020

High Performance and Portable Convolution Operators for ARM-based Multicore Processors.

[DOI]

…, Adrián Castelló, …, Pedro Alonso-Jordá, Enrique S. Quintana-Ortí

CoRR, 2020

Programming parallel dense matrix factorizations with look-ahead and OpenMP.

[DOI]

Sandra Catalán, Adrián Castelló, Francisco D. Igual, Rafael Rodríguez-Sánchez, Enrique S. Quintana-Ortí

Clust. Comput., 2020

High Performance and Portable Convolution Operators for Multicore Processors.

[DOI]

…, Adrián Castelló, …, Pedro Alonso-Jordá, Enrique S. Quintana-Ortí

Proceedings of the 32nd IEEE International Symposium on Computer Architecture and High Performance Computing, 2020

2019

Analysis of model parallelism for distributed neural networks.

[DOI]

Adrián Castelló, …, Enrique S. Quintana-Ortí, …

Proceedings of the 26th European MPI Users' Group Meeting, 2019

Theoretical Scalability Analysis of Distributed Deep Convolutional Neural Networks.

[DOI]

Adrián Castelló, …, Enrique S. Quintana-Ortí, …

Proceedings of the 19th IEEE/ACM International Symposium on Cluster, 2019

2018

Unification of Lightweight Thread Solutions and their Application in High Performance Programming.

[DOI]

Adrián Castelló

PhD thesis, 2018

Argobots: A Lightweight Low-Level Threading and Tasking Framework.

[DOI]

IEEE Trans. Parallel Distributed Syst., 2018

Exploring the interoperability of remote GPGPU virtualization using rCUDA and directive-based programming models.

[DOI]

Adrián Castelló, Antonio J. Peña, …, …, Enrique S. Quintana-Ortí, …

J. Supercomput., 2018

On the adequacy of lightweight thread approaches for high-level parallel programming models.

[DOI]

Adrián Castelló, …, …, Vicenç Beltran, …, Antonio J. Peña

Future Gener. Comput. Syst., 2018

2017

GLTO: On the Adequacy of Lightweight Thread Approaches for OpenMP Implementations.

[DOI]

Adrián Castelló, …, …, …, Enrique S. Quintana-Ortí, Antonio J. Peña

Proceedings of the 46th International Conference on Parallel Processing, 2017

GLT: A Unified API for Lightweight Thread Libraries.

[DOI]

Adrián Castelló, …, …, …, Enrique S. Quintana-Ortí, Antonio J. Peña

Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017

2016

A Review of Lightweight Thread Approaches for High Performance Computing.

[DOI]

Adrián Castelló, Antonio J. Peña, …, …, …, Enrique S. Quintana-Ortí

Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016

Enabling GPU Virtualization in Cloud Environments.

[DOI]

…, Francisco J. Clemente-Castelló, Adrián Castelló, …, Enrique S. Quintana-Ortí

Proceedings of the CLOSER 2016, 2016

2015

Improving the user experience of the rCUDA remote GPU virtualization framework.

[DOI]

…, …, Adrián Castelló, Antonio J. Peña, …, Enrique S. Quintana-Ortí, …

Concurr. Comput. Pract. Exp., 2015

Exploiting Task-Parallelism on GPU Clusters via OmpSs and rCUDA Virtualization.

[DOI]

Adrián Castelló, …, …, Enrique S. Quintana-Ortí

Proceedings of the 2015 IEEE TrustCom/BigDataSE/ISPA, 2015

Exploring the Suitability of Remote GPGPU Virtualization for the OpenACC Programming Model Using rCUDA.

[DOI]

Adrián Castelló, Antonio J. Peña, …, …, Enrique S. Quintana-Ortí

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2014

SLURM Support for Remote GPU Virtualization: Implementation and Performance Study.

[DOI]

…, Adrián Castelló, …, Enrique S. Quintana-Ortí, …, …, …, …

Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Boosting the performance of remote GPU virtualization using InfiniBand connect-IB and PCIe 3.0.

[DOI]

…, …, Antonio J. Peña, …, …, Adrián Castelló, Enrique S. Quintana-Ortí, …

Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014
