Manuel F. Dolz

Enrique S. Quintana-Ortí

CoRR, May, 2026

StableGrad: Backward Scale Control without Batch Normalization.

CoRR, May, 2026

FedOUI: OUI-Guided Client Weighting for Federated Aggregation.

CoRR, May, 2026

OUI as a Structural Observable: Towards an Activation-Centric View of Neural Network Training.

CoRR, May, 2026

OUIDecay: Adaptive Layer-wise Weight Decay for CNNs Using Online Activation Patterns.

CoRR, May, 2026

Refresh-Scaling the Memory of Balanced Adam.

CoRR, May, 2026

Cross-platform characterisation and performance analysis of homomorphic matrix multiplication.

Franklin Espinoza

Justo Molina

Darwin Quezada-Gaibor

Sandra Catalán

J. Supercomput., April, 2026

FedSQ: Optimized Weight Averaging via Fixed Gating.

CoRR, April, 2026

λ-GELU: Learning Gating Hardness for Controlled ReLU-ization in Deep Networks.

CoRR, March, 2026

When Learning Rates Go Wrong: Early Structural Signals in PPO Actor-Critic.

CoRR, March, 2026

Regime Change Hypothesis: Foundations for Decoupled Dynamics in Neural Network Training.

CoRR, February, 2026

Why Adam Works Better with β₁ = β₂: The Missing Gradient Scale Invariance Principle.

CoRR, January, 2026

Assessing Modern Deep Vision Models for Chest X-Ray Diagnostics in Emergency Care.

Ferran Soler-Guiral

Katty Delgado-Barriga

Proceedings of the 19th International Joint Conference on Biomedical Engineering Systems and Technologies, 2026

2025

GLAI: GreenLightningAI for Accelerated Training through Knowledge Decoupling.

CoRR, October, 2025

Sinusoidal Initialization, Time for a New Start.

CoRR, May, 2025

OUI Need to Talk About Weight Decay: A New Perspective on Overfitting Detection.

CoRR, April, 2025

Deep learning inference optimisation for IoT: Conv2D-ReLU-BN layer fusion and quantisation.

J. Supercomput., March, 2025

Ok-Topk-SP: A Novel Sparse Allreduce for Distributed Deep Neural Network Training on CPUs.

Miguel Pardo-Navarro

Proceedings of the 24th International Symposium on Parallel and Distributed Computing, 2025

Analyzing Performance-Memory-Security Trade-Offs of Convolutions for DNN Inference on Homomorphically Encrypted Data.

Núria Moreno-Chamorro

Maribel Castillo

Proceedings of the 24th International Symposium on Parallel and Distributed Computing, 2025

Enhanced ROI Selection in Deep Learning Heatmaps: Refining Pathology Detection in Chest X-ray Imaging.

Núria Moreno-Chamorro

Maribel Castillo

Proceedings of the 3rd International Conference on Foundation and Large Language Models, 2025

Decoupling Structural and Quantitative Knowledge in ReLU-based Deep Neural Networks.

José Cano

Proceedings of the 5th Workshop on Machine Learning and Systems, 2025

2024

Automatic generation of ARM NEON micro-kernels for matrix multiplication.

J. Supercomput., July, 2024

Urban sound classification using neural networks on embedded FPGAs.

J. Supercomput., June, 2024

Optimizing Convolutions for Deep Learning Inference on ARM Cortex-M Processors.

IEEE Internet Things J., 2024

2023

Efficient and portable Winograd convolutions for multi-core processors.

Antonio-Manuel Vidal-Maciá

J. Supercomput., July, 2023

Performance-energy trade-offs of deep learning convolution algorithms on ARM processors.

Germán Fabregat

Andrés E. Tomás

J. Supercomput., June, 2023

Analyzing the impact of the MPI allreduce in distributed training of convolutional neural networks.

Mar Catalán

Computing, May, 2023

Using machine learning to model the training scalability of convolutional neural networks on clusters of GPUs.

Computing, May, 2023

Reformulating the direct convolution for high-performance deep learning inference on ARM processors.

Upasana Sridhar

Andrés E. Tomás

J. Syst. Archit., February, 2023

GreenLightningAI: An Efficient AI System with Decoupled Structural and Quantitative Knowledge.

CoRR, 2023

2022

BestOf: an online implementation selector for the training and inference of deep neural networks.

J. Supercomput., 2022

High performance and energy efficient inference for deep learning on multicore ARM processors using general optimization techniques and BLIS.

Sergio Barrachina

Pau San Juan

Andrés E. Tomás

J. Syst. Archit., 2022

Efficient and portable GEMM-based convolution operators for deep neural network training on multicore processors.

Sergio Barrachina

Pablo San Juan

J. Parallel Distributed Comput., 2022

Convolution Operators for Deep Learning Inference on the Fujitsu A64FX Processor.

Héctor Martínez

Proceedings of the 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2022

Towards Portable Realizations of Winograd-based Convolution with Vector Intrinsics and OpenMP.

Proceedings of the 30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, 2022

2021

PyDTNN: A user-friendly and extensible framework for distributed deep learning.

J. Supercomput., 2021

Convolutional neural nets for estimating the run time and energy consumption of the sparse matrix-vector product.

Maria Barreda

M. Asunción Castaño

Int. J. High Perform. Comput. Appl., 2021

Acoustic Echo Cancellation using Residual U-Nets.

Julio Silva-Rodríguez

CoRR, 2021

High performance and energy efficient inference for deep learning on ARM processors.

Sergio Barrachina

Pau San Juan

CoRR, 2021

Evaluation of MPI Allreduce for Distributed Training of Convolutional Neural Networks.

Proceedings of the 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, 2021

Performance Modeling for Distributed Training of Convolutional Neural Networks.

Proceedings of the 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, 2021

A Flexible Research-Oriented Framework for Distributed Training of Deep Neural Networks.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

2020

Detecting semantic violations of lock-free data structures through C++ contracts.

Javier López-Gómez

J. Supercomput., 2020

Performance modeling of the sparse matrix-vector product via convolutional neural networks.

J. Supercomput., 2020

High Performance and Portable Convolution Operators for ARM-based Multicore Processors.

CoRR, 2020

A pipeline for the QR update in digital signal processing.

Comput. Math. Methods, 2020

High Performance and Portable Convolution Operators for Multicore Processors.

Proceedings of the 32nd IEEE International Symposium on Computer Architecture and High Performance Computing, 2020

2019

A similarity study of I/O traces via string kernels.

J. Supercomput., 2019

A pipeline structure for the block QR update in digital signal processing.

J. Supercomput., 2019

Hybrid static-dynamic selection of implementation alternatives in heterogeneous environments.

Javier García Blas

J. Supercomput., 2019

Exploring stream parallel patterns in distributed MPI environments.

Javier López-Gómez

Javier Fernández Muñoz

Parallel Comput., 2019

Analysis of model parallelism for distributed neural networks.

Proceedings of the 26th European MPI Users' Group Meeting, 2019

Theoretical Scalability Analysis of Distributed Deep Convolutional Neural Networks.

Proceedings of the 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2019

2018

Understanding hardware and software metrics with respect to power consumption.

Julian M. Kunkel

Sustain. Comput. Informatics Syst., 2018

Energy monitoring as an essential building block towards sustainable ultrascale systems.

Francisco Almeida

Marcos Dias de Assunção

Sustain. Comput. Informatics Syst., 2018

Finding parallel patterns through static analysis in C++ applications.

Int. J. High Perform. Comput. Appl., 2018

An adaptive offline implementation selector for heterogeneous parallel platforms.

Int. J. High Perform. Comput. Appl., 2018

Paving the way towards high-level parallel pattern interfaces for data stream processing.

Future Gener. Comput. Syst., 2018

Towards Automatic Parallelization of Stream Processing Applications.

Jesús Carretero

IEEE Access, 2018

Supporting MPI-distributed stream parallel patterns in GrPPI.

Javier Fernández Muñoz

Javier Prieto Cepeda

Proceedings of the 25th European MPI Users' Group Meeting, 2018

Parallelizing and Optimizing LHCb-Kalman for Intel Xeon Phi KNL Processors.

Placido Fernández

Proceedings of the 26th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, 2018

Comparison of Clang Abstract Syntax Trees using String Kernels.

Proceedings of the 2018 International Conference on High Performance Computing & Simulation, 2018

2017

Adapting concurrency throttling and voltage-frequency scaling for dense eigensolvers.

J. Supercomput., 2017

Enabling semantics to improve detection of data races and misuses of lock-free data structures.

Massimo Torquati

Félix García Carballeira

Marco Danelutto

Concurr. Comput. Pract. Exp., 2017

A generic parallel pattern interface for stream and data processing.

Concurr. Comput. Pract. Exp., 2017

A Novel String Representation and Kernel Function for the Comparison of I/O Access Patterns.

Proceedings of Parallel Computing Technologies, 2017

Probabilistic-Based Selection of Alternate Implementations for Heterogeneous Platforms.

Andrés Sánchez Cuadrado

Proceedings of Algorithms and Architectures for Parallel Processing, 2017

Supporting Advanced Patterns in GrPPI, a Generic Parallel Pattern Interface.

Proceedings of Euro-Par 2017: Parallel Processing Workshops, 2017

2016

Analyzing the energy consumption of the storage data path.

Pablo Llopis

Francisco Javier García Blas

Florin Isaila

Mohammad Reza Heidari

Michael Kuhn

J. Supercomput., 2016

An analytical methodology to derive power models based on hardware and software metrics.

Julian M. Kunkel

Konstantinos Chasapis

Sandra Catalán

Comput. Sci. Res. Dev., 2016

CID: A Compile-Time Implementation Decider for Heterogeneous Platforms Based on C++ Attributes.

Luis Miguel Sánchez

Proceedings of the 2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, 2016

Embedding Semantics of the Single-Producer/Single-Consumer Lock-Free Queue into a Race Detection Tool.

Félix García Carballeira

Marco Danelutto

Massimo Torquati

Proceedings of the 7th International Workshop on Programming Models and Applications for Multicores and Manycores, 2016

Discovering Pipeline Parallel Patterns in Sequential Legacy C++ Codes.

Luis Miguel Sánchez

Erick Jorge Canales-Rodríguez

Proceedings of the 7th International Workshop on Programming Models and Applications for Multicores and Manycores, 2016

Porting Matlab Applications to High-Performance C++ Codes: CPU/GPU-Accelerated Spherical Deconvolution of Diffusion MRI Data.

Proceedings of Algorithms and Architectures for Parallel Processing, 2016

A C++ Generic Parallel Pattern Interface for Stream Processing.

Proceedings of Algorithms and Architectures for Parallel Processing, 2016

2015

Evaluating the performance and energy efficiency of the COSMO-ART model system.

Comput. Sci. Res. Dev., 2015

Are our dense linear algebra libraries energy-friendly?

Maria Barreda

Comput. Sci. Res. Dev., 2015

Balancing task- and data-level parallelism to improve performance and energy consumption of matrix computations on the Intel Xeon Phi.

Comput. Electr. Eng., 2015

ARDUPOWER: A low-cost wattmeter to improve energy efficiency of HPC applications.

Mohammad Reza Heidari

Michael Kuhn

Thomas Ludwig

Germán Fabregat

Proceedings of the Sixth International Green and Sustainable Computing Conference, 2015

2014

Energy-aware matrix computations on multithreaded architectures.

PhD thesis, 2014

Assessing Power Monitoring Approaches for Energy and Power Analysis of Computers.

Mohammed el Mehdi Diouri

Sustain. Comput. Informatics Syst., 2014

Tools and methods for measuring and tuning the energy efficiency of HPC systems.

Sci. Program., 2014

Block pivoting implementation of a symmetric Toeplitz solver.

Antonio M. Vidal

J. Parallel Distributed Comput., 2014

Automatic detection of power bottlenecks in parallel scientific applications.

Comput. Sci. Res. Dev., 2014

Modeling power and energy of the task-parallel Cholesky factorization on multicore processors.

Comput. Sci. Res. Dev., 2014

Modeling power and energy consumption of dense matrix factorizations on multicore processors.

Concurr. Comput. Pract. Exp., 2014

Enhancing performance and energy consumption of runtime schedulers for dense linear algebra.

Concurr. Comput. Pract. Exp., 2014

Assessing the impact of the CPU power-saving modes on the task-parallel solution of sparse linear systems.

Clust. Comput., 2014

Evaluating Lustre's metadata server on a multi-socket platform.

Konstantinos Chasapis

Michael Kuhn

Thomas Ludwig

Proceedings of the 9th Parallel Data Storage Workshop, 2014

2013

Energy-efficient execution of dense linear algebra algorithms on multi-core processors.

Clust. Comput., 2013

Solving Some Mysteries in Power Monitoring of Servers: Take Care of Your Wattmeters!

Mohammed el Mehdi Diouri

Proceedings of Energy Efficiency in Large Scale Distributed Systems, 2013

Runtime Scheduling of the LU Factorization: Performance and Energy.

Francisco D. Igual

Proceedings of Energy Efficiency in Large Scale Distributed Systems, 2013

2012

A simulator to assess energy saving strategies and policies in HPC workloads.

Juan Carlos Fernández

Sergio Iserte

ACM SIGOPS Oper. Syst. Rev., 2012

DVFS-control techniques for dense linear algebra operations on multi-core processors.

Comput. Sci. Res. Dev., 2012

Saving Energy in the LU Factorization with Partial Pivoting on Multi-core Processors.

Proceedings of the 20th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, 2012

Binding Performance and Power of Dense Linear Algebra Operations.

Maria Barreda

Ruymán Reyes

Proceedings of the 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, 2012

Reducing Energy Consumption of Dense Linear Algebra Operations on Hybrid CPU-GPU Platforms.

Proceedings of the 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, 2012

Leveraging Task-Parallelism in Energy-Efficient ILU Preconditioners.

Proceedings of ICT as Key Technology against Global Warming, 2012

Tools for Power-Energy Modelling and Analysis of Parallel Scientific Applications.

Ruymán Reyes

Proceedings of the 41st International Conference on Parallel Processing, 2012

2011

Power-aware Dense Linear Algebra Implementations on Multi-core and Many-core Processors.

Robert A. van de Geijn

Proceedings of the 3rd Many-core Applications Research Community (MARC) Symposium, 2011

Evaluation of the Energy Performance of Dense Linear Algebra Kernels on Multi-core and Many-Core Processors.

Maribel Castillo

Juan Carlos Fernández

Vicente Roca

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Improving power efficiency of dense linear algebra algorithms on multi-core processors via slack control.

Proceedings of the 2011 International Conference on High Performance Computing & Simulation, 2011

2010

EnergySaving Cluster Roll: Power Saving System for Clusters.

Juan Carlos Fernández