Francisco D. Igual

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

2020

Resource Management for Power-Constrained HEVC Transcoding Using Reinforcement Learning.

IEEE Trans. Parallel Distributed Syst., 2020

Integration and exploitation of intra-routine malleability in BLIS.

J. Supercomput., 2020

STEEL-RT: combining single task-single executor model and expanded scheduling to ease heterogeneity exploitation.

Antón Rey

J. Supercomput., 2020

Leveraging knowledge-as-a-service (KaaS) for QoS-aware resource management in multi-user video transcoding.

J. Supercomput., 2020

Programming parallel dense matrix factorizations with look-ahead and OpenMP.

Adrián Castelló

Clust. Comput., 2020

Towards a Malleable Tensorflow Implementation.

Leandro Ariel Libutti

Proceedings of the Cloud Computing, Big Data & Emerging Topics - 8th Conference, 2020

2019

Algorithm 994: Fast Implementations of the Brouwer-Zimmermann Algorithm for the Computation of the Minimum Distance of a Random Linear Code.

Fernando Hernando

ACM Trans. Math. Softw., 2019

Variable intra-task threading for power-constrained performance and energy optimization in DAG scheduling.

Antón Rey

J. Supercomput., 2019

Accelerating the SRP-PHAT algorithm on multi- and many-core platforms using OpenCL.

J. Supercomput., 2019

Portability Study of an OpenCL Algorithm for Automatic Target Detection in Hyperspectral Images.

IEEE Trans. Geosci. Remote. Sens., 2019

Practical Considerations for Acoustic Source Localization in the IoT Era: Platforms, Energy Efficiency, and Performance.

IEEE Internet Things J., 2019

Parallel Implementations for Computing the Minimum Distance of a Random Linear Code on Multicomputers.

Fernando Hernando

CoRR, 2019

Detecting Time-Fragmented Cache Attacks Against AES Using Performance Monitoring Counters.

Iván Prada

José Manuel Badía-Contelles

Katzalin Olcoz

Proceedings of the 7th Conference on Cloud Computing & Big Data, 2019

MAMUT: Multi-Agent Reinforcement Learning for Efficient Real-Time Multi-User Video Transcoding.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2019

2018

Optimized Fundamental Signal Processing Operations For Energy Minimization on Heterogeneous Mobile Devices.

Jose A. Belloch

Alberto González

IEEE Trans. Circuits Syst. I Regul. Pap., 2018

Multi-threaded dense linear algebra libraries for low-power asymmetric multicore processors.

José R. Herrero

Chris Adeniyi-Jones

J. Comput. Sci., 2018

Acceleration and energy consumption optimization in cascading classifiers for face detection on low-cost ARM big.LITTLE asymmetric architectures.

Int. J. Circuit Theory Appl., 2018

2017

Time and energy modeling of a high-performance multi-threaded Cholesky factorization.

J. Supercomput., 2017

Solving Weighted Least Squares (WLS) problems on ARM-based architectures.

Jose A. Belloch

Balázs Bank

Antonio M. Vidal

J. Supercomput., 2017

Revisiting conventional task schedulers to exploit asymmetry in multi-core architectures for dense linear algebra operations.

Parallel Comput., 2017

Performance-Power Evaluation of an OpenCL Implementation of the Simplex Growing Algorithm for Hyperspectral Unmixing.

IEEE Geosci. Remote. Sens. Lett., 2017

Energy Efficiency Optimization of Task-Parallel Codes on Asymmetric Architectures.

Proceedings of the 2017 International Conference on High Performance Computing & Simulation, 2017

Performance and Scalability Study of FMM Kernels on Novel Multi- and Many-core Architectures.

Proceedings of the International Conference on Computational Science, 2017

On the Use of a GPU-Accelerated Mobile Device Processor for Sound Source Localization.

Proceedings of the International Conference on Computational Science, 2017

2016

The BLIS Framework: Experiments in Portability.

ACM Trans. Math. Softw., 2016

Analytical Modeling Is Enough for High-Performance BLIS.

Tze Meng Low

Tyler M. Smith

ACM Trans. Math. Softw., 2016

Fast Algorithms for the Computation of the Minimum Distance of a Random Linear Code.

Fernando Hernando

CoRR, 2016

Architecture-aware configuration and scheduling of matrix multiplication on asymmetric multicore processors.

Clust. Comput., 2016

Refactoring Conventional Task Schedulers to Exploit Asymmetric ARM big.LITTLE Architectures in Dense Linear Algebra.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

HeSP: A Simulation Framework for Solving the Task Scheduling-Partitioning Problem on Heterogeneous Architectures.

Antón Rey

Proceedings of the Euro-Par 2016: Parallel Processing, 2016

2015

Time and energy modeling of high-performance Level-3 BLAS on x86 architectures.

Simul. Model. Pract. Theory, 2015

Speeding up the log-polar transform with inexpensive parallel hardware: graphics units and multi-core architectures.

J. Real Time Image Process., 2015

Accelerating fluid-solid simulations (Lattice-Boltzmann & Immersed-Boundary) on heterogeneous architectures.

J. Comput. Sci., 2015

A power measurement environment for PCIe accelerators.

Luis M. Jara

José Ignacio Gómez Pérez

Luis Piñuel

Comput. Sci. Res. Dev., 2015

Revisiting Conventional Task Schedulers to Exploit Asymmetry in ARM big.LITTLE Architectures for Dense Linear Algebra.

Luis Costero

Katzalin Olcoz

CoRR, 2015

Performance and Energy Optimization of Matrix Multiplication on Asymmetric big.LITTLE Processors.

CoRR, 2015

Multi-Threaded Dense Linear Algebra Libraries for Low-Power Asymmetric Multicore Processors.

José R. Herrero

CoRR, 2015

Non-negative Matrix Factorization on Low-Power Architectures and Accelerators: A Comparative Study.

Comput. Electr. Eng., 2015

Balancing task- and data-level parallelism to improve performance and energy consumption of matrix computations on the Intel Xeon Phi.

Comput. Electr. Eng., 2015

Vectorization of binaural sound virtualization on the ARM Cortex-A15 architecture.

Proceedings of the 23rd European Signal Processing Conference, 2015

2014

Hyperspectral Unmixing on Multicore DSPs: Trading Off Performance for Energy.

Maribel Castillo

Juan Carlos Fernández

Antonio Plaza

Alfredo Remón

IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2014

Enhancing performance and energy consumption of runtime schedulers for dense linear algebra.

Concurr. Comput. Pract. Exp., 2014

Author's retrospective for biomedical image analysis on a cooperative cluster of GPUs and multicores.

Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume, 2014

Parallel performance and energy efficiency of modern video encoders on multithreaded architectures.

José Luis Martínez

Francisco Daniel Igual Peña

Proceedings of the 22nd European Signal Processing Conference, 2014

2013

Matrix computations on graphics processors and clusters of GPUs.

PhD thesis, 2013

Robust motion estimation on a low-power multi-core DSP.

EURASIP J. Adv. Signal Process., 2013

Scheduling algorithms-by-blocks on small clusters.

Concurr. Comput. Pract. Exp., 2013

Non-negative matrix factorization on low-power architectures: a comparative study.

Proceedings of the 20th European MPI Users' Group Meeting, 2013

Runtime Scheduling of the LU Factorization: Performance and Energy.

Pedro Alonso

Manuel F. Dolz

Proceedings of the Energy Efficiency in Large Scale Distributed Systems, 2013

2012

A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures.

Mercedes Marqués

ACM Trans. Math. Softw., 2012

The FLAME approach: From dense linear algebra algorithms to high-performance multi-accelerator implementations.

Ernie Chan

Field G. Van Zee

J. Parallel Distributed Comput., 2012

Color and texture analysis on emerging parallel architectures.

Ümit V. Çatalyürek

Antonio Ruiz

Manuel Ujaldon

Int. J. High Perform. Comput. Appl., 2012

Optimization of power consumption in the iterative solution of sparse linear systems on graphics processors.

Hartwig Anzt

Maribel Castillo

Juan Carlos Fernández

Vincent Heuveline

Comput. Sci. Res. Dev., 2012

DVFS-control techniques for dense linear algebra operations on multi-core processors.

Comput. Sci. Res. Dev., 2012

Solving dense generalized eigenproblems on multi-threaded architectures.

Appl. Math. Comput., 2012

Unleashing the high-performance and low-power of multi-core DSPs for general-purpose HPC.

Proceedings of the SC Conference on High Performance Computing Networking, 2012

Level-3 BLAS on the TI C6678 Multi-core DSP.

Murtaza Ali

Eric Stotzer

Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

Saving Energy in the LU Factorization with Partial Pivoting on Multi-core Processors.

Proceedings of the 20th Euromicro International Conference on Parallel, 2012

Reducing Energy Consumption of Dense Linear Algebra Operations on Hybrid CPU-GPU Platforms.

Proceedings of the 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, 2012

2011

Color and texture analysis using emerging parallel architectures.

Ümit V. Çatalyürek

Antonio Ruiz

Manuel Ujaldon

Int. J. High Perform. Comput. Appl., 2011

Condensed forms for the symmetric eigenvalue problem on multi-threaded architectures.

Concurr. Comput. Pract. Exp., 2011

Power-aware Dense Linear Algebra Implementations on Multi-core and Many-core Processors.

Proceedings of the 3rd Many-core Applications Research Community (MARC) Symposium, 2011

2010

Extending OpenMP to Survive the Heterogeneous Multi-Core Era.

Daniel Jiménez-González

Jesús Labarta

Int. J. Parallel Program., 2010

Retargeting PLAPACK to clusters with hardware accelerators.

Manuel Fogué

Proceedings of the 2010 International Conference on High Performance Computing & Simulation, 2010

2009

Out-of-core solution of linear systems on graphics processors.

Rafael Rubio

Int. J. Parallel Emergent Distributed Syst., 2009

Exploiting the capabilities of modern GPUs for dense matrix computations.

Concurr. Comput. Pract. Exp., 2009

Solving dense linear systems on platforms with multiple hardware accelerators.

Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Reduction to Condensed Forms for Symmetric Eigenvalue Problems on Multi-core Architectures.

Paolo Bientinesi

Daniel Kressner

Proceedings of the Parallel Processing and Applied Mathematics, 2009

Exploring the GPU for Enhancing Parallelism on Color and Texture Analysis.

Ümit V. Çatalyürek

Antonio Ruiz

Manuel Ujaldon

Proceedings of the Parallel Computing: From Multicores and GPU's to Petascale, 2009

A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures.

Daniel Jiménez-González

Proceedings of the Evolving OpenMP in an Age of Extreme Parallelism, 2009

Fast development of dense linear algebra codes on graphics processors.

M. Jesús Zafont

Alberto F. Martín

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

An Efficient Implementation of GPU Virtualization in High Performance Clusters.

Federico Silla

Proceedings of the Euro-Par 2009, 2009

An Extension of the StarSs Programming Model for Platforms with Multiple GPUs.

Proceedings of the Euro-Par 2009 Parallel Processing, 2009

2008

Attaining High Performance in General-Purpose Computations on Current Graphics Processors.

Proceedings of the High Performance Computing for Computational Science, 2008

Evaluation and tuning of the Level 3 CUBLAS for graphics processors.

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Biomedical image analysis on a cooperative cluster of GPUs and multicores.

Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

Solving Dense Linear Systems on Graphics Processors.
